Is Learning Feasible?: Generalization Bound

Generalization Bound Idea

A ==Generalization bound== helps us understand how large the generalization error $E_{out}$ can be, even though we cannot compute it directly.

To achieve this, the bound is expressed in terms of the training error $E_{in}$, which we can compute.

Starting from the ==Hoeffding Inequality==, we can derive two possible scenarios:

  1. ==Maximum Bound==: what is the **maximum** possible value of $E_{out}$?
  2. ==Minimum Bound==: what is the **minimum** possible value of $E_{out}$?

The first one is of more interest because it tells us how well our $E_{in}$ can estimate $E_{out}$: a small maximum bound means the estimate is tight. The following image depicts this derivation.

[Figure: derivation of the maximum and minimum bounds from the Hoeffding Inequality]

A Generalization Bound is another way to understand whether learning is feasible; we will come to similar conclusions as with the Hoeffding Inequality.

In order to get a better expression for $\epsilon$, we assume that the right-hand side of the Hoeffding Inequality equals a term $\delta$ that we, as users, define beforehand.

$$2Me^{-2N\epsilon^2} = \delta$$

and we can isolate $\epsilon$ to get $$\epsilon = \sqrt{\frac{1}{2N}\text{ln}\left(\frac{2M}{\delta}\right)}$$ Notice that by replacing $\epsilon$ in either of the two previous cases we get an expression in terms of $N$ (==training set size==) and $M$ (==model complexity==).
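
As a quick sanity check, the following minimal Python sketch evaluates this expression (the function name and the sample values $N=1000$, $M=100$, $\delta=0.05$ are illustrative choices of mine, not from the notes):

```python
import math

def generalization_epsilon(N: int, M: int, delta: float) -> float:
    """Tolerance from the Hoeffding-based bound: sqrt(ln(2M / delta) / (2N))."""
    return math.sqrt(math.log(2 * M / delta) / (2 * N))

# Illustrative values: 1000 training examples, 100 hypotheses, 95% confidence.
print(generalization_epsilon(N=1000, M=100, delta=0.05))  # ~0.064
```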

Now we, as users, will define $\delta$ instead of $\epsilon$.

In this way, starting from the Hoeffding Inequality, we say that with probability ==$1 - \delta$== the following bounds hold ==Maximum Bound== $$E_{out}(g) \leq E_{in}(g) + \sqrt{\frac{1}{2N}\text{ln}\left(\frac{2M}{\delta}\right)}$$ ==Minimum Bound== $$E_{out}(g) \geq E_{in}(g) - \sqrt{\frac{1}{2N}\text{ln}\left(\frac{2M}{\delta}\right)}$$
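
As a concrete example (these numbers are purely illustrative, not from the notes): with $N = 1000$, $M = 100$, $\delta = 0.05$, and a measured training error $E_{in}(g) = 0.10$, the square-root term is $\epsilon \approx 0.064$, so with probability at least $0.95$ $$0.036 \lesssim E_{out}(g) \lesssim 0.164$$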

Notice that the isolation of $\epsilon$ from $\delta$ is derived as follows.

Dividing $2Me^{-2N\epsilon^2} = \delta$ by $2M$ and taking the natural logarithm of both sides gives

$$\text{ln}\left(\frac{\delta}{2M}\right) = -2N\epsilon^2$$ $$\epsilon^2 = - \frac{1}{2N} \text{ln}\left(\frac{\delta}{2M}\right) = \frac{1}{2N} \text{ln}\left(\frac{2M}{\delta}\right)$$ $$\epsilon = \sqrt{\frac{1}{2N}\text{ln}\left(\frac{2M}{\delta}\right)}$$
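
If you want to double-check this algebra mechanically, here is a minimal sympy sketch (my own verification, not part of the notes) that plugs the isolated $\epsilon$ back into the right-hand side $2Me^{-2N\epsilon^2}$ and confirms it collapses to $\delta$:

```python
import sympy as sp

N, M, delta = sp.symbols('N M delta', positive=True)

# The isolated tolerance: epsilon = sqrt(ln(2M/delta) / (2N))
eps = sp.sqrt(sp.log(2 * M / delta) / (2 * N))

# Substitute it back into 2M * exp(-2N eps^2), the right-hand side we set to delta.
rhs = 2 * M * sp.exp(-2 * N * eps**2)

print(sp.simplify(rhs))  # prints: delta
```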

Maximum Bound (First Case)

The following is the definition of the maximum bound $$E_{out}(g) \leq E_{in}(g) + \sqrt{\frac{1}{2N}\text{ln}\left(\frac{2M}{\delta}\right)}$$ Notice we can conclude:

  1. $\downarrow E_{in} \quad \downarrow E_{out} \Rightarrow$ a smaller training error $E_{in}$ pushes down the limit on the generalization error
  2. $\uparrow N \quad \downarrow E_{out} \Rightarrow$ a larger training set gives a smaller bound on the generalization error (see the numeric sketch after this list)
  3. $\uparrow M \quad \uparrow E_{out} \Rightarrow$ more model complexity gives a larger bound on the generalization error
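
To see observation 2 in numbers, the short sketch below evaluates the square-root term for growing $N$ (the values of $M$, $\delta$, and $N$ are illustrative assumptions of mine):

```python
import math

M, delta = 100, 0.05  # illustrative: 100 hypotheses, 95% confidence

for N in (100, 1_000, 10_000):
    eps = math.sqrt(math.log(2 * M / delta) / (2 * N))
    print(f"N={N:6d}  E_out <= E_in + {eps:.3f}")

# The additive term shrinks from ~0.204 to ~0.064 to ~0.020 as N grows.
```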

**__Trade-off between M, $E_{in}$, and $E_{out}$__**
Additionally, we can note again that there is a complex relation between $M$, $E_{in}$, and $E_{out}$, similar to what happened with the Uniform Hoeffding Inequality.

By focusing on the complex relation that exists between $M$, $E_{in}$, and $E_{out}$, we can split learning into two central questions:

  1. Can we make sure that $E_{out}(g)$ is close enough to $E_{in}(g)$?
  2. Can we make $E_{in}(g)$ small enough?
Then, using the right $M$ (or $\mathcal{H}$) is important.

Notice that although the generalization bound increases with the model complexity $M$, it does not increase too quickly, thanks to the square root and the logarithm.
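
A quick numeric check of this dampening (again with illustrative values $N = 1000$ and $\delta = 0.05$): increasing $M$ by a factor of $100{,}000$ does not even double the additive term.

```python
import math

N, delta = 1_000, 0.05  # illustrative values

for M in (10, 1_000_000):
    eps = math.sqrt(math.log(2 * M / delta) / (2 * N))
    print(f"M={M:9d}  bound term = {eps:.3f}")

# M grows by a factor of 100,000, yet the term only moves from ~0.055 to ~0.094.
```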

--Regularization can account for this trade-off by minimizing $E_{in}$ while at the same time trying to limit the complexity of the model $M$--
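
One common way to formalize this idea (the symbols $\lambda$ and $\Omega$ are assumed here for illustration; they are not defined in these notes) is to minimize an augmented error $$E_{aug}(h) = E_{in}(h) + \lambda\,\Omega(h)$$ where $\Omega(h)$ penalizes the complexity of the hypothesis and $\lambda$ controls the trade-off between fitting the data and keeping the model simple.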

Minimum Bound (Second Case)

The following is the definition of the minimum bound $$E_{out}(g) \geq E_{in}(g) - \sqrt{\frac{1}{2N}\text{ln}\left(\frac{2M}{\delta}\right)}$$ Notice we can conclude $\uparrow E_{in} \quad \uparrow E_{out}$.

Then, this bound effectively provides a lower limit that our generalization error can reach: $E_{out}$ cannot necessarily be driven all the way to zero; it is limited again by $E_{in}$.

Conclusion

The generalization bound is an intrinsic (maximum and minimum) limit on the generalization error $E_{out}$, expressed in terms of quantities we can compute or control: the training error $E_{in}$, the training set size $N$, and the model complexity $M$.

Finite Hypothesis Set Assumption

Notice that the whole derivation is possible because we assume a finite hypothesis set $\mathcal{H}$. However, our ML algorithms usually have infinite hypothesis sets. We will use the VC dimension to deal with these cases.