7. Gradient Descent

Convex Function


  • Function $f: \R^d \rightarrow \R$ is convex if $\forall x_1, x_2 \in \R^d,\ 0 \le t \le 1$:
$$f(tx_1 + (1 - t)x_2) \le tf(x_1) + (1 - t)f(x_2)$$
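The defining inequality can be checked numerically. A minimal sketch, assuming NumPy and using $f(x) = \lVert x \rVert^2$ as an example of a convex function (the points and function are made up for illustration):

```python
import numpy as np

# f(x) = ||x||^2 is convex, so the chord lies above the function.
def f(x):
    return np.dot(x, x)

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=3), rng.normal(size=3)

for t in np.linspace(0.0, 1.0, 11):
    lhs = f(t * x1 + (1 - t) * x2)            # function at the interpolated point
    rhs = t * f(x1) + (1 - t) * f(x2)         # interpolated function values
    assert lhs <= rhs + 1e-12                 # convexity inequality holds
```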

Minimize a convex, continuous and differentiable function


Taylor Expansion


$$\ell(w + s) \approx \ell(w) + g(w)^\intercal s$$

where

  • $g(w) = \nabla \ell(w)$ is the gradient
$$\ell(w + s) \approx \ell(w) + g(w)^\intercal s + \frac{1}{2} s^\intercal H(w) s$$

where

  • $H(w) = \nabla^2 \ell(w)$ is the Hessian of $\ell$
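The two expansions can be compared numerically. A sketch, assuming NumPy and the made-up quadratic $\ell(w) = \frac{1}{2} w^\intercal A w$, whose gradient and Hessian are known in closed form:

```python
import numpy as np

# l(w) = w^T A w / 2 for a symmetric positive definite A (assumed example).
A = np.array([[3.0, 1.0], [1.0, 2.0]])

def loss(w):    return 0.5 * w @ A @ w
def grad(w):    return A @ w              # g(w)
def hessian(w): return A                  # H(w), constant for a quadratic

w = np.array([1.0, -1.0])
s = np.array([0.1, 0.05])

first  = loss(w) + grad(w) @ s                               # 1st-order model
second = first + 0.5 * s @ hessian(w) @ s                    # 2nd-order model

print(abs(loss(w + s) - first))    # small but nonzero error
print(abs(loss(w + s) - second))   # for a quadratic, the 2nd-order model is exact
```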

Gradient Descent

  • Use the first order approximation
  • Assume that the function $\ell$ around $w$ is linear and behaves like $\ell(w) + g(w)^\intercal s$
  1. Find a vector $s$ that minimizes $\ell$:
$$s = - \alpha g(w)$$
  • for some small $\alpha$, the learning rate
  2. After one update:
$$\ell(w - \alpha g(w)) \approx \ell(w) - \underbrace{\alpha g(w)^\intercal g(w)}_{>0} < \ell(w)$$
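The single-step decrease above can be verified directly. A minimal sketch, assuming NumPy, on the made-up convex loss $\ell(w) = \lVert w \rVert^2$:

```python
import numpy as np

# l(w) = ||w||^2, so g(w) = 2w (assumed example).
def loss(w): return np.dot(w, w)
def grad(w): return 2 * w

w = np.array([2.0, -3.0])
alpha = 0.1                          # small learning rate
w_new = w - alpha * grad(w)          # one update with s = -alpha * g(w)

assert loss(w_new) < loss(w)         # the loss strictly decreases
```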

Choosing the step


  • A safe choice is to set $\alpha = \frac{c}{t}$
  • $c$: some constant
  • $t$: the number of updates taken
  • Guarantees that the step size eventually becomes small enough for gradient descent to converge
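A sketch of this schedule, assuming NumPy; the loss $\ell(w) = \lVert w \rVert^2$ and the constant $c = 0.3$ are arbitrary choices for illustration:

```python
import numpy as np

# Gradient of the assumed loss l(w) = ||w||^2.
def grad(w): return 2 * w

w, c = np.array([5.0, -5.0]), 0.3
for t in range(1, 501):
    w = w - (c / t) * grad(w)        # decaying learning rate alpha = c / t

print(np.linalg.norm(w))             # shrinks toward the minimizer w = 0
```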

Types

• Batch Gradient

  • use the error over the entire training set $\mathcal D$
  • Do until satisfied:
$$\begin{align*} &\text{Compute the gradient: } \nabla \ell_{\mathcal D}(w) \\ &\text{Update the vector of parameters: } w \leftarrow w - \alpha \nabla \ell_{\mathcal D}(w) \end{align*}$$
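A minimal sketch of batch gradient descent, assuming NumPy and a made-up least-squares problem (data, `true_w`, and `alpha` are all illustrative choices):

```python
import numpy as np

# Synthetic linear-regression data standing in for the training set D.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)

def grad_D(w):
    # Gradient of the mean squared error over the whole set D.
    return 2 * X.T @ (X @ w - y) / len(y)

w, alpha = np.zeros(3), 0.1
for _ in range(500):
    w = w - alpha * grad_D(w)        # full-batch update

print(np.round(w, 2))                # close to true_w
```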

• Stochastic Gradient

  • use the error over a single training example from $\mathcal D$
  • Do until satisfied:
$$\begin{align*} &\text{Choose (with replacement) a random training example } (x^{(i)}, y^{(i)}) \in \mathcal D \\ &\text{Compute the gradient for that example: } \nabla \ell_{(x^{(i)}, y^{(i)})}(w) \\ &\text{Update the vector of parameters: } w \leftarrow w - \alpha \nabla \ell_{(x^{(i)}, y^{(i)})}(w) \end{align*}$$
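The same least-squares setup can be solved with one randomly drawn example per update. A sketch, assuming NumPy (data and hyperparameters are made up):

```python
import numpy as np

# Same synthetic data as the batch example.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)

w, alpha = np.zeros(3), 0.01
for _ in range(5000):
    i = rng.integers(len(y))               # sample one example with replacement
    g = 2 * (X[i] @ w - y[i]) * X[i]       # gradient for that single example
    w = w - alpha * g

print(np.round(w, 1))                      # approximately true_w
```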

Tip

  • Stochastic approximates Batch arbitrarily closely as $\alpha \rightarrow 0$
  • Stochastic can be much faster when $\mathcal D$ is very large
  • Intermediate approach: use the error over subsets (mini-batches) of $\mathcal D$
  • Gradient descent is slow when it is close to the minimum
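The intermediate approach averages the gradient over a small random subset of $\mathcal D$ each step. A sketch, assuming NumPy; the batch size of 16 and all data are illustrative:

```python
import numpy as np

# Same synthetic least-squares data as before.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)

w, alpha, batch = np.zeros(3), 0.05, 16
for _ in range(1000):
    idx = rng.integers(len(y), size=batch)            # random mini-batch indices
    g = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / batch  # gradient over the subset
    w = w - alpha * g

print(np.round(w, 1))
```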

Where to converge?


Newton’s Method: Use $2^{nd}$ order approximation

$$\ell(w + s) \approx \ell(w) + g(w)^\intercal s + \frac{1}{2} s^\intercal H(w) s$$

  • Minimizing the right-hand side over $s$ gives the Newton step $s = -H(w)^{-1} g(w)$
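Since the second-order model is exact for a quadratic, a single Newton step $s = -H(w)^{-1} g(w)$ reaches its minimizer. A sketch, assuming NumPy and a made-up quadratic loss $\ell(w) = \frac{1}{2} w^\intercal A w - b^\intercal w$:

```python
import numpy as np

# Assumed quadratic: gradient A w - b, constant Hessian A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(w):    return A @ w - b
def hessian(w): return A

w = np.zeros(2)
for _ in range(2):
    # Newton update: solve H(w) s = -g(w) instead of forming the inverse.
    w = w - np.linalg.solve(hessian(w), grad(w))

# For a quadratic, one step already lands on the exact minimizer A^{-1} b.
print(np.allclose(A @ w, b))
```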