Line Search Methods¶
As previously stated the calculation of the next step is mathematically written as:
$$x_{k+1} = x_k + \alpha_k p_k$$
where $\alpha_k$ is a positive scalar called the "step length" and $p_K$ is search direction.
Most line search algorithms require $p_K$ to be a descent direction such that $p_k^T \nabla f_k < 0$ because this guarentees some reduction of the objective function.
Step length¶
$\alpha_k$ represents a positive scalar that dictates how far in the direction of the descent to go.
when deciding on the best step length we attempt to find it based on the "Wolfe Condition(s)"
Sufficient Decrease: We decrease the objective function¶
Curvature Condition: The rate of decrease is higher¶
Strong Wolfe Condition: Force $\alpha_k$ to lie in a broad neighborhood of local minimizer or stationary point¶
All wolfe conditions:¶
Proof: there is a value of alpha for a convex objective function that allows the exact minimizer using the gradient descent method:¶
Search Direction¶
Commonly the search direction is defines as $$p_k = -B_K^{-1} \nabla f_k$$ where $B_k$ is either the hessian, or an approximation of the hessian. For simple search directions (like the gradient descent) we represent the Hessian as the identity matrix.