# Line Search Methods
As previously stated, the calculation of the next step is written mathematically as:
$$x_{k+1} = x_k + \alpha_k p_k$$
where $\alpha_k$ is a positive scalar called the "step length" and $p_k$ is the search direction.
Most line search algorithms require $p_k$ to be a descent direction, meaning $p_k^T \nabla f_k < 0$, because this guarantees that the objective function can be reduced along $p_k$.
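As a minimal sketch of how these pieces fit together, the generic line-search loop below applies the update above; `choose_direction` and `choose_step` are hypothetical placeholders for whichever direction and step-length rules are in use.

```python
import numpy as np

def line_search_minimize(f, grad_f, x0, choose_direction, choose_step,
                         max_iter=100, tol=1e-8):
    """Generic line-search loop applying x_{k+1} = x_k + alpha_k * p_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:            # stop at an (approximate) stationary point
            break
        p = choose_direction(x, g)             # search direction p_k
        assert g @ p < 0, "p_k must be a descent direction"
        alpha = choose_step(f, grad_f, x, p)   # positive step length alpha_k
        x = x + alpha * p                      # the update x_{k+1} = x_k + alpha_k p_k
    return x
```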
## Step length
$\alpha_k$ is a positive scalar that dictates how far to move along the descent direction.
![image info](./step_length.png)
When deciding on the step length, we look for a value of $\alpha_k$ that satisfies the "Wolfe conditions":
### Sufficient Decrease: the step must yield a sufficient reduction in the objective function
![image info](./wolfe_condition_1.png)
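For reference, the sufficient decrease (Armijo) condition pictured above is standardly written as

$$f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f_k^T p_k, \qquad 0 < c_1 < 1,$$

where $c_1$ is typically small (e.g. $c_1 = 10^{-4}$), so the required decrease is proportional to both the step length and the directional derivative.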
### Curvature Condition: the rate of decrease at the new point must be sufficiently smaller than at the start
![image info](./wolfe_condition_2.png)
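In its standard form, the curvature condition requires the slope of $\phi(\alpha) = f(x_k + \alpha p_k)$ at $\alpha_k$ to be no smaller (i.e. less negative) than $c_2$ times the initial slope:

$$\nabla f(x_k + \alpha_k p_k)^T p_k \ge c_2 \nabla f_k^T p_k, \qquad 0 < c_1 < c_2 < 1.$$

This rules out steps so short that the function is still decreasing steeply, where further progress along $p_k$ could be made.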
### Strong Wolfe Condition: force $\alpha_k$ to lie in a broad neighborhood of a local minimizer or stationary point
![image info](./wolfe_condition_3.png)
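The standard strong Wolfe form replaces the curvature condition with a two-sided bound on the new slope:

$$|\nabla f(x_k + \alpha_k p_k)^T p_k| \le c_2 |\nabla f_k^T p_k|,$$

which also excludes points where $\phi$ is increasing steeply, keeping $\alpha_k$ close to a stationary point of $\phi$.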
### All Wolfe conditions:
![image info](./all_wolfe.png)
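To make the conditions concrete, here is a minimal NumPy sketch of a Wolfe-condition check and a simple backtracking search; the defaults $c_1 = 10^{-4}$, $c_2 = 0.9$, and the shrink factor `rho` are conventional choices, not values from this document.

```python
import numpy as np

def satisfies_wolfe(f, grad_f, x, p, alpha, c1=1e-4, c2=0.9):
    """Check both (weak) Wolfe conditions for a candidate step length alpha."""
    slope = grad_f(x) @ p                     # directional derivative; negative for descent
    x_new = x + alpha * p
    sufficient_decrease = f(x_new) <= f(x) + c1 * alpha * slope
    curvature = grad_f(x_new) @ p >= c2 * slope
    return sufficient_decrease and curvature

def backtracking_step(f, grad_f, x, p, alpha=1.0, rho=0.5, c1=1e-4, max_iter=50):
    """Shrink alpha geometrically until the sufficient decrease condition holds."""
    slope = grad_f(x) @ p
    for _ in range(max_iter):
        if f(x + alpha * p) <= f(x) + c1 * alpha * slope:
            break
        alpha *= rho                          # step was too long; backtrack
    return alpha
```

Note that backtracking enforces only the sufficient decrease condition: because it starts from a full step and only ever shrinks, overly short steps are avoided by construction rather than by checking curvature.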
### Proof: for a convex objective function there exists a step length $\alpha$ that yields the exact minimizer along the gradient descent direction:
![image info](./alpha_gd_minimizer.png)
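For the special case of a strongly convex quadratic $f(x) = \frac{1}{2} x^T Q x - b^T x$ (the setting in which this step length has a closed form, and possibly the one shown above), minimizing $\phi(\alpha) = f(x_k - \alpha \nabla f_k)$ exactly gives

$$\frac{d\phi}{d\alpha} = -\nabla f_k^T \nabla f_k + \alpha \, \nabla f_k^T Q \nabla f_k = 0 \quad\Longrightarrow\quad \alpha_k = \frac{\nabla f_k^T \nabla f_k}{\nabla f_k^T Q \nabla f_k}.$$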
## Search Direction
Commonly the search direction is defined as $$p_k = -B_k^{-1} \nabla f_k$$ where $B_k$ is either the Hessian or an approximation of the Hessian. For simple search directions (as in gradient descent) we take $B_k$ to be the identity matrix, so $p_k = -\nabla f_k$.
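As a small illustration of these choices, the sketch below computes $p_k$ for the identity and exact-Hessian cases; the objective `f(x) = x1^2 + 10*x2^2` and its derivatives are hypothetical examples, not from this document.

```python
import numpy as np

def steepest_descent_direction(grad):
    """B_k = I, so p_k = -grad f_k."""
    return -grad

def newton_direction(grad, hess):
    """B_k = exact Hessian: solve B_k p_k = -grad f_k instead of forming the inverse."""
    return np.linalg.solve(hess, -grad)

# Hypothetical quadratic objective: f(x) = x1^2 + 10*x2^2, minimized at the origin.
grad_f = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
hess_f = lambda x: np.diag([2.0, 20.0])

x = np.array([1.0, 1.0])
p_gd = steepest_descent_direction(grad_f(x))       # [-2, -20]
p_newton = newton_direction(grad_f(x), hess_f(x))  # [-1, -1], points exactly at the minimizer
```

Quasi-Newton methods such as BFGS sit between these two extremes, building $B_k$ from gradient differences so that no second derivatives are needed.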