2.5 Local optimization algorithms

For a given potential function $U(X)$, minimization of energy can be performed in many ways. In this section two general methods are described. Since x is a vector in the multi-dimensional space, the methods would only locate the minimum ‘closest’ to the initial sampling point.

At any given point in configuration space, the internal energy may be expanded as a Taylor series:

  \begin{equation}  \label{U} U(X+\delta {X}) = U(X) + \frac{\partial U}{\partial X} \delta {X} + \frac{1}{2!} \frac{\partial U^2}{\partial X^2} (\delta {X})^2 + ... \end{equation}   (47)

This expansion is usually truncated at either first or second order, since close to the minimum energy configuration we know that system will behave harmonically. If the expansion is truncated at first order then the minimization just needs the energy term and its first derivatives, which can be done by steepest desecents and conjugate gradient methods 60.

Steepest descent

This method assumes the optimum direction to move from the starting $X$ is just the steepest-descent direction g,

  \begin{equation}  \label{gradient1} {\textrm{g}}_ i = \left. - \frac{\partial U}{\partial X} \right|_{X=X_ i} \end{equation}   (48)

It is assumed that g can be otained from the negative of a gradient operator $G$ acting on $X$ so that

  \begin{equation}  \label{gradient2} {\textrm{g}}_ i = -G(X_ i) \end{equation}   (49)

Therefore, the next step is,

  \begin{equation}  \label{gradient3} X_{i+1} = X_ i + b_ i {\textrm{g}}_ i \end{equation}   (50)

The step length $b$ can be determined by

  \begin{equation}  \label{gradient4} {\textrm{g}}_ i \cdot G(X_ i + b_ i {\textrm{g}}_ i) = 0 \end{equation}   (51)

This method is easy to program, but it often converges slowly. Although each iteration of the steepest-decent algorithm moves the trial vector towards the minimum of the function, there is not guarantee that the minimum will be reached in a finite number of iterations. If the initial steepest-descent vector does not lie at the right angles, a large number of steps will be needed.

\includegraphics[scale=0.8]{chapter2/pdf/Fig3.png}
Figure 2.10: Comparison of two line search minimization methods (red: steepest descent; blue: conjugate gradient, modified from Wikipedia).

Conjugate gradients

If the only information one has about the function $U(X)$ is its value and gradient at a set of points, the optimal method would allow one to combine this information so that each minimization step is independent of the previous ones. To accomplish this, one can derive the condition that makes each minimization step independent of another, the direction of each step should be conjugate to each other.

  \begin{equation}  \label{gradient5} {\textrm{d}}_ m \cdot G \cdot {\textrm{d}}_ n = 0 (n \neq m) \end{equation}   (52)

The precise search direction can be obtained from the following algorithm:

  \begin{equation}  \label{gradient6} {\textrm{d}}_ m = {\textrm{g}}_ m + \gamma ^ m {\textrm{d}}_{m-1} \end{equation}   (53)

where

  \begin{equation}  \label{gradient7} \gamma _ m = \frac{{\textrm{g}}_ m \cdot {\textrm{g}}_ m}{{\textrm{g}}_{m-1} \cdot {\textrm{g}}_{m-1}} \end{equation}   (54)

and $\gamma _1$ = 0.

The initial direction is taken to be the negative of the gradient at the starting point. A subsequent conjungate direction is then constructed from a linear combination of the new gradient and the previous direction that minimized $F(X)$. Since the minimization along the conjugate directions are independent, the dimensionality of the vector space explored is reduced by 1 at each iteration.

Above I just introduced two most fundamental optimization algorithms. If we expand the energy to second order, the second derivative matrix (Hessian matrix, $H$) is needed. In this case, several methods, like Newton-Raphson, Davidon-Flecher-Powell and Broyden-Fletcher-Goldfarb-Shanno can work efficiently 61.