Without complete knowledge of the normal matrix, minimization of a
quadratic function would require
repeated cycles. In each cycle a shift vector is
chosen ( for cycle k) and the minimum along that
direction is found with a line search. The minimization of a function
along a direction reduces the problem to one parameter, named
.
The new values for the full set of parameters (for cycle k+1) are
The value of is set to that value which minimizes
. It defines the minimum along
the shift vector
.
The particular set of directions searched determines the rate of convergence. For example, if one chooses to search along the axes of parameter space, first varying the x parameter of the first atom, then the y parameter and so on, the minimum can only be found after a number of cycles many times n.
Many cycles
are required when one parameter at a time is varied because the parameters
(and therefore the shift directions) are interdependent;
thus, in subsequent cycles
previously searched directions must be searched again. The
number of cycles could be reduced if a series of directions could be
identified which were independent. This independence is
described mathematically as the direction vectors being conjugate to
the normal matrix (Luenberger, 1973), which is defined explicitly as
when
.
A conjugate direction method is one in which a series of directions are devised which are conjugate with respect to the normal matrix but do not require the normal matrix in order for them to be determined. In the particular conjugate direction method called conjugate gradient, the direction vector for cycle k+1 is determined from
where is chosen to ensure that
is conjugate to
all previous directions.
and
are defined to be
and 0, respectively,
which results in the first cycle being a steepest descent cycle
(
).