Next: Some Comparisons Up: Conjugate Direction Minimization: An Previous: Limitations of Conjugate Gradient

Improvements in the Conjugate Gradient Method

The conjugate gradient method uses the steepest descent method to produce its first shift direction, or ``seed" direction. The rate of convergence of early cycles can be improved if a seed that incorporates as much information as practical about the function is used. We would like a direction which will include compensation for the differences in the eigenvalues of the normal matrix. Because in X-ray crystallography the diagonal terms of the normal matrix dominate, a diagonal approximation to the normal matrix provides a powerful and quick alternative to the steepest descent method of generating shift directions. In this procedure the search direction is calculated by

where is the diagonal approximation to the normal matrix for the parameters of cycle k. For the fastest rate of convergence, this shift vector should be used as a seed for conjugate direction searches. It is not clear, however, how one should calculate of Equation (6).

The refinement problems that we address are the wide range of magnitude of the eigenvalues of the normal matrix and the existence of off-diagonal terms. If we could choose a different set of parameters, for which the normal matrix was simpler, the rate of convergence would improve. Ideally one would choose a system of parameters such that all the eigenvalues were equal and all the off-diagonal elements were zero; then one cycle of steepest descent minimization would suffice.

Let us assume that we have determined a matrix ( ) which will transform the usual crystallographic parameters into such a set of parameters ( ). The transformations between the familiar parameters and the new ones will be

equation65

We can perform Fletcher-Reeves conjugate gradient minimization on the function using this new parameter space. The equations will be as before, Equations (4) thru (6), but with primes added:

Instead of working with the parameters we can substitute back to the original parameters. The resulting equations are

In these equations the shift vectors, , are all premultiplied by . It would be simpler to eliminate this complication by simply defining . The final equations for conjugate direction refinement, derived from recombined parameters, but operating on the ``native" parameters are

At this point the matrix is undefined. The optimal choice for would require that , the normal matrix for the new parameters, be equal to the identity matrix. To calculate the optimal we need both the normal matrix and its inverse; thus, we made no gains in computational efficiency over the full matrix method. However, if we recognize that in crystallography is almost diagonal we can set . Then in the Equations (15), (16) and (17) will be replaced by . Making this substitution we obtain

The seed direction ( when and ) is now the shift calculated from the diagonal approximation to the normal matrix, as we desired. In addition, however, we have an equation for .

In summary, we have a minimization method where the diagonal terms of the normal matrix are explicitly included and the off-diagonal elements are dealt with via a set of conjugate directions.

Agarwal (1978) suggested a similar method; however, his equation for was incorrect. In the present nomenclature, his proposal for was

In conjugate gradient refinement is equal to ratio of the length of the gradient at point k divided by that length at point k-1. Because k should be closer to the minimum than point k-1, should be less than unity. An estimate of Agarwal's can be achieved by examining his for cycle 2, which is

As before, the parameters after cycle 1 should be closer to the minimum than the starting parameters, resulting in . This value results in the undesirable outcome that the previous cycle's direction is considered more important that the direction calculated from the current parameters. This now explains why Agarwal found it necessary to place an empirical upper limit of 0.4 on . The value of calculated with Equation (19) typically falls between 0.5 and 0.9. The empirical value of 0.4 falls closer to the typical value than either setting to zero (and using the method of Equation (7)) or using the equation of Agarwal(1978).

Next: Some Comparisons Up: Conjugate Direction Minimization: An Previous: Limitations of Conjugate Gradient

Dale Edwin Tronrud
Thu Nov 20 10:28:11 PST 1997