

L-BFGS algorithm for multivariate optimization

The Newton method is a classical technique, usually the first one studied in a course on optimization methods. In its original form it is rarely used today, but it serves as the basis for a whole family of quasi-Newton methods. One of those methods is the L-BFGS algorithm.

Quasi-Newton methods

The classical Newton method uses the Hessian of a function. The step of the method is defined as the product of the inverse Hessian matrix and the function gradient. If the function is a positive definite quadratic form, we reach its minimum in one step. For an indefinite quadratic form (which has no minimum), we reach a maximum or a saddle point instead. In short, the method finds the stationary point of a quadratic form. In practice we usually deal with functions which are not quadratic forms, but a smooth function is described sufficiently well by a quadratic form in the neighbourhood of a minimum. However, the Newton method can converge to a minimum as well as to a maximum (taking steps in a direction in which the function increases).
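The one-step property on a positive definite quadratic form can be illustrated directly. The sketch below (illustrative Python, not ALGLIB code) takes a single Newton step x − H⁻¹∇f(x) on f(x) = ½·xᵀAx − bᵀx, whose Hessian is simply A, and lands exactly on the minimizer A⁻¹b from any starting point:

```python
# One Newton step on a positive definite 2x2 quadratic form
# f(x) = 1/2 x^T A x - b^T x.  Gradient: A x - b, Hessian: A.

def newton_step_2d(A, b, x):
    """One Newton step x - H^{-1} grad f(x) for a 2x2 quadratic."""
    gx = A[0][0] * x[0] + A[0][1] * x[1] - b[0]
    gy = A[1][0] * x[0] + A[1][1] * x[1] - b[1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    # inverse Hessian applied to the gradient (Cramer's rule)
    dx = ( A[1][1] * gx - A[0][1] * gy) / det
    dy = (-A[1][0] * gx + A[0][0] * gy) / det
    return [x[0] - dx, x[1] - dy]

A = [[4.0, 1.0], [1.0, 3.0]]          # positive definite
b = [1.0, 2.0]
x1 = newton_step_2d(A, b, [10.0, -7.0])  # arbitrary starting point
# x1 satisfies A x1 = b, i.e. it is the exact minimizer
```

If A were indefinite, the same step would land on the saddle point instead, which is exactly the failure mode quasi-Newton methods guard against.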

Quasi-Newton methods solve this problem as follows: they use a positive definite approximation instead of the exact Hessian. If the Hessian is positive definite, we take the Newton step as is. If the Hessian is indefinite, we modify it to make it positive definite and only then take the step. The step is therefore always taken in a descent direction. When the Hessian is positive definite, it gives us a quadratic model of the surface, which improves convergence; when it is indefinite, we simply move in a direction in which the function decreases.

It was stated above that we take a step "using the Newton method". Strictly speaking, that only defines the direction of the step. Some quasi-Newton methods perform a precise one-dimensional minimization along this line, but it has been proved that it is enough to decrease the function value sufficiently, without finding the exact minimum along the line. The L-BFGS algorithm tries to take the full Newton step first; if that does not decrease the function value, it shortens the step until a smaller function value is found.
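The "shorten the step until the function decreases sufficiently" idea is a backtracking line search. A minimal sketch (all names here are illustrative, not ALGLIB's): start with the full quasi-Newton step length and halve it until an Armijo-type sufficient-decrease condition holds:

```python
# Backtracking line search sketch: halve the step until
# f(x + t*d) <= f(x) + c * t * (grad . d)   (sufficient decrease).

def backtrack(f, x, fx, grad, direction, c=1e-4, shrink=0.5, max_tries=50):
    """Return a step length t giving a sufficient decrease along 'direction'."""
    slope = sum(g * d for g, d in zip(grad, direction))  # directional derivative
    t = 1.0  # the full Newton/quasi-Newton step
    for _ in range(max_tries):
        trial = [xi + t * di for xi, di in zip(x, direction)]
        if f(trial) <= fx + c * t * slope:  # enough decrease achieved
            return t
        t *= shrink
    return t

f = lambda p: (p[0] - 3.0) ** 2 + 2.0 * (p[1] + 1.0) ** 2
x = [0.0, 0.0]
grad = [-6.0, 4.0]                        # gradient of f at x
t = backtrack(f, x, f(x), grad, [-g for g in grad])  # search along -grad
```

Here the full step overshoots, so the search settles on a shorter step that still guarantees a strictly smaller function value.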

L-BFGS Hessian update scheme

The Hessian of a function is not always available; often we can only calculate the function gradient. Therefore the following approach is used: a Hessian approximation is built from N consecutive gradient evaluations, and the quasi-Newton step is taken. There are special formulas which allow us to update this approximation iteratively, and at each step the approximation matrix remains positive definite. The algorithm uses the BFGS update scheme, where BFGS stands for Broyden-Fletcher-Goldfarb-Shanno (more precisely, this scheme maintains not the Hessian but its inverse, so no time is wasted inverting a matrix).
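The update scheme referred to here is the standard BFGS formula for the inverse Hessian approximation H_k, written in terms of the step and gradient-difference vectors:

```latex
H_{k+1} = \left(I - \rho_k s_k y_k^T\right) H_k \left(I - \rho_k y_k s_k^T\right) + \rho_k s_k s_k^T,
\qquad
s_k = x_{k+1} - x_k,\quad
y_k = \nabla f(x_{k+1}) - \nabla f(x_k),\quad
\rho_k = \frac{1}{y_k^T s_k}.
```

As long as the curvature condition y_kᵀs_k > 0 holds, this update preserves positive definiteness, which is what keeps the step in a descent direction.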

The letter L in the name stands for "limited memory". For problems of high dimension, the O(N²) memory required to store a Hessian is too large, as is the machine time required to process it. Therefore, instead of using N gradient values to build the Hessian, we can use a smaller number M of recent values, which requires memory of order N·M. In practice M is usually chosen between 3 and 7; in difficult cases it is reasonable to increase it to 20. Of course, the result is not the Hessian but a cruder approximation of it. On the one hand, convergence slows down; on the other hand, performance may actually improve. At first sight this statement looks paradoxical, but there is no contradiction: convergence is measured in iterations, whereas performance is measured in the processor time spent to obtain the result.
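In the limited-memory scheme the inverse Hessian is never formed explicitly: the last M pairs (sᵢ, yᵢ) are kept, and the product H·g is computed with the classical two-loop recursion in O(N·M) time. A compact sketch (pure Python, not the ALGLIB implementation):

```python
# L-BFGS two-loop recursion sketch: apply the implicit inverse Hessian,
# built from the stored pairs s_i = x_{i+1} - x_i and
# y_i = grad_{i+1} - grad_i, to the current gradient g.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def two_loop(g, s_list, y_list):
    """Return H*g, where H is the implicit L-BFGS inverse Hessian."""
    q = list(g)
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):  # newest first
        a = dot(s, q) / dot(y, s)
        alphas.append(a)
        q = [qi - a * yi for qi, yi in zip(q, y)]
    if s_list:  # common initial scaling H0 = (y^T s / y^T y) * I
        s, y = s_list[-1], y_list[-1]
        gamma = dot(s, y) / dot(y, y)
        q = [gamma * qi for qi in q]
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):  # oldest first
        b = dot(y, q) / dot(y, s)
        q = [qi + (a - b) * si for qi, si in zip(q, s)]
    return q  # the quasi-Newton search direction is -q
```

With M pairs stored, each call costs about 4·N·M multiplications, which is the source of the performance gain described above.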

In fact, the method was originally designed to optimize functions of a large number of arguments (hundreds and thousands), where it pays to accept a larger iteration count in exchange for a lower approximation precision, because the overhead per iteration becomes much smaller. But the method can be used for problems of small dimension too. Its main advantage is scalability: it delivers high performance on high-dimensional problems while remaining applicable to small ones.

Difference scheme and analytical gradient

Do not calculate the function gradient using a two-point difference formula: it is insufficiently precise, and in a number of cases the algorithm will fail to work and will return an error message. Use at least the four-point formula

∂f/∂x ≈ (1/(12h)) · (f(x − 2h) − 8·f(x − h) + 8·f(x + h) − f(x + 2h)),

or, better, the analytical form of the gradient.
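The four-point formula above, applied coordinate by coordinate, gives an O(h⁴) gradient approximation. A sketch (illustrative code, not ALGLIB's):

```python
# Gradient via the four-point central difference from the text,
# applied to each coordinate in turn.  O(h^4) accurate.

def gradient_4pt(f, x, h=1e-4):
    """Approximate grad f(x) with four-point central differences."""
    g = []
    for i in range(len(x)):
        def fi(t):
            xi = list(x)
            xi[i] = t            # vary only coordinate i
            return f(xi)
        t = x[i]
        g.append((fi(t - 2*h) - 8*fi(t - h) + 8*fi(t + h) - fi(t + 2*h))
                 / (12 * h))
    return g

f = lambda p: p[0] ** 3 + p[0] * p[1] ** 2
g = gradient_4pt(f, [2.0, 3.0])   # exact gradient is [21.0, 12.0]
```

Each partial derivative costs four function evaluations, so a full gradient costs 4·N evaluations per point; an analytical gradient avoids both this cost and the truncation error.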

Using the Algorithm: Reverse Communication

During its operation, the optimization algorithm must obtain values of the function and its gradient. In most program packages this problem is solved by passing a pointer to a function (C++, Delphi) or a delegate (C#) which calculates the function value/gradient/Hessian.

The ALGLIB package, unlike other libraries, uses reverse communication to solve this problem. When a function value or gradient needs to be calculated, the algorithm state is saved in a special structure and control is returned to the calling program, which performs the calculations and then calls the optimization subroutine again.

Thus, the optimization algorithm is used in the following order:

  1. Prepare the LBFGSState data structure by calling the algorithm initialization subroutine MinLBFGS.
  2. Call the MinLBFGSIteration subroutine.
  3. If the subroutine returns False, the algorithm has finished and the minimum has been found (the minimum itself can be obtained by calling the MinLBFGSResults subroutine).
  4. If the subroutine returns True, it is requesting information about the function. Calculate the function value/gradient (this is described in detail below).
  5. Load the requested information into the State structure and call MinLBFGSIteration again.
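The loop above can be sketched with a self-contained toy model of the reverse-communication pattern. All names below (State, iterate) are illustrative, NOT the actual ALGLIB API, and the inner optimizer is a plain gradient descent rather than L-BFGS; only the control flow matters here:

```python
# Toy reverse-communication model: iterate() returns True when it needs
# f(x) and grad f(x) at State.X, and False when it has converged.

class State:
    def __init__(self, x0):
        self.X = list(x0)          # current point: caller reads this
        self.F = 0.0               # caller stores f(X) here
        self.G = [0.0] * len(x0)   # caller stores grad f(X) here
        self.its = 0

def iterate(state, step=0.1, eps_g=1e-8, max_its=1000):
    """One round: consume the loaded F/G, then either request new
    values (True) or report convergence (False)."""
    if state.its > 0:  # the very first call only requests initial values
        if sum(g * g for g in state.G) < eps_g ** 2 or state.its >= max_its:
            return False
        state.X = [x - step * g for x, g in zip(state.X, state.G)]
    state.its += 1
    return True

f = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

state = State([5.0, 5.0])
while iterate(state):                      # True: a request for f/grad
    state.F = f(state.X)                   # load the requested values
    state.G = [2.0 * (state.X[0] - 1.0),   # ... into the state structure
               2.0 * (state.X[1] + 2.0)]
# after the loop, state.X is close to the minimum (1, -2)
```

The caller owns the function evaluation completely, so no pointers or delegates cross the library boundary, which is exactly what makes this pattern portable across C++, C#, Delphi and VB.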

The following fields of the State structure are used to exchange information with the user:

  • State.X[0..N-1] – an array storing the coordinates of the point x
  • State.F – the value of the function F(x) should be stored in this field
  • State.G[0..N-1] – the gradient grad F(x) should be stored in this field

Subroutine Descriptions

Unit lbfgs


— Data structure LBFGSState

This structure stores the current state of the optimization algorithm between calls of the MinLBFGSIteration subroutine.

— Data structure LBFGSReport

This structure stores the optimization report: the number of iterations IterationsCount, the number of function/gradient evaluations NFEV, and the completion code TerminationType.

— Subroutine MinLBFGS(N, M, X, EpsG, EpsF, EpsX, MaxIts, Flags, out State)

This subroutine is used to start the optimization. It is called to initialize the State structure before calling MinLBFGSIteration. Its parameters are: problem dimension N, model rank M, stopping conditions EpsG, EpsF, EpsX and MaxIts, and the Flags parameter (see the subroutine comments for more information).

— Subroutine MinLBFGSIteration(var State)

This subroutine is called in a loop until it returns False. See "reverse communication" above for more information.

EXAMPLES:    [1]

— Subroutine MinLBFGSResults(State, X, Rep)

This subroutine is used to obtain the algorithm results: the minimum found and the optimization report. It may be called only after MinLBFGSIteration has returned False.

Source codes

C#

C# 1.0 source.

lbfgs.csharp.zip - L-BFGS algorithm for multivariate optimization

 

C++

C++ source.

lbfgs.cpp.zip - L-BFGS algorithm for multivariate optimization

ablas.zip - optimized basic linear algebra subroutines with SSE2 support (for C++ sources only)

 

C++, multiple precision arithmetic

C++ source. MPFR/GMP is used.

GMP source is available from gmplib.org. MPFR source is available from www.mpfr.org.

lbfgs.mpfr.zip - L-BFGS algorithm for multivariate optimization

mpfr.zip - precompiled Win32 MPFR/GMP binaries

 


Delphi

Delphi source.
Can be compiled under FPC (in Delphi compatibility mode).
lbfgs.delphi.zip - L-BFGS algorithm for multivariate optimization


Visual Basic

Visual Basic source.
lbfgs.vb6.zip - L-BFGS algorithm for multivariate optimization



 
 
Sergey Bochkanov, Vladimir Bystritsky
Copyright © 1999-2009