In my last post about optimization, I derived my cost function's gradient analytically. Sometimes it's not that easy, and sometimes a simple gradient descent is not enough.
scikits.optimization has a special class for handling numerical differentiation, and several tools for conjugate gradients.
Differentiation is handled through helper classes. Your cost function class has to derive from one of two classes,
ForwardFiniteDifferences or CenteredFiniteDifferences. These classes compute the gradient and the Hessian numerically from the actual cost function. You may also override the gradient yourself so that only the Hessian is computed numerically.
A single parameter, difference, is the delta that will be applied to each parameter to compute the derivatives. This number is fixed arbitrarily, and may need to be changed if numerical differentiation fails.
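To make the role of difference concrete, here is a minimal numpy sketch of what the two schemes amount to. This is not the scikit's actual implementation, just the underlying idea; forward_difference_gradient and centered_difference_gradient are hypothetical names.

```python
import numpy

def forward_difference_gradient(f, x, difference=1e-7):
    """Forward scheme: df/dx_i ~ (f(x + h*e_i) - f(x)) / h.

    Costs one extra function evaluation per parameter; error is O(h).
    """
    x = numpy.asarray(x, dtype=float)
    f0 = f(x)
    grad = numpy.empty_like(x)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += difference
        grad[i] = (f(xp) - f0) / difference
    return grad

def centered_difference_gradient(f, x, difference=1e-7):
    """Centered scheme: df/dx_i ~ (f(x + h*e_i) - f(x - h*e_i)) / (2h).

    Twice as many evaluations, but the error drops to O(h^2).
    """
    x = numpy.asarray(x, dtype=float)
    grad = numpy.empty_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[i] += difference
        xm[i] -= difference
        grad[i] = (f(xp) - f(xm)) / (2 * difference)
    return grad
```

If the delta is too large the approximation is poor; too small and floating-point cancellation dominates, which is why the parameter sometimes needs tuning.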
Here is what a weighted sinc function looks like (the weights are given by self.a):
import numpy
from scikits.optimization.helpers import ForwardFiniteDifferences

class Function(ForwardFiniteDifferences):
    def __init__(self):
        ForwardFiniteDifferences.__init__(self)
        self.a = numpy.array((1, 2))

    def __call__(self, x):
        t = numpy.sqrt(numpy.sum((self.a * x)**2))
        return -numpy.sinc(numpy.pi * t)
Conjugate gradient is an optimization technique that combines the new gradient with the last descent direction or gradient to create a new descent direction. Besides this, the line search must satisfy certain properties. It doesn't need to be exact, but it must obey some rules, known as the Wolfe-Powell rules. There are two formulations, both available in the scikit. I suggest you read some of the reference books I list at the end of the post to choose the right line search, as well as the appropriate conjugate gradient.
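For reference, here is a hedged sketch of what the weak form of those rules checks for a candidate step length alpha along a descent direction d (satisfies_wolfe is a hypothetical name, not the scikit's API; c1 and c2 are the usual constants, with 0 < c1 < c2 < 1):

```python
import numpy

def satisfies_wolfe(f, grad, x, d, alpha, c1=1e-4, c2=0.9):
    """Check the weak Wolfe-Powell conditions for step length alpha.

    Sufficient decrease (Armijo): f(x + a*d) <= f(x) + c1 * a * g.d
    Curvature:                    g(x + a*d).d >= c2 * g(x).d
    """
    g0 = numpy.dot(grad(x), d)  # directional derivative at x, negative for a descent direction
    xa = x + alpha * d
    armijo = f(xa) <= f(x) + c1 * alpha * g0
    curvature = numpy.dot(grad(xa), d) >= c2 * g0
    return bool(armijo and curvature)
```

The Armijo condition rejects steps that are too long to decrease the cost enough, and the curvature condition rejects steps that are too short to make real progress.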
Besides the well-known Fletcher-Reeves conjugate gradient, there are several other formulas that can be used, such as the Polak-Ribière-Polyak one, which can also be combined with the Fletcher-Reeves formula to form what is perhaps the best variant. The different available conjugate gradients are listed in the documentation.
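The variants differ only in the scalar beta used to build the new direction d_new = -g_new + beta * d_old. A minimal sketch of the two formulas and one common way of combining them (the function names are mine, and the clipping rule shown is one common hybrid, not necessarily the exact combination the scikit uses):

```python
import numpy

def beta_fletcher_reeves(g_new, g_old):
    # beta_FR = ||g_new||^2 / ||g_old||^2
    return numpy.dot(g_new, g_new) / numpy.dot(g_old, g_old)

def beta_polak_ribiere(g_new, g_old):
    # beta_PRP = g_new . (g_new - g_old) / ||g_old||^2
    return numpy.dot(g_new, g_new - g_old) / numpy.dot(g_old, g_old)

def beta_hybrid(g_new, g_old):
    # One common PRP/FR combination: clip beta_PRP into [0, beta_FR],
    # keeping PRP's automatic restarts and FR's convergence guarantees
    return max(0.0, min(beta_polak_ribiere(g_new, g_old),
                        beta_fletcher_reeves(g_new, g_old)))
```

Note that beta_PRP goes to zero when successive gradients are similar, which effectively restarts the method with a plain gradient step; beta_FR never does.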
OK, let’s see what it can do. I’ve assembled a Fletcher-Reeves conjugate gradient with Wolfe-Powell rules line search.
from scikits.optimization import step, line_search, criterion, optimizer

fun = Function()
mystep = step.FRConjugateGradientStep()
mylinesearch = line_search.WolfePowellRule()
mycriterion = criterion.criterion(ftol=0.0001, iterations_max=100)
myoptimizer = optimizer.StandardOptimizer(function=fun,
                                          step=mystep,
                                          line_search=mylinesearch,
                                          criterion=mycriterion,
                                          x0=numpy.array((.35, .45)))
I didn’t display the result, because in fact it hadn’t finished after 100 iterations.
Here are animations of the evolution of the cost function value and of the position of the parameters.
Knowing that Fletcher-Reeves adds a fraction of the last direction to the new gradient, this circular evolution can be understood. In fact, a simple gradient works better here, as does Polak-Ribière-Polyak. It all depends on how the different ingredients are mixed, and on how close the Wolfe-Powell rules are to an exact line search.
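To illustrate the "simple gradient works better" remark, here is a self-contained plain-numpy sketch of steepest descent with a fixed step on the same weighted sinc, using a forward-differences gradient. The step size and iteration count are my arbitrary choices, and this bypasses the scikit entirely:

```python
import numpy

a = numpy.array((1.0, 2.0))

def f(x):
    # the weighted sinc cost from above
    t = numpy.sqrt(numpy.sum((a * x) ** 2))
    return -numpy.sinc(numpy.pi * t)

def num_grad(x, h=1e-7):
    # forward finite differences, as discussed earlier
    f0 = f(x)
    g = numpy.empty_like(x)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += h
        g[i] = (f(xp) - f0) / h
    return g

# plain steepest descent with a small fixed step (arbitrary choice)
x = numpy.array((0.35, 0.45))
for _ in range(200):
    x = x - 0.01 * num_grad(x)
```

Starting from the same (.35, .45) point, this settles into a nearby local minimum of the oscillating cost instead of circling, though with a function this ripply there is no guarantee of reaching the global minimum at the origin.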
I think that trying different solutions is what is fun about optimization. With conjugate gradient, a lot of different optimizers can be built. With numerical differentiation, you don’t have to rely on an analytical gradient. Know though that the more parameters you have, the longer it takes to compute the gradient (O(n) function evaluations), not to mention the Hessian (O(n²)).
Here are some useful books that explain the theory and the ideas behind these topics:
Link to the tutorial code: Launchpad.