Comparison of optimization algorithms

In the next version of scikits.optimization, I’ve added some Quasi-Newton steps. Before this version is released, I thought I would compare several methods of optimizing the Rosenbrock function.

Optimizers

What makes the Rosenbrock cost function great for this comparison can be summed up in two points:

  1. It is hard to optimize
  2. Its gradient can be computed easily (see the sketch right after this list)
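
For reference, here is a minimal NumPy sketch of the 2D Rosenbrock function (the usual a = 1, b = 100 variant) and its analytical gradient; SciPy also ships n-dimensional versions as scipy.optimize.rosen and scipy.optimize.rosen_der.

    import numpy as np

    def rosenbrock(p):
        # f(x, y) = (1 - x)^2 + 100 * (y - x^2)^2
        x, y = p
        return (1.0 - x) ** 2 + 100.0 * (y - x ** 2) ** 2

    def rosenbrock_gradient(p):
        # Analytical gradient of the 2D Rosenbrock function
        x, y = p
        dx = -2.0 * (1.0 - x) - 400.0 * x * (y - x ** 2)
        dy = 200.0 * (y - x ** 2)
        return np.array([dx, dy])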

I’ve decided to compare the number of function and gradient calls as well as the cost behavior for several common optimization algorithms. So the contestants will be:

  • A Nelder-Mead/Polytope/simplex optimizer
  • SSA, a simplex with simulated annealing (think of amotsa from Numerical Recipes)
  • GA, a genetic algorithm
  • A gradient-based optimizer
  • CG, a conjugate-gradient optimizer
  • BFGS, a quasi-Newton optimizer

The first 4 are from scikits.optimization; the last 2 are based on proprietary code that cannot be published, but it is still interesting to compare them with the gradient-free tools that are usually applied to complex cost functions.
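
Since neither the scikits.optimization scripts nor the proprietary CG/BFGS code can be reproduced here, the sketch below only illustrates how such call counts can be gathered, using SciPy’s minimize as a stand-in; the starting point x0 is an assumption, not necessarily the one used for the plots.

    from scipy.optimize import minimize, rosen, rosen_der

    x0 = [-1.2, 1.0]  # classical Rosenbrock starting point, assumed here

    for method in ("Nelder-Mead", "CG", "BFGS"):
        # The derivative-free method gets no gradient; CG and BFGS use the
        # analytical one, so both function and gradient calls can be compared.
        jac = None if method == "Nelder-Mead" else rosen_der
        res = minimize(rosen, x0, method=method, jac=jac)
        print(method, "f calls:", res.nfev,
              "grad calls:", getattr(res, "njev", 0),
              "final cost:", res.fun)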

Results

I’ve made a small slideshow with the derivative-free algorithms. For each of the three algorithms, it shows the number of function calls versus iteration, the cost versus iteration, and finally the locations of the parameters that were tested.

[Slideshow: derivative-free algorithms]

This slideshow is for the derivative-based algorithms.

[Slideshow: derivative-based algorithms]
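
The script behind the slideshows is not reproduced here, but a minimal example of recording the cost-versus-iteration curve with a callback, using SciPy’s Nelder-Mead as a stand-in, would look something like this:

    import matplotlib.pyplot as plt
    from scipy.optimize import minimize, rosen

    costs = []
    minimize(rosen, [-1.2, 1.0], method="Nelder-Mead",
             callback=lambda xk: costs.append(rosen(xk)))

    plt.semilogy(costs)  # log scale makes the convergence visible
    plt.xlabel("iteration")
    plt.ylabel("cost")
    plt.title("Nelder-Mead on the Rosenbrock function")
    plt.show()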

Conclusion

I was quite surprised by some of the algorithms’ behavior. The conjugate gradient clearly behaves far better than the simple gradient descent, but BFGS followed the Rosenbrock valley even better. A good quasi-Newton method can be really efficient (unlike a full Newton method, it does not need to solve a linear system at each step), although a conjugate gradient can be enough in some cases.
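
The reason a quasi-Newton step is cheap is that BFGS updates an approximation of the inverse Hessian directly, so each step only costs matrix-vector products. A minimal sketch of that update (the standard BFGS formula, not the proprietary implementation used above):

    import numpy as np

    def bfgs_inverse_hessian_update(H, s, y):
        # One BFGS update of the inverse Hessian approximation H, with
        # s = x_{k+1} - x_k and y = grad_{k+1} - grad_k.
        # No linear system is solved, unlike a full Newton step.
        rho = 1.0 / (y @ s)
        I = np.eye(len(s))
        return ((I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s))
                + rho * np.outer(s, s))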

For the gradient-free algorithms, SSA really behaved badly. This is mainly because its hyperparameters must be carefully tuned. The function is quite simple, yet my first attempt at setting these parameters worked far better for GA or the simplex than for SSA. So I would go for GA for gradient-free optimization: few, easy-to-set hyperparameters and good exploration of the search space.

The code for the first 4 tests and the display plots

2 thoughts on “Comparison of optimization algorithms”

  1. It would be useful to have all axes explicitly labeled.
    Also, for those methods which only use function evaluations, you could add a figure that shows the residual versus the number of function evaluations. This would help with comparing the methods.
