Some weeks ago, the first release of the optimization scikit was done. I’d like to expose here the internal structure and the way the implementation was thought.
There a lot of optimization routines that can be found in dozens of languages. The issue I faced when implementing manifold learning algorithms is that I needed something that would allow me to test several different optimizers as well as developing new ones.
I first focused on unconstrained optimization, and more exactly direction and then line search optimization. I was supposed to be able to pick different direction searches, different line searches, and then different criteria. Besides, some of these algorithms may be decorated by others (for instance, a line search may start from different step lengths given by one or several of its decorators). The issue that arises is that each of these small algorithms may need different pieces of information from the others, and also may give different pieces of intel.
Separation principle and object orientation
Usually everyone uses objects in Python, especially for a complex framework. The problem is, as I’ve said, that each object needs information that is created by all the others objects. It is not elegant to browse through each of them so I had to find something else: the separation principle.
This principle says that any state information is saved inside a single structure and all the objects needing information do not have an associated state. This is opposition to the object orientation (with a focus on encapsulation), but in reality, the line is fuzzy. So this is the main idea behind the optimization scikit: separation principle on objects.
Usage and conclusion
I will present a complex use of the scikit in the future, but the idea is simple: create your direction search, create your line search, pick a stopping criterion and use everything with an optimizer. You may reuse each of them with another optimizer later. This way, I’ve designed some complex optimizers that needed finely choosing parts (line searches with some conjugate gradients may benefit from starting point that are different). Also, the cost function is also an object with gradient or even Hessian. This means that numerical differentiation is easilly implemented with a specific class, just as least squares would also be.
Of course, this means that the scikit does not provide the fastest optimization routine. You may create one that is really faster than the scikit, but this means that you already know the precise optizer you need as well as that you may loose a lot of time inside the optimizer and not the cost function. In most cases, this is not the case.
Although this framework was written in Python, it could be ported to several langages with more or less advantages or drawbacks (Java, C++ with heavy usage of templates, …).