It seems that Packt Publishing is on a publishing spree on machine learning in Python. After Building Machine Learning Systems In Python, for which I was a technical reviewer, Packt published Learning Scikit-Learn In Python last November.
I’ve already given some answers in one of my first posts on manifold learning. Here I give more complete results on the quality of the dimensionality reduction performed by the best-known techniques.
First of all, my test is about respecting the geodesic distances in the reduced space. This is not possible for some manifolds, such as a 2D Gaussian surface, whose curvature makes any isometric flattening impossible. I used the SCurve for the test because the curve has unit speed, so distances in the coordinate space (the one I used to create the SCurve) are the same as the geodesic distances on the manifold. My test measures the Frobenius norm of the difference between the original coordinates and the computed ones, after the computed coordinates have been mapped onto the original ones by the best-fitting affine transform.
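The test can be sketched in a few lines of NumPy. This is a minimal illustration under my own assumptions, not the exact code behind the results: `make_unit_speed_s_curve` and `affine_residual` are hypothetical helpers, the S-curve parametrisation (x = sin t, z = sign(t)(cos t - 1), plus a free height h) is one unit-speed choice, and the affine transform is fitted by ordinary least squares.

```python
import numpy as np


def make_unit_speed_s_curve(n_samples=1000, seed=0):
    """Hypothetical helper: sample an S-curve together with its arc-length coordinates."""
    rng = np.random.default_rng(seed)
    t = 3 * np.pi * (rng.random(n_samples) - 0.5)  # position along the curve
    h = 2 * rng.random(n_samples)                   # height across the curve
    # d/dt (sin t, sign(t) * (cos t - 1)) has unit norm, so t is an arc length
    points = np.column_stack([np.sin(t), h, np.sign(t) * (np.cos(t) - 1)])
    coords = np.column_stack([t, h])  # geodesic (flat) coordinates of each point
    return points, coords


def affine_residual(original, computed):
    """Frobenius norm of original - (computed @ A + b) for the least-squares A, b."""
    # Append a column of ones so the translation b is fitted together with A
    design = np.hstack([computed, np.ones((computed.shape[0], 1))])
    transform, *_ = np.linalg.lstsq(design, original, rcond=None)
    # np.linalg.norm of a 2D array defaults to the Frobenius norm
    return np.linalg.norm(original - design @ transform)


points, coords = make_unit_speed_s_curve(500)
# An embedding equal to the true coordinates up to an affine map scores ~0
mixed = coords @ np.array([[2.0, 1.0], [0.0, 3.0]]) + np.array([5.0, -1.0])
print(affine_residual(coords, mixed))
```

An embedding that matches the geodesic coordinates up to any affine map gives a residual at rounding-error level; an embedding that distorts the geodesic distances gives a larger one, so the residual can be compared across techniques.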
Before going into more detail about nonlinear manifold learning, I’ll present the linear approach that is used in most applications.
PCA, for Principal Component Analysis, is also known as the Karhunen-Loeve transform. It describes the data with a single linear model. The reduced space is the linear subspace of that model, so a new point can be projected onto the manifold, and the distance between the point and its projection tells whether the point belongs to the manifold.
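A minimal sketch of PCA used both for reduction and for the membership test, assuming an SVD fit on centred data. The class `LinearPCA` is hypothetical; its method names follow scikit-learn conventions, but this is not scikit-learn's implementation.

```python
import numpy as np


class LinearPCA:
    """Hypothetical minimal PCA: one linear model fitted by SVD."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, data):
        self.mean_ = data.mean(axis=0)
        # Rows of vt are the principal directions, by decreasing variance
        _, _, vt = np.linalg.svd(data - self.mean_, full_matrices=False)
        self.components_ = vt[:self.n_components]
        return self

    def transform(self, points):
        """Coordinates of points in the reduced (linear) space."""
        return (points - self.mean_) @ self.components_.T

    def inverse_transform(self, reduced):
        """Map reduced coordinates back onto the linear manifold."""
        return reduced @ self.components_ + self.mean_

    def distance_to_model(self, points):
        """Distance between each point and its projection on the manifold."""
        projected = self.inverse_transform(self.transform(points))
        return np.linalg.norm(points - projected, axis=1)
```

The value returned by `distance_to_model` is the reconstruction error; thresholding it is one simple way to decide whether a new point belongs to the linear manifold, with the threshold left to the application.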
The problem with PCA is that it cannot handle nonlinear manifolds such as the SwissRoll presented in my previous post.
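To see the failure numerically, the sketch below uses a hypothetical stand-in for the SwissRoll: its 2D cross-section, the Archimedean spiral r = t over one and a half turns, reduced to one dimension with PCA. A reduction that respected geodesic distances would keep the points in the same order as along the spiral; a linear projection cannot.

```python
import numpy as np

# Hypothetical stand-in for a SwissRoll cross-section: the spiral r = t, 1.5 turns
t = np.linspace(1.5 * np.pi, 4.5 * np.pi, 500)
spiral = np.column_stack([t * np.cos(t), t * np.sin(t)])

# Linear reduction to one dimension: project onto the first principal direction
centred = spiral - spiral.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
embedding = centred @ vt[0]

# Respecting geodesic distances would keep the embedding monotonic in t;
# the projection instead folds successive turns of the spiral onto each other
steps = np.diff(embedding)
is_monotonic = bool(np.all(steps > 0) or np.all(steps < 0))
print("order along the spiral preserved:", is_monotonic)
```

Whatever direction PCA picks, the projection of a spiral with more than one turn oscillates, so the script reports that the order is not preserved.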
I hope to present some results here in February; in the meantime, here is what I have implemented so far:
- Laplacian Eigenmaps
- Hessian Eigenmaps
- Diffusion Maps (in fact a variation of Laplacian Eigenmaps)
- Curvilinear Component Analysis (the reduction part)
- Nonlinear Mapping (NLM, Sammon)
- My own technique (reduction, regression and projection)
- PCA (usual reduction, but robust projection with an a priori term)
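As an illustration of the first technique in the list, here is a compact Laplacian Eigenmaps sketch in the usual formulation: a k-nearest-neighbour graph with heat-kernel weights, then the generalised eigenproblem L y = λ D y. It is not the implementation listed above; the function name, the parameters `n_neighbors` and `sigma`, and the dense O(n²) distance matrix are assumptions chosen for brevity.

```python
import numpy as np


def laplacian_eigenmaps(data, n_components=2, n_neighbors=10, sigma=1.0):
    """Hypothetical sketch of Laplacian Eigenmaps on a dense k-NN graph."""
    n = data.shape[0]
    sq_dists = np.sum((data[:, None, :] - data[None, :, :]) ** 2, axis=-1)
    # k nearest neighbours of each point (column 0 of argsort is the point itself)
    neighbors = np.argsort(sq_dists, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = neighbors.ravel()
    weights = np.zeros((n, n))
    weights[rows, cols] = np.exp(-sq_dists[rows, cols] / (2 * sigma ** 2))  # heat kernel
    weights = np.maximum(weights, weights.T)  # symmetrise the graph
    degree = weights.sum(axis=1)
    laplacian = np.diag(degree) - weights
    # Solve L y = lam D y through the symmetric matrix D^-1/2 L D^-1/2
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    sym = d_inv_sqrt[:, None] * laplacian * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(sym)  # eigenvalues in ascending order
    # Drop the trivial eigenvector (eigenvalue 0) and map back with D^-1/2
    return d_inv_sqrt[:, None] * eigvecs[:, 1:n_components + 1]
```

The dense distance matrix limits this sketch to a few thousand points; a practical implementation would use a neighbour search structure and a sparse eigensolver.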
The results I will show here are mainly comparisons of the reductions obtained with each technique, keeping in mind that each technique has its own field of application: LLE is not designed to respect geodesic distances, whereas Isomap, NLM and my technique are.
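NLM belongs to the distance-preserving family because it directly minimises Sammon's stress, E = (1 / sum_{i<j} d*_ij) * sum_{i<j} (d*_ij - d_ij)^2 / d*_ij, where d*_ij are the target (ideally geodesic) distances and d_ij the distances in the reduced space. The sketch below computes that stress and minimises it. As an assumption for brevity, it replaces Sammon's original pseudo-Newton update with plain gradient descent and a backtracking step, and all names are hypothetical.

```python
import numpy as np


def pairwise_distances(points):
    """Dense matrix of Euclidean distances between the rows of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def sammon_stress(target, embedding):
    """Sammon stress between target distances (n x n) and an embedding (n x d)."""
    iu = np.triu_indices(target.shape[0], k=1)  # each pair i < j once
    d_star = target[iu]
    d = pairwise_distances(embedding)[iu]
    return np.sum((d_star - d) ** 2 / d_star) / np.sum(d_star)


def sammon_mapping(target, n_components=2, n_iter=200, seed=0):
    """Minimise the Sammon stress by gradient descent with a backtracking step.

    Assumes all off-diagonal target distances are strictly positive.
    """
    n = target.shape[0]
    rng = np.random.default_rng(seed)
    y = rng.normal(scale=1e-2, size=(n, n_components))  # small random start
    c = np.sum(target[np.triu_indices(n, k=1)])
    safe_target = target + np.eye(n)  # avoid 0 / 0 on the diagonal
    stress = sammon_stress(target, y)
    step = 1.0
    for _ in range(n_iter):
        d = pairwise_distances(y) + np.eye(n)
        coeff = (safe_target - d) / (safe_target * d)
        np.fill_diagonal(coeff, 0.0)
        # dE/dy_i = -(2 / c) * sum_j coeff_ij * (y_i - y_j)
        grad = -2.0 / c * (coeff.sum(axis=1)[:, None] * y - coeff @ y)
        # Halve the step until the stress decreases, then try a larger one next time
        while step > 1e-12:
            candidate = y - step * grad
            candidate_stress = sammon_stress(target, candidate)
            if candidate_stress < stress:
                y, stress = candidate, candidate_stress
                step *= 1.5
                break
            step /= 2
    return y, stress
```

With Euclidean input distances this is plain Sammon mapping; feeding `target` with graph-based geodesic distances, as Isomap does, would be one way to make it respect the geodesic distances discussed above.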
As I approach the end of my PhD, I will release my manifold learning code as a scikit (see this page) in a few weeks. I don’t know yet which scikit it will be, but stay tuned…
The scikit will contain:
- Laplacian Eigenmaps
- Diffusion Maps
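Since Diffusion Maps are a variation of Laplacian Eigenmaps, a sketch looks very similar. This one follows the common Coifman-Lafon formulation: a Gaussian kernel, an alpha-normalisation that reduces the influence of the sampling density, the Markov matrix P = D^-1 K, and coordinates given by the non-trivial right eigenvectors of P scaled by their eigenvalues raised to the diffusion time. The function and parameter names (`epsilon`, `alpha`, `time`) are hypothetical, not the API of the scikit announced above.

```python
import numpy as np


def diffusion_maps(data, n_components=2, epsilon=1.0, alpha=1.0, time=1):
    """Hypothetical sketch of Diffusion Maps (Coifman-Lafon style)."""
    sq_dists = np.sum((data[:, None, :] - data[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-sq_dists / epsilon)  # Gaussian kernel with scale epsilon
    # alpha-normalisation: alpha = 1 removes most of the sampling-density effect
    density = kernel.sum(axis=1)
    kernel = kernel / np.outer(density ** alpha, density ** alpha)
    degree = kernel.sum(axis=1)
    # P = D^-1 K is similar to the symmetric S = D^-1/2 K D^-1/2,
    # and if S v = lam v then psi = D^-1/2 v satisfies P psi = lam psi
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    sym = d_inv_sqrt[:, None] * kernel * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(sym)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # largest eigenvalues first
    psi = d_inv_sqrt[:, None] * eigvecs
    # Skip the constant eigenvector (eigenvalue 1) and scale by lam ** time
    return psi[:, 1:n_components + 1] * eigvals[1:n_components + 1] ** time
```

Compared with the Laplacian Eigenmaps formulation, the graph is replaced by a dense kernel and the diffusion time controls how strongly the higher-frequency coordinates are damped.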