It seems that Packt Publishing is on a publishing spree on Machine Learning in Python. After Building Machine Learning Systems In Python for which I was technical reviewer, Packt published Learning Scikit-Learn In Python last November.
It has been a while since my last post on manifold learning, and I still have some things to speak about (unfortunately, it will be the end post of the dimensionality reduction series on my blog, as my current job is not about this anymore). After the multidimensional regression, it is possible to use it to project new samples on the modelized manifold, and to classify data.
Once the data set is reduced (see my first posts if you’re jumping on the bandwagon), there are several ways of mapping this reduced space to the original space:
- you can interpolate the data in the original space based on an interpolation in the reduced space, or
- you create an approximation of the mapping with a multidimensional function (B-splines, …)
When using the first solution, if you map one of the reduced point used for the training, you get the original point. With the second solution, you get a close point. If the data set you have is noisy you should use the second solution, not the first. And if you are trying to compress data (lossly compression), you can not use the first one, as you need the original points to get new interpolated points, so you are not compressing your data set.
The solution I propose is based on approximation with a set of piecewise linear models (each model being a mapping between a subspace of the reduced space to the original space). At the boundaries between the models, I do not assert continuity, contrary to hinging hyperplanes. Contrary to Projection Pursuit Regression and hinging hyperplane, my mapping is between the two spaces, and not from the reduced space to one coordinate in the original space. This will enable projection on the manifold (which is another subject that will be discussed in another post).
I’ve already given some answers in one of my first tickets on manifold learning. Here I will give some more complete results on the quality of the dimensionality reduction performed by the most well-known techniques.
First of all, my test is about respecting the geodesic distances in the reduced space. This is not possible for some manifolds like a Gaussian 2D plot. I used the SCurve to create the test, as the speed on the curve is unitary and thus the distances in the coordinate space (the one I used to create the SCurve) are the same as the geodesic ones on the manifold. My test measures the matrix (Frobenius) norm between the original coordinates and the computed one up to an affine transform of the latter.
Some of the widely used method are based on a similarity graph made with the local structure. For instance LLE uses the relative distances, which is related to similarities. Using similarities allows the use of sparse techniques. Indeed, a lot of points are not similar, and then the similarities matrix is sparse. This also means that a lot of manifold can be reduced with these techniques, but not with Isomap or the other geodesic-based techniques.
It is worth mentioning that I only implemented Laplacian Eigenmaps with a sparse matrix, due to the lack of generalized eigensolver for sparse matrix, but it will be available in a short time, I hope.
Analytical solutions to the dimensionality reduction problem are only possible for quadratic cost functions, like Isomap, LLE, Laplacian Eigenmaps, … All these solutions are sensitive to outliers. The issue with the quadratic hypothesis is that there is no outilers, but on real manifolds, the noise is always there.
Some cost functions have been proposed, also known as stress functions as they measure the difference between the estimated geodesic distance and the computed Euclidien distance in the “feature” space. Every metric MDS can be used as stress functions, here are some of them.
One of the most cited algorithm in nonlinear manifold learning, with Isomap, is LLE. Contrary to Isomap, LLE tries to retain the local data structure of the sampled manifold. Whereas Isomap preserves absolute distances, LLE preserves local relative distances (it preserves barycenter weights).
This means that LLE is not suitable for every dimensionality reductions. For visualization purposes, it can lead to very different solutions if the manifold is noisy.
Before going into more details about nonlinear manifold learning, I’ll present the linear description that is used in most of the applications.
PCA, for Principal Components Analysis, is the other name for the Karhunen-Loeve transform. It aims at describing the data by a single linear model. The reduced space is the space on the linear model, it is possible to project a new point on the manifold and thus testing the belonging of point to the manifold.
The problem with PCA is that it cannot tackle nonlinear manifold, as the SwissRoll that was presented in my last item.
Isomap is one of the “oldest” tools for dimensionality reduction. It aims at reproducing geodesic distances (geodesic distances are a property of Riemanian manifolds) on the manifold in an Euclidiean space.
To compute the approximated geodesic distances, a graph is created, an edge linking two close points (K-neighboors or Parzen windows can be used to choose the closest points) with its weight being the Euclidean distance between them. Then, a square matrix is computed with the shortest path between two points with a Dijkstra or Floyd-Warshall algorithm. This follows some distance and Riemanian manifolds properties. The number of points is generally chosen based on the estimated distance on the manifold.
Finally, an classical MDS procedure is performed to get a set of coordinates.