### Dimensionality reduction: similarities graph and its use

Some of the widely used method are based on a similarity graph made with the local structure. For instance LLE uses the relative distances, which is related to similarities. Using similarities allows the use of sparse techniques. Indeed, a lot of points are not similar, and then the similarities matrix is sparse. This also means that a lot of manifold can be reduced with these techniques, but not with Isomap or the other geodesic-based techniques.

It is worth mentioning that I only implemented Laplacian Eigenmaps with a sparse matrix, due to the lack of generalized eigensolver for sparse matrix, but it will be available in a short time, I hope.

### Dimensionality reduction: explicit optimization of a cost function

Analytical solutions to the dimensionality reduction problem are only possible for quadratic cost functions, like Isomap, LLE, Laplacian Eigenmaps, … All these solutions are sensitive to outliers. The issue with the quadratic hypothesis is that there is no outilers, but on real manifolds, the noise is always there.

Some cost functions have been proposed, also known as stress functions as they measure the difference between the estimated geodesic distance and the computed Euclidien distance in the “feature” space. Every metric MDS can be used as stress functions, here are some of them.

### Some news about the manifold learning scikit

I got the word today that my paper was accepted, so I can now focus on delivering the code.

I’m in the process of refactoring it so that it depends less on some of our libraries here. In two weeks, there is a nipy sprint in Paris I will attend, and machine learning is one of the topic we will discuss, so this may indicate where and how I’ll contribute the code I will keep going on showing some results next week.

### Dimensionality reduction: Locally Linear Embedding

One of the most cited algorithm in nonlinear manifold learning, with Isomap, is LLE. Contrary to Isomap, LLE tries to retain the local data structure of the sampled manifold. Whereas Isomap preserves absolute distances, LLE preserves local relative distances (it preserves barycenter weights).

This means that LLE is not suitable for every dimensionality reductions. For visualization purposes, it can lead to very different solutions if the manifold is noisy.

### Dimensionality reduction: Principal Components Analysis

Before going into more details about nonlinear manifold learning, I’ll present the linear description that is used in most of the applications.

PCA, for Principal Components Analysis, is the other name for the Karhunen-Loeve transform. It aims at describing the data by a single linear model. The reduced space is the space on the linear model, it is possible to project a new point on the manifold and thus testing the belonging of point to the manifold.

The problem with PCA is that it cannot tackle nonlinear manifold, as the SwissRoll that was presented in my last item.

### Dimensionality reduction: Isomap

Isomap is one of the “oldest” tools for dimensionality reduction. It aims at reproducing geodesic distances (geodesic distances are a property of Riemanian manifolds) on the manifold in an Euclidiean space.

To compute the approximated geodesic distances, a graph is created, an edge linking two close points (K-neighboors or Parzen windows can be used to choose the closest points) with its weight being the Euclidean distance between them. Then, a square matrix is computed with the shortest path between two points with a Dijkstra or Floyd-Warshall algorithm. This follows some distance and Riemanian manifolds properties. The number of points is generally chosen based on the estimated distance on the manifold.

Finally, an classical MDS procedure is performed to get a set of coordinates.

### More on manifold learning

I hope to present here some result in February, but I’ll expose what I’ve implemented so far :

• Isomap
• LLE
• Laplacian Eigenmaps
• Hessian Eigenmaps
• Diffusion Maps (in fact a variation of Laplacian Eigenmaps)
• Curvilinear Component Analysis (the reduction part)
• NonLinear Mapping (Sammon)
• My own technique (reduction, regression and projection)
• PCA (usual reduction, but robust projection with an a priori term)

The results I will show here are mainly reduction comparison between the techniques, knowing that each technique has a specific field of application : LLE is not made to respect the geodesic distances, Isomap, NLM and my technique are.

### Manifold learning toolbox for Python

As I approach the end of my PhD, I will propose my manifold learning code in a scikit (see this page) in a few weeks. For the moment, I don’t know which scikit will be used, but stay put…

The content of the scikit will be :

• Isomap
• LLE
• Laplacian eigenmaps
• Diffusion maps