Some of the widely used method are based on a similarity graph made with the local structure. For instance LLE uses the relative distances, which is related to similarities. Using similarities allows the use of sparse techniques. Indeed, a lot of points are not similar, and then the similarities matrix is sparse. This also means that a lot of manifold can be reduced with these techniques, but not with Isomap or the other geodesic-based techniques.
It is worth mentioning that I only implemented Laplacian Eigenmaps with a sparse matrix, due to the lack of generalized eigensolver for sparse matrix, but it will be available in a short time, I hope.
The Laplacian Eigenmaps are the most known technique using the similarity graph (safe for LLE, which is nothing more than a special case of the Laplacian Eigenmaps). The similarities are computed between neighboors (neighboors meaning the samples that are near one from another in a distance way or samples that are close, like pixels in an image), generally with a Gaussian kernel. The trick here is to choose the correct width of the kernel. Then, the similarities matrix is weighted (each column and line must sum to one, this is the Laplacian of the graph) and then eigenvectors are extracted from it. The first eigenvalue is one and must not be used.
Here is what I get :
One may wonder why the reduction is so poor, but I’m not the only one to get this result. I tried every width for the kernel to no avail. The literature says that Laplacian Eigenmaps tendto cluster points, which is easily explained by the algorithm. The eigenproblem extracts the main eigenvectors so that the weighted similarities matrix is preserved (in a quadratic way). This means that even if points should be close, if they are not close enough, they have a similarity of 0 so the eigenproblem will separate them.
Diffusion maps are another similarity graph technique. Although there is a Markovian/probabilistic interpretation, diffusion maps are basically Laplacian Eigenmaps with similarities computed between every pair of points. This means that they have the same drawbacks that Laplacian Eigenmaps except for the clustering. The width of the kernel is still difficult to estimate.
Here is the result :
The fact that every similarity is used explains the fact that diffusion maps cannot reduce the SwissRoll correctly. In this precise case, the kernel width was obviously too big, but smaller width gives a result similar to the Laplacian Eigenmaps, which is not correct either.
The other technique I will present is Hessian Eigenmaps. Instead of estimating the Laplacian of the similarities graph, it tries to estimate the Hessian. This gives very good result for the SwissRoll :
Unfortunately, the technique is not robust to noise, as I will show you in the result ticket. Safe for this fact, the technique is robust to holes in the manifold (not uniformly sampled manifolds for instance), which is one of the biggest drawback in techniques based on the geodesic distances.