It has been a while since my last post on manifold learning, and I still have a few things to cover (unfortunately, this will be the final post of the dimensionality reduction series on my blog, as my current job is no longer in this area). After the multidimensional regression, it is possible to use it to project new samples onto the modelled manifold, and to classify data.
Projection
Finding the projection of a point onto the manifold can be done by searching, in the reduced space, for the point that minimizes some criterion. As the dimension of this space is small, this search can be done efficiently.
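As a rough sketch of this search (not the actual code behind the results; `reconstruct` is an assumed name for the learned mapping from reduced coordinates back to the original space):

```python
import numpy as np
from scipy.optimize import minimize

def project_on_manifold(x, reconstruct, y0):
    """Search the reduced space for the coordinates whose reconstruction
    is closest to the new sample x.

    x           -- new sample in the original space
    reconstruct -- callable mapping reduced coordinates to the original space
    y0          -- starting point, e.g. coordinates of the nearest training sample
    """
    cost = lambda y: np.sum((reconstruct(y) - x) ** 2)
    # the reduced space only has a few dimensions, so a direct search is cheap
    result = minimize(cost, y0, method="Nelder-Mead")
    return result.x, reconstruct(result.x)
```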
Another way is to use the multidimensional regression directly. It consists of several linear models, so one can project the new sample on each model and choose the best one. This can lead to somewhat different projections than the search in the reduced space: a projection on one model may in fact be reconstructed by another model. We did not do this, because we added another way of ensuring that we are choosing a good projection. The reason is that to correctly project on each model, you have to track which model you are currently on, which means optimizing a function that is not smooth; this is more difficult and therefore slower.
So we use two cost functions. The first is simply the maximum likelihood, which for a Gaussian random variable amounts to an orthogonal projection. But as figure 1 shows, the projection sometimes does not land on the manifold. To improve this result, we use a maximum a posteriori cost, which adds a regularization term. This term is simply a Gaussian mixture in the reduced space, with one Gaussian for each learning point labelled as belonging to the considered model. This way, the projection is attracted to the subspace where the model is actually used to reconstruct points.
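To illustrate the idea (only a sketch, with `reconstruct`, `prior` and the weight `reg` as assumed names), the MAP cost adds the negative log-density of a mixture in the reduced space to the maximum-likelihood data term. In the post the prior has one Gaussian per learning point of the model; the sketch uses a fitted `GaussianMixture` as a stand-in:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def map_cost(y, x, reconstruct, prior, noise_var=1.0, reg=1.0):
    """Negative log-posterior of reduced coordinates y for a new sample x."""
    # Gaussian likelihood of the reconstruction error (the maximum-likelihood part)
    data_term = np.sum((reconstruct(y) - x) ** 2) / (2.0 * noise_var)
    # Gaussian-mixture prior in the reduced space, attracting y towards the
    # region where this linear model is actually used to reconstruct points
    prior_term = -prior.score_samples(np.atleast_2d(y))[0]
    return data_term + reg * prior_term

# the prior would be built from the reduced coordinates of the training points
# assigned to the considered model, e.g.:
# prior = GaussianMixture(n_components=5).fit(reduced_coords_of_model)
```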

I’ve projected new samples of the SwissRoll and the SCurve with both methods. The SwissRoll shows that projecting points on the models without a regularization term, or without tracking the correct model during the projection, does not give valid results in the reduced space. In the original space, however, the points are correctly reconstructed. With the SCurve, the results are better.
As another benchmark, I’ve taken the 20 datasets from the COIL-20 database. I’ve added an occlusion to each image, and I’ve projected the images with and without occlusion onto the corresponding manifold (I’ve used a Laplacian random variable to correctly describe the noise introduced by the occlusion). I’ve compared the results to a robust projection onto the first 2 or 15 eigenvectors of the datasets. First, without occlusions, the projection on 15 eigenvectors manages to reproduce the images, as does my method. Of course, only 2 eigenvectors are not enough to describe the dataset correctly (although I’m also using only 2 coordinates to describe it). With 40% occlusion, my method is on a par with the 15 eigenvectors, and even better if we consider only the reconstruction error: my method yields better images than the 15 eigenvectors.
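The Laplacian noise model is what makes this projection robust: it replaces the squared reconstruction error with an absolute one, so a few occluded pixels cannot dominate the fit. A minimal sketch of such a robust projection onto the leading eigenvectors (the exact baseline procedure may differ from this):

```python
import numpy as np
from scipy.optimize import minimize

def robust_projection(x, V, mean):
    """Project an (occluded) image x onto the span of the eigenvectors V
    by minimizing the L1 reconstruction error (Laplacian noise model).

    V    -- (n_pixels, n_components) matrix of leading eigenvectors
    mean -- mean image of the dataset
    """
    def l1_error(coeffs):
        return np.sum(np.abs(mean + V @ coeffs - x))
    c0 = V.T @ (x - mean)                  # least-squares solution as a starting point
    result = minimize(l1_error, c0, method="Powell")
    return mean + V @ result.x
```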
Classification
Now, if we use the 20 datasets as 20 manifolds, project all images on all manifolds, and select the best projection for each image, we have a new way of doing classification (see the sketch below). This leads to the following graphic representation of the confusion matrix (hotter colors indicate higher percentages).
[Figure: confusion matrices without and with occlusion]
Without occlusion, the confusion matrix is pretty good, but with even 10% occlusion, some test samples are misclassified as another class. This is because those classes are darker than the actual class of the test sample, so with an occlusion, they fit the occluded images better.
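In code, this decision rule is nothing more than picking the manifold with the smallest residual; `project` below stands for the whole projection-plus-reconstruction step described above, so the names are illustrative rather than the actual implementation:

```python
import numpy as np

def classify(x, manifolds):
    """Return the label of the manifold that reconstructs x best.

    manifolds -- list of (label, project) pairs, where project(x) returns the
                 reconstruction of x on that manifold
    """
    best_label, best_err = None, np.inf
    for label, project in manifolds:
        err = np.sum(np.abs(project(x) - x))  # L1 residual, matching the Laplacian noise model
        if err < best_err:
            best_label, best_err = label, err
    return best_label
```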
Now, for the last dataset, I’ve taken samples from the Outex database. Each texture image is cut into 16 samples; one half of the samples is used for training, the other half as test samples. The samples are then transformed into cooccurrence matrices.
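For readers unfamiliar with this feature, a plain grey-level cooccurrence matrix for a single offset can be computed as below; the actual features follow the colour generalisation discussed in the paper cited at the end of this section, so this is only an illustration of the idea:

```python
import numpy as np

def cooccurrence(image, dx=1, dy=0, levels=256):
    """Grey-level cooccurrence matrix for one offset (dx, dy), normalised to sum to 1.

    The image is assumed to be quantised to `levels` grey levels and the
    offset to be non-negative in both directions.
    """
    img = np.asarray(image, dtype=int)
    h, w = img.shape
    counts = np.zeros((levels, levels))
    src = img[:h - dy, :w - dx]      # reference pixels
    dst = img[dy:, dx:]              # pixels at the given offset
    np.add.at(counts, (src.ravel(), dst.ravel()), 1)
    return counts / counts.sum()
```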
There are 72 textures, and thus 72 classes. Here are the resulting confusion matrices for the training and test samples.
[Figure: confusion matrices for the training and test samples]
In this case, the results are better than those in the literature (Generalization of the cooccurrence matrix for colour images: Application to colour texture classification, Image Analysis & Stereology, 2004).
The end
This is the last post I’ll be writing on manifold learning. It is very long because I wanted to write up my last results (some of which can’t be found in the literature), and I didn’t feel like writing two or three separate posts. I’m not researching manifold learning anymore, so the series needed a clear ending.
I hope you enjoyed the different posts in this category. There is still much to do in the field, but one cannot do everything…
Hi,
I’m currently working on my Bachelor’s thesis on dimensionality reduction. I’m creating a graphical user interface which enables easier parameter selection for LLE and Isomap. I create interactive (zoomable/movable) 3D visualisations to explore the resulting mapping.
I think your plots are looking really nice, and I was wondering what you use to create them. Are they interactive?
Further, you were talking about a python library which includes dimensionality reduction algorithms, but I seem unable to find it anywhere. Did you publish it?
Finally, I’m running into some problems with LLE for classification tasks. The LLE implementation I use discards all points that are not connected to the largest component of the neighbourhood graph, and most papers ignore this ‘feature’ of LLE. Did you consider this problem?
Thanks
– Okke Formsma
Hi Okke!
I use Matplotlib to create the graphics. They are as interactive as Matplotlib allows. For the 3D part, Matplotlib does not provide it anymore, although a new version should be published soon (let’s hope so). You can use Mayavi for this purpose, but it isn’t as easy as Matplotlib, especially for the colormaps.
The Python library is available online as a scikit. The library itself is in the learn package, and you have to download the old openopt package to get the associated optimization framework. I have also written a tutorial.
For the problem of disconnected subgraphs, I don’t have a solution; I didn’t have time to look at this issue. It depends on what you intend to do: are the datasets really different? If they are, you should make several separate reductions; if they are not, you could add some links inside your neighbourhood matrix.
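A quick way to check whether this will be a problem (a sketch with a current scikit-learn and SciPy, not the old scikit mentioned above) is to count the connected components of the neighbourhood graph before running LLE:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import connected_components

def check_neighbourhood_graph(X, k=10):
    """Count the connected components of the k-nearest-neighbour graph of X
    and report their sizes, to see how many points LLE would discard."""
    graph = kneighbors_graph(X, n_neighbors=k, include_self=False)
    n_components, labels = connected_components(graph, directed=False)
    return n_components, np.bincount(labels)
```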
And thanks for the comment 😉