Once the data set is reduced (see my first posts if you’re jumping on the bandwagon), there are several ways of mapping this reduced space to the original space:

- you can interpolate the data in the original space based on an interpolation in the reduced space, or
- you create an approximation of the mapping with a multidimensional function (B-splines, …)

When using the first solution, if you map one of the reduced point used for the training, you get the original point. With the second solution, you get a close point. If the data set you have is noisy you should use the second solution, not the first. And if you are trying to compress data (lossly compression), you can not use the first one, as you need the original points to get new interpolated points, so you are not compressing your data set.

The solution I propose is based on approximation with a set of piecewise linear models (each model being a mapping between a subspace of the reduced space to the original space). At the boundaries between the models, I do not assert continuity, contrary to hinging hyperplanes. Contrary to Projection Pursuit Regression and hinging hyperplane, my mapping is between the two spaces, and not from the reduced space to one coordinate in the original space. This will enable projection on the manifold (which is another subject that will be discussed in another post).

#### Position of the problem

In the literature, several papers were published in order to create a piecewise linear function. Their main advantage is that they can compute the reduced space when computing the mapping. Their issue is that the number of models is fixed at the beginning of the optimization. What I propose now is an adaptive number of models, depending on the manifold.

Each point has a set of neighbors, the k nearest ones are used here.

#### Basic approach

In fact, the basic algorithm is straightforward: put a new model where you can, optimize all models, label the points you can label, loop and stop at some point.

The first algorithm I used is the following:

- Start with no model.
- Find a point whose neighborhood is not labeled, if you can’t, stop.
- Create a new model there and label the points accordingly.
- Estimate all models with regards to the labels.
- Label each point to the nearest model if the point is too far (that is further than a factor times the mean error), do not label it.
- If one model has not enough points (twice the dimension of the reduced space), the model is deleted.
- Go back to step 2.

This algorithm is very simple, but cannot model precisely a manifold, you cannot tune it. But it can give a good first impression on the manifold.

Here are some steps of the algorithm for a SwissRoll (the left figure indicates the label in the reduced space and the left figure is the approximation of the manifold in the original space):

At the end of the iteration, the manifold is roughly approximated:

#### Maximize a likelihood

As I’ve said, there are no ways for the first method to get a given precision. So instead of adding models where it is available, the second algorithm adds models where the current function is the least likely. Here are the steps for this algorithm:

- Start with one model (every point is labeled to this model).
- Compute the likelihood of each point.
- Get the
*n*points for which the neighborhood is the least likely - Get one point from this set.
- Create a new model and label the point and its neighborhood to it.
- Update all plans.
- Update all labels (label a point to the most likely model).
- If one model has not enough points (twice the dimension of the reduced space), the model is deleted.
- Go to step 6 until the labels are stable.
- Go to step 2 if a criterion is not met (usually, I choose the Aike information criterion).

I’ve added later an additional step that asserts that the points assigned to a model are connected (that is the subgraph is connex). If it is not, the model is split in several parts, one for each connected component.

I didn’t state it before, but the error can be modeled by different random variables. I’ve chosen an isotropic Gaussian variable in every training.

There an immediate improvement of the algorithm, as it can be seen in the following figures:

Here, after the introduction of the second model, a third was introduced. Indeed, the second model took the middle part of the reduced space thus splitting the graph with the points of the first model in two components, thus a new model was added.

At the end of the global optimization, the result is the following one:

The differences between the two algorithms are obvious in the quality of the reconstruction. The first one had a 20% reconstruction error, instead of 3% for the second one. Although it is more complicated, it is more capable of optimizing the problem of finding linear models that will minimize the reconstruction error.

#### Coming next

After this training, a complete manifold model is available, with the precision one need. I’ll present how it can be used in the following posts. Stay tuned !

Pingback: Matt’s blog » Dimensionality reduction: videos in regression algorithms