My favorite design pattern in Python

I noticed a few days ago that I mainly use one design pattern in my code, scientific or otherwise: the registry. How does it work? A registry is a list/dictionary/… of objects; applications add a new entry when one is needed, and a user can then tap into the registry to find the most suitable object for their purpose.
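
As a minimal sketch of the idea, assuming a dictionary-based registry filled through a decorator (the names `REGISTRY`, `register` and `find` are hypothetical, chosen only for this illustration):

```python
# Hypothetical names: REGISTRY, register and find are illustrations only.
REGISTRY = {}


def register(name):
    """Return a decorator that stores the decorated object under `name`."""
    def decorator(obj):
        REGISTRY[name] = obj
        return obj
    return decorator


@register("mean")
def mean(values):
    return sum(values) / len(values)


@register("median")
def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def find(name):
    """Look up an entry, listing the known names when it is missing."""
    try:
        return REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown entry {name!r}; known: {sorted(REGISTRY)}") from None
```

A user then picks an entry by name, for instance `find("median")([3, 1, 2, 10])`, without knowing where the function was defined.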


Dimensionality reduction: comparison of different methods

I’ve already given some answers in one of my first posts on manifold learning. Here I will give more complete results on the quality of the dimensionality reduction performed by the best-known techniques.

First of all, my test checks how well the geodesic distances are respected in the reduced space. This is not possible for some manifolds, like a Gaussian 2D plot. I used the SCurve to create the test, as the curve is traversed at unit speed and thus the distances in the coordinate space (the one I used to create the SCurve) are the same as the geodesic ones on the manifold. My test measures the matrix (Frobenius) norm of the difference between the original coordinates and the computed ones, up to an affine transform of the latter.
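
A measure of this kind can be sketched with NumPy as follows. This is an illustration under assumptions, not the code used for the results: the function name `affine_residual` is hypothetical, the affine transform is fitted by ordinary least squares, and the data are random 2D coordinates rather than an actual SCurve.

```python
import numpy as np


def affine_residual(original, embedded):
    """Frobenius norm of original minus the best affine map of embedded."""
    n = embedded.shape[0]
    # A column of ones lets the least-squares fit include an offset.
    design = np.hstack([embedded, np.ones((n, 1))])
    coeffs, *_ = np.linalg.lstsq(design, original, rcond=None)
    # For a 2-D array, np.linalg.norm defaults to the Frobenius norm.
    return np.linalg.norm(original - design @ coeffs)


rng = np.random.default_rng(0)
coords = rng.uniform(size=(200, 2))            # stand-in for the original coordinates
linear = np.array([[2.0, 0.5], [-1.0, 3.0]])   # arbitrary invertible linear part
exact = coords @ linear + np.array([4.0, -2.0])
noisy = exact + 0.05 * rng.normal(size=exact.shape)

print(affine_residual(coords, exact))   # close to zero: exact affine image
print(affine_residual(coords, noisy))   # larger: the embedding is perturbed
```

An embedding that is an exact affine image of the original coordinates gives a residual near zero, so the score only penalizes distortions that no affine transform can undo.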

Book review: Tools and environments for parallel and distributed computing

After Advanced Computer Architecture and Parallel Processing, I’m going to review another book from the same series. As the title hints, the goal of this book is to introduce the tools that may be used in parallel, grid and distributed computing. This is the layer above the architecture that the previous book presented.

Book review: Advanced Computer Architecture and Parallel Processing

This is my first review. I read this book some time ago but I still want to write about it because the topic is very interesting.

Parallel computing in large-scale applications

In its March 2008 issue, IEEE Computers published a case study on large-scale parallel scientific code development. I’d like to comment on this article, a very good one in my opinion.

Five research centers were analyzed, or more precisely their development tools and processes. Each center does research in a specific domain, but they seem to share some Computational Fluid Dynamics basis.


Dimensionality reduction: similarities graph and its use

Some of the widely used methods are based on a similarity graph built from the local structure. For instance, LLE uses the relative distances, which are related to similarities. Using similarities allows the use of sparse techniques. Indeed, a lot of points are not similar, so the similarity matrix is sparse. This also means that a lot of manifolds can be reduced with these techniques, but not with Isomap or the other geodesic-based techniques.
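
To make the sparsity argument concrete, here is a small sketch that keeps a similarity only between each point and its nearest neighbors. The choices made here are assumptions for the illustration: a k-nearest-neighbor graph, a Gaussian similarity, and the hypothetical function name `knn_similarity_graph`.

```python
import math
import random


def knn_similarity_graph(points, k=3, sigma=1.0):
    """Return {i: {j: similarity}}, keeping only each point's k nearest neighbors."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            weight = math.exp(-math.dist(p, points[j]) ** 2 / (2 * sigma ** 2))
            graph[i][j] = weight
            graph[j][i] = weight  # keep the graph symmetric
    return graph


random.seed(0)
points = [(random.random(), random.random()) for _ in range(50)]
graph = knn_similarity_graph(points)

stored = sum(len(row) for row in graph.values())
density = stored / len(points) ** 2  # fraction of the full matrix actually stored
print(stored, density)
```

Only a small fraction of the 50 × 50 entries is ever stored, which is what makes sparse matrices and sparse eigensolvers attractive for these methods.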

It is worth mentioning that I only implemented Laplacian Eigenmaps with a sparse matrix, due to the lack of a generalized eigensolver for sparse matrices, but it will be available soon, I hope.


Dimensionality reduction: explicit optimization of a cost function

Analytical solutions to the dimensionality reduction problem are only possible for quadratic cost functions, like those of Isomap, LLE, Laplacian Eigenmaps, … All these solutions are sensitive to outliers. The issue with the quadratic hypothesis is that it assumes there are no outliers, but on real manifolds, noise is always present.

Some cost functions have been proposed, also known as stress functions, as they measure the difference between the estimated geodesic distance and the computed Euclidean distance in the “feature” space. Every metric MDS can be used as a stress function; here are some of them.
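
As an illustration of what such a function computes, here is a sketch of the raw (unnormalized) metric MDS stress: the sum over all pairs of the squared difference between the geodesic distance and the Euclidean distance in the reduced space. It is one common form, not necessarily one of the functions discussed in the rest of the post, and the name `raw_stress` is hypothetical.

```python
import math
from itertools import combinations


def raw_stress(geodesic, embedding):
    """Sum over pairs i < j of (geodesic[i][j] - ||y_i - y_j||) ** 2."""
    return sum(
        (geodesic[i][j] - math.dist(embedding[i], embedding[j])) ** 2
        for i, j in combinations(range(len(embedding)), 2)
    )


# Three points on a line: the embedding reproduces the distances exactly.
embedding = [(0.0,), (1.0,), (3.0,)]
geodesic = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
print(raw_stress(geodesic, embedding))
```

Because the cost is no longer tied to an eigenproblem, the squared term can be swapped for another penalty, at the price of having to minimize the function numerically.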
