Since the beginning of this year, I have been trying to figure out what to do with my future. I’m still doing my PhD, but what could I do after that?
My current work consists of finding models for datasets.
A lot of datasets can be explained by a small number of parameters. For instance, identity photos of a single person can be explained by 3 translations and 3 rotations. My algorithms do exactly that: they find the parameters (or something close enough) and create a mapping between the parameters and the original space.
During this research, I learnt what scientific computing is. I did not explore everything in this field, but I covered the basics. That’s where I discovered Python, but also C++ (the first language I really used). My thirst for information led me to read many books on various subjects (architectural design and development processes, but also parallel computing and its different flavors). All of this led me to look for the job that would interest me most.
So, starting in September, I’ll be moving to Pau, a town in the south of France, where Total S.A.’s largest research center is located. I will work on oil exploration.
Although the theory behind this is well known (acoustic wave propagation and inverse problems), research in this field is far from over. For instance, the computing power needed to solve these problems is enormous, so their implementation must be carefully designed. And even once you have found a solution to your problem, you are not done. Total’s goal is not to see whether acoustic waves propagate fast in some places and slowly in others; its goal is to find oil and gas. So once an acoustic model is available, one must work with geologists to assess the odds of finding oil or gas there. That is another big and interesting challenge.
For those interested in manifold learning, don’t worry: I’m not done presenting my research. I will continue with new posts about the mapping between the two spaces and how it can be used to test new samples. The scikit is almost ready; I still have to finish the tutorial and check that everything works.
I also hope to keep writing about other subjects on this blog; I see no reason why I couldn’t. Although my work at Total will be confidential, there are many other fields I’d like to talk about.
I was looking for an introductory book on peer-to-peer (P2P) applications and their use in grid computing. Web services were a bonus, as they are something I don’t usually play with.
Continue reading Book review: From P2P to Web Services and Grids: Peers In A Client/Server World
A few days ago, I noticed that I mainly use one design pattern in my scientific (and not only scientific) code: the registry. How does it work? A registry is a list, a dictionary or a similar collection of objects. Applications add a new entry when needed, and a user can then query the registry to find the object best suited to their purpose.
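To make the idea concrete, here is a minimal Python sketch. It assumes each entry is a factory paired with a predicate saying when it applies, and that "the most adequate object" is simply the first matching entry in registration order. The `Registry` class and the solver names are hypothetical, chosen only for illustration.

```python
class Registry:
    """Hypothetical minimal registry: code adds entries, users query it for a match."""

    def __init__(self):
        # Each entry is (name, predicate, factory); order matters for lookup
        self._entries = []

    def register(self, name, predicate):
        """Decorator adding a factory along with a predicate telling when it applies."""
        def decorator(factory):
            self._entries.append((name, predicate, factory))
            return factory
        return decorator

    def find(self, **requirements):
        """Return (name, factory) of the first entry whose predicate accepts the requirements."""
        for name, predicate, factory in self._entries:
            if predicate(requirements):
                return name, factory
        raise LookupError(f"no registered entry matches {requirements!r}")


# Hypothetical usage: choose an eigensolver depending on the problem at hand
solvers = Registry()


@solvers.register("sparse", lambda req: req.get("sparse", False))
def make_sparse_solver():
    return "sparse eigensolver"


@solvers.register("dense", lambda req: True)  # fallback that accepts anything
def make_dense_solver():
    return "dense eigensolver"


name, factory = solvers.find(sparse=True)
print(name, factory())  # prints: sparse sparse eigensolver
```

Registering a catch-all entry last guarantees that `find` always succeeds; a more elaborate version could score every entry and return the best one instead of the first match.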
Continue reading My favorite design pattern in Python
This book is different from the last two books I read: it tackles a specific Python library, Twisted, and how to use it.
Continue reading Book review: Twisted: Network Programming Essentials
I’ve already given some answers in one of my first posts on manifold learning. Here I will give more complete results on the quality of the dimensionality reduction performed by the best-known techniques.
First of all, my test measures how well geodesic distances are preserved in the reduced space. Preserving them is not possible for some manifolds, such as a 2D Gaussian surface, which cannot be flattened without distortion. I used the S-curve for the test: its parametrization has unit speed, so distances in the coordinate space (the one I used to create the S-curve) are the same as the geodesic distances on the manifold. My test measures the Frobenius norm of the difference between the original coordinates and the computed ones, up to an affine transform of the latter.
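The error measure can be sketched with NumPy. This is one plausible implementation, assuming the affine transform is fitted by ordinary least squares and that the raw, unnormalized Frobenius norm is reported; `affine_residual` is a hypothetical name.

```python
import numpy as np


def affine_residual(original, computed):
    """Frobenius norm of original - (computed @ A + b), with A and b fitted by least squares.

    original: (n_samples, d) coordinates used to generate the manifold
    computed: (n_samples, k) coordinates returned by the reduction technique
    """
    original = np.asarray(original, dtype=float)
    computed = np.asarray(computed, dtype=float)
    n_samples = computed.shape[0]
    # Append a column of ones so that the translation b is fitted along with A
    design = np.hstack([computed, np.ones((n_samples, 1))])
    coeffs, *_ = np.linalg.lstsq(design, original, rcond=None)
    # What remains after the best affine fit is the error that is reported
    residual = original - design @ coeffs
    return np.linalg.norm(residual, "fro")


rng = np.random.default_rng(0)
coords = rng.uniform(size=(100, 2))
# An embedding that is exactly an affine image of the coordinates scores (almost) zero
embedding = coords @ np.array([[2.0, 1.0], [0.0, 3.0]]) + np.array([5.0, -1.0])
print(affine_residual(coords, embedding))
```

Fitting the affine transform first means that a technique is not penalized for returning a rotated, scaled or shifted copy of the true coordinates; only the non-affine distortion counts.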
Continue reading Dimensionality reduction: comparison of different methods
After Advanced Computer Architecture and Parallel Processing, I’m going to review another book from the same series. As the title hints, the goal of this book is to introduce the tools that may be used in parallel, grid and distributed computing. This is the layer above the architecture presented in the previous book.
Continue reading Book review: Tools and environments for parallel and distributed computing
This is my first review. I read this book some time ago, but I still want to write about it because the topic is very interesting.
Continue reading Book review: Advanced Computer Architecture and Parallel Processing
In its March 2008 issue, IEEE Computers published a case study on large-scale parallel scientific code development. I’d like to comment on this article, which I found very good.
Five research centers were analyzed, or more precisely their development tools and processes. Each center worked in its own domain, but they all seem to share some computational fluid dynamics basis.
Continue reading Parallel computing in large-scale applications
Some widely used methods are based on a similarity graph built from the local structure. For instance, LLE uses relative distances, which are related to similarities. Using similarities allows the use of sparse techniques: most pairs of points are not similar, so the similarity matrix is sparse. This also means that many manifolds can be reduced with these techniques but not with Isomap or the other geodesic-based techniques.
It is worth mentioning that, so far, I have only implemented Laplacian Eigenmaps with a sparse matrix, due to the lack of a generalized eigensolver for sparse matrices; I hope such a solver will be available soon.
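Here is a minimal sketch of a sparse similarity graph and Laplacian Eigenmaps with SciPy, assuming a symmetrized k-nearest-neighbour graph with heat-kernel weights exp(-d²/t); the function names, parameters and defaults are my own choices for illustration. It sidesteps the need for a generalized sparse eigensolver: with L = D − W, the problem L y = λ D y is equivalent to the standard symmetric problem D^(-1/2) W D^(-1/2) v = (1 − λ) v with y = D^(-1/2) v, which `scipy.sparse.linalg.eigsh` solves directly (recent versions of `eigsh` also accept an `M` argument for generalized problems).

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import eigsh
from scipy.spatial import cKDTree


def similarity_graph(X, n_neighbors=8, t=1.0):
    """Sparse k-nearest-neighbour graph with heat-kernel weights exp(-d**2 / t)."""
    n = X.shape[0]
    # Query k + 1 neighbours: the first one returned is the point itself
    dist, idx = cKDTree(X).query(X, k=n_neighbors + 1)
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx[:, 1:].ravel()
    weights = np.exp(-dist[:, 1:].ravel() ** 2 / t)
    W = csr_matrix((weights, (rows, cols)), shape=(n, n))
    # Symmetrize: i and j are linked if either one is a neighbour of the other
    return W.maximum(W.T)


def laplacian_eigenmaps(X, n_components=2, n_neighbors=8, t=1.0):
    """Embed X by solving L y = lambda D y through an equivalent symmetric problem."""
    W = similarity_graph(X, n_neighbors, t)
    degrees = np.asarray(W.sum(axis=1)).ravel()
    d_inv_sqrt = diags(1.0 / np.sqrt(degrees))
    # Largest eigenvalues of D^-1/2 W D^-1/2 <=> smallest lambda of L y = lambda D y
    S = d_inv_sqrt @ W @ d_inv_sqrt
    vals, vecs = eigsh(S, k=n_components + 1, which="LA")
    vecs = vecs[:, np.argsort(vals)[::-1]]
    # Back to the generalized eigenvectors y = D^-1/2 v, then drop the constant one
    Y = d_inv_sqrt @ vecs
    return Y[:, 1:]


# Usage: a helix is a 1D manifold embedded in 3D
s = np.linspace(0.0, 4.0 * np.pi, 200)
X = np.column_stack([np.cos(s), np.sin(s), 0.5 * s])
Y = laplacian_eigenmaps(X, n_components=2)
print(Y.shape)  # (200, 2)
```

The kernel width `t` has to be chosen relative to the scale of the data: if the neighbour distances are much larger than sqrt(t), the weights underflow to zero and some degrees vanish.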
Continue reading Dimensionality reduction: similarities graph and its use
Analytical solutions to the dimensionality reduction problem are only possible for quadratic cost functions, such as those of Isomap, LLE, Laplacian Eigenmaps, … All these solutions are sensitive to outliers: the quadratic hypothesis assumes that there are no outliers, but on real manifolds, noise is always present.
Other cost functions have been proposed. They are also known as stress functions, as they measure the difference between the estimated geodesic distance and the computed Euclidean distance in the “feature” space. Any metric MDS stress can be used as such a cost function; here are some of them.
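As a rough illustration of what explicitly minimizing a stress function involves, here is a minimal NumPy sketch for one of the simplest choices, the raw stress Σ_{i<j} (δ_ij − d_ij)², minimized with SMACOF (repeated Guttman transforms). The choice of stress and optimizer is mine for this example, not necessarily one covered in the full post; `delta` stands for the matrix of estimated geodesic distances, which the usage example replaces with plain Euclidean distances for simplicity.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def raw_stress(delta, X):
    """Sum over pairs i < j of (delta_ij - d_ij)^2 (the full matrix counts each pair twice)."""
    return 0.5 * ((delta - pairwise_distances(X)) ** 2).sum()


def smacof(delta, n_components=2, n_iter=200, seed=0):
    """Minimize the raw stress by repeated Guttman transforms (unit weights)."""
    n = delta.shape[0]
    X = np.random.default_rng(seed).standard_normal((n, n_components))
    history = [raw_stress(delta, X)]
    for _ in range(n_iter):
        d = pairwise_distances(X)
        # B_ij = -delta_ij / d_ij off the diagonal (0 where d_ij == 0)
        with np.errstate(divide="ignore", invalid="ignore"):
            B = -np.where(d > 0, delta / d, 0.0)
        np.fill_diagonal(B, 0.0)
        # B_ii = -(sum of the off-diagonal entries of row i)
        np.fill_diagonal(B, -B.sum(axis=1))
        # Guttman transform; with unit weights it never increases the stress
        X = B @ X / n
        history.append(raw_stress(delta, X))
    return X, history


# Usage: recover a 2D configuration from its own distance matrix
rng = np.random.default_rng(1)
points = rng.uniform(size=(30, 2))
delta = pairwise_distances(points)
X, history = smacof(delta)
print(history[0], history[-1])
```

SMACOF is convenient because each step provably does not increase this stress, but like any iterative method it can stop in a local minimum; other stress functions (for instance ones that down-weight large distances) change the matrix `B` and the update.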
Continue reading Dimensionality reduction: explicit optimization of a cost function