My last post, two months ago, was on regression. I’d like to start this new year with some videos showing how my algorithms behave.
Continue reading Dimensionality reduction: videos in regression algorithms
Once the data set is reduced (see my first posts if you’re jumping on the bandwagon), there are several ways of mapping this reduced space to the original space:
- you can interpolate the data in the original space based on an interpolation in the reduced space, or
- you can create an approximation of the mapping with a multidimensional function (B-splines, …)
When using the first solution, if you map one of the reduced points used for the training, you get the original point back. With the second solution, you get a point close to it. If your data set is noisy, you should use the second solution, not the first. And if you are trying to compress data (lossy compression), you cannot use the first one: you need the original points to get new interpolated points, so you are not compressing your data set.
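A minimal sketch of the difference between the two solutions, assuming a one-dimensional reduced space, NumPy, and a cubic polynomial standing in for the multidimensional function. The data and the function names are illustrative and are not taken from the scikit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: 50 noisy 2-D points parameterized by one reduced coordinate t.
t = np.linspace(0.0, np.pi, 50)
points = np.column_stack([np.cos(t), np.sin(t)]) + rng.normal(scale=0.05, size=(50, 2))


def interpolate(t_new, t_train, points_train):
    """First solution: linear interpolation; needs every training point at query time."""
    columns = [np.interp(t_new, t_train, points_train[:, j])
               for j in range(points_train.shape[1])]
    return np.column_stack(columns)


def fit_approximation(t_train, points_train, degree=3):
    """Second solution: least-squares polynomial; only the coefficients are kept."""
    design = np.vander(t_train, degree + 1)
    coeffs, *_ = np.linalg.lstsq(design, points_train, rcond=None)
    return coeffs


def approximate(t_new, coeffs):
    """Evaluate the fitted polynomial for every original coordinate at once."""
    return np.vander(t_new, coeffs.shape[0]) @ coeffs


coeffs = fit_approximation(t, points)

# Interpolation reproduces the (noisy) training points.
print(np.allclose(interpolate(t, t, points), points))
# The approximation only gets close to them, which smooths the noise.
print(np.abs(approximate(t, coeffs) - points).max())
# Storage: 8 coefficients against 100 training values.
print(coeffs.size, points.size)
```

Once `coeffs` is computed, the training points can be discarded, which is what makes the second solution usable for lossy compression.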
The solution I propose is based on approximation with a set of piecewise linear models, each model mapping a region of the reduced space to the original space. At the boundaries between the models, I do not enforce continuity, contrary to hinging hyperplanes. Also unlike Projection Pursuit Regression and hinging hyperplanes, my mapping goes from the reduced space to the whole original space, not to a single coordinate of it. This will enable projection on the manifold (another subject that will be discussed in another post).
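The idea can be sketched as follows. This is only an illustration under stated assumptions, not the actual implementation: the regions are taken to be nearest-center cells around given `centers` (a hypothetical partition), each region must contain enough training points, and each affine model is fitted by ordinary least squares with NumPy.

```python
import numpy as np


def assign_regions(reduced, centers):
    """Label each reduced point with the index of its nearest center (assumed partition)."""
    distances = np.linalg.norm(reduced[:, None, :] - centers[None, :, :], axis=2)
    return distances.argmin(axis=1)


def affine_design(reduced):
    """Append a column of ones so that each model has an offset term."""
    return np.hstack([reduced, np.ones((len(reduced), 1))])


def fit_piecewise_linear(reduced, original, centers):
    """Fit one affine model per region, from the reduced space to the full original space."""
    labels = assign_regions(reduced, centers)
    design = affine_design(reduced)
    models = []
    for k in range(len(centers)):
        mask = labels == k
        # One least-squares solve gives all original coordinates at once.
        coeffs, *_ = np.linalg.lstsq(design[mask], original[mask], rcond=None)
        models.append(coeffs)
    return models


def predict(reduced, centers, models):
    """Map reduced points with the model of their region; no continuity is enforced."""
    labels = assign_regions(reduced, centers)
    design = affine_design(reduced)
    mapped = np.empty((len(reduced), models[0].shape[1]))
    for k, coeffs in enumerate(models):
        mask = labels == k
        mapped[mask] = design[mask] @ coeffs
    return mapped
```

Because each model is fitted independently, two points on either side of a boundary can be mapped far apart, which is exactly the discontinuity that hinging hyperplanes rule out.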
Continue reading Dimensionality reduction: mapping the reduced space into the original space
After some general books on grid computation, I needed to change the subject of my readings a little bit. As Intel Threading Building Blocks always intrigued me, I chose the associated book.
Continue reading Book review: Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism
My manifold learning code was for some time a Technology Preview in the scikit learn. Now I can say that it is available (BSD license), and there should not be any obvious bugs left.
I’ve written a small tutorial. It is not a usual tutorial (there is a user tutorial, and then what developers should know to enhance it), and some results of the techniques are shown on my blog. It provides the basic commands to start using the scikit yourself (reducing some data, projecting new points, …) as well as the exposed interface to enhance the scikit.
If you have any questions, feel free to ask me. I will add the answers to the tutorial page so that everyone can benefit from them.
Feel free to contribute new techniques and additional tools as well; I cannot write them all! For instance, the scikit lacks robust neighbor selection to avoid short-cuts in the manifold…
Tutorial and the learn scikit mainpage.
Peer-to-peer. In France, these words are unleashing a fight between legislators and developers. And this old book (I say old because it was written in 2001, and 7 years is old for a book on this topic) presented the issues debated in journals, blogs, … to me in a new way.
Continue reading Book review: Peer-to-Peer: Harnessing the Power of Disruptive Technologies
At last, my article on manifold learning has been published and is accessible through doi.org (this was not the case last week, which is why I waited before publishing this post).
The journal is free, so you won’t have to pay to read it: Access to the EURASIP JASP article
I will publish additional figures here shortly. The scikit is almost complete as well; I’m finishing the online tutorial for those who are interested in using it and/or enhancing it.
Today, I’m publishing a tutorial on two C++ profilers on my French website. The question I’m asking myself and you is: should I translate it?
If some of you are interested in my French tutorials, I may translate them from time to time, depending on their content (I don’t want to translate an article on Boost for instance, the documentation does provide everything). But I’ll do that only if people tell me “Go on”. So I’m all ears…
Since the beginning of this year, I have been trying to figure out what to do in my future. I’m still doing my PhD, but what could I do after that?
My current job is to find a model for datasets.
A lot of datasets can be explained by a small number of parameters. For instance, identity photos of a single person can be explained by 3 translations and 3 rotations. So that is what my algorithms do: find the parameters (or something close enough) and create a mapping between the parameters and the original space.
During this research, I learnt what scientific computing is. I did not explore everything in this field, but I covered the basics. That’s where I found out about Python, but also C++ (which is the first language I really used). My thirst for information led me to read a lot of books on several matters (architectural design, process, but also parallel computing and its different flavors). This led me to search for the job that would interest me the most.
So starting from September I’ll move to Pau, a town in the South of France. This is where the biggest research center of Total S.A. is located. I will work on oil exploration.
Although the theory behind this is well known (acoustic wave propagation and inverse problems), this does not mean that research in this field is over. For instance, the computing power needed to solve these problems is enormous, so their implementation must be well thought out. And even if you manage to find a solution to your problem, you are not done. Total’s goal is not to see whether acoustic waves propagate fast in some places and slowly in others. Its goal is to find oil and gas. So once one has an acoustic model, one must work out with the geologists whether there is a chance of finding oil or gas. And that’s also a big, interesting challenge.
For those who were interested in manifold learning, don’t worry, I’m not finished presenting my research. I will go on with some new posts about the mapping between the two spaces and how it can be used to test new samples. The scikit is now almost available. I still have to finish the tutorial and test that everything is OK.
I hope I will be able to continue with other subjects on this blog; there is no reason I cannot do this. Although what I’ll be doing at Total is confidential, there are a lot of fields I’d like to talk about.
I was looking for an introductory book on peer-to-peer (P2P) applications and their use in grid computation. Web services were a bonus, as they are something I don’t usually play with.
Continue reading Book review: From P2P to Web Services and Grids: Peers In A Client/Server World
I noticed a few days ago that I mainly use one design pattern in my scientific (but not only) code: the registry. How does it work? A registry is a list/dictionary/… of objects. Applications add a new entry when needed, and a user can then tap into the registry to find the most suitable object for their purpose.
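A minimal sketch of such a registry, assuming a dictionary keyed by name and a decorator for adding entries. The `Registry` class and the distance functions are hypothetical examples, not code from the full post:

```python
import math


class Registry:
    """A named collection of interchangeable objects."""

    def __init__(self):
        self._entries = {}

    def register(self, name):
        """Return a decorator that stores the decorated object under `name`."""
        def decorator(obj):
            if name in self._entries:
                raise KeyError(f"{name!r} is already registered")
            self._entries[name] = obj
            return obj
        return decorator

    def get(self, name):
        """Look up an entry, listing the available names if it is missing."""
        try:
            return self._entries[name]
        except KeyError:
            available = sorted(self._entries)
            raise KeyError(f"unknown entry {name!r}; available: {available}") from None

    def names(self):
        return sorted(self._entries)


# Hypothetical registry of distance functions.
distances = Registry()


@distances.register("euclidean")
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@distances.register("manhattan")
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


# A user picks the entry that suits the task by name.
distance = distances.get("manhattan")
print(distance((0, 0), (1, 2)))  # prints 3
```

New entries can be added from any module without touching the code that looks them up, which is the main appeal of the pattern.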
Continue reading My favorite design pattern in Python