Yes, because Cover Trees are sometimes too slow. In fact, I asked myself this question not because of the build time, but because of the search time when the data has a structure. Imagine what would happen if your data were more or less a regular grid. When I tried that, starting with a point at (0,0), then (1,0)… the first node (0,0) had references to all the last points (9,9), (9,8)… I figured that searching it would be slower than a tree search, so I decided to give kd-trees a shot for this kind of search on a regular grid.
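To make the idea concrete, here is a minimal kd-tree sketch in Python on a 10 x 10 grid like the one described above. It is not the code from the post: the names build_kdtree and nearest and the dictionary-based node layout are assumptions chosen for illustration only.

```python
from math import dist

def build_kdtree(points, depth=0):
    """Recursively split the points at the median, alternating the axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, target, best=None):
    """Return the stored point closest to target (Euclidean distance)."""
    if node is None:
        return best
    if best is None or dist(node["point"], target) < dist(best, target):
        best = node["point"]
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < dist(best, target):
        best = nearest(far, target, best)
    return best

# Regular grid from (0, 0) to (9, 9).
grid = [(x, y) for x in range(10) for y in range(10)]
tree = build_kdtree(grid)
print(nearest(tree, (3.2, 6.9)))
```

On a regular grid the median split keeps the tree balanced, so a query usually only descends one branch and checks a few neighbouring cells.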
Continue reading KdTree for nearest neighbors
I had to port to C++ a simplex/Nelder-Mead optimizer that I already have in Python. As with the Python version, I tried to be as generic and as efficient as possible, so the state is no longer a dictionary, but a simple structure.
I could have used the Numerical Recipes version, but the licence cost is not worth it, and the code is neither generic enough nor explained well enough. Some of its design decisions are also questionable with respect to the principle of one method = one responsibility.
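For readers who have not met the method, the following is a simplified Nelder-Mead sketch in Python. It is neither the Python optimizer nor the C++ port mentioned above: the function name nelder_mead, its parameters, and the choice of using only an inside contraction are assumptions made to keep the example short. The coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5) are the usual textbook values.

```python
def nelder_mead(f, x0, step=1.0, tol=1e-10, max_iter=1000):
    """Minimize f starting from x0 with a simplified Nelder-Mead simplex."""
    n = len(x0)
    # Initial simplex: x0 plus one point offset along each axis.
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        # Stop when the function values on the simplex are nearly equal.
        if abs(f(worst) - f(best)) < tol:
            break
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def along(t):
            # Point centroid + t * (centroid - worst).
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway towards the best one.
                simplex = [best] + [
                    [b + 0.5 * (v - b) for b, v in zip(best, p)] for p in simplex[1:]
                ]
    return min(simplex, key=f)

# Usage: minimum of a shifted paraboloid, expected near (1, -2).
print(nelder_mead(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [0.0, 0.0]))
```

The sketch re-evaluates f freely for readability; a real implementation would cache the function values, which is one reason to keep the optimizer state in a dedicated structure.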
Continue reading Just a small example of numerical optimization in C++
I’ve looked on GitHub for a good C++ implementation of Cover Trees for nearest-neighbor search, but I didn’t find one. I may have overlooked some repositories, but in the end, implementing it myself wasn’t that difficult.
Continue reading Cover tree for nearest-neighbors
When faced with a new dataset, the issue is to find out how it should be analyzed. A lot of books address the theoretical side, but this book gives practical clues on how to do it. Besides, it isn’t based on commercial tools like MATLAB, but on open source tools that can be freely downloaded from the Internet.
Continue reading Book review: Data Analysis with Open Source Tools
I’ve decided for once to read a novel about software. This book tells the story of Chandler, a piece of software that was a dream that didn’t quite come true.
Continue reading Fun book: Dreaming In Code
Profiling comes in three different flavors. The first is emulation, where a processor's behavior is emulated; the second is sampling, where the profiler samples the status of a program at regular intervals; and finally instrumentation, where the profiler gets information when a subroutine is called and when it returns. As with the Heisenberg uncertainty principle, profiling changes the exact behavior of your program. This is something you have to remember when analyzing a profile.
Valgrind is an Open Source emulation profiler. It is freely available on standard Linux platforms. As it is an emulation, it is far slower than the actual program. This means that I/O costs are underestimated relative to computation. The advantage is that you get every detail of the memory behavior (cache misses for instance). Valgrind does not emulate all processors, but you can tweak it to approximate your own.
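In contrast with emulation, the instrumentation flavor described above fits in a few lines of Python. The sketch below uses the standard sys.setprofile hook, which is notified when a function is called and when it returns. The helper name profile_calls and the shape of its result are assumptions for illustration, and the timings are inclusive (a caller's time contains its callees' time).

```python
import sys
import time

def profile_calls(func, *args):
    """Run func(*args) and return {function name: [call count, total seconds]}."""
    stats = {}
    starts = []  # start times of the Python calls currently in progress

    def hook(frame, event, arg):
        # Only Python-level calls are tracked; C calls ("c_call") are ignored.
        if event == "call":
            starts.append(time.perf_counter())
        elif event == "return" and starts:
            entry = stats.setdefault(frame.f_code.co_name, [0, 0.0])
            entry[0] += 1
            entry[1] += time.perf_counter() - starts.pop()

    sys.setprofile(hook)
    try:
        func(*args)
    finally:
        sys.setprofile(None)
    return stats

def leaf():
    return sum(range(1000))

def parent():
    return leaf() + leaf()

for name, (count, seconds) in profile_calls(parent).items():
    print(name, count, seconds)
```

The hook itself costs time on every call and return, which is a small-scale illustration of how profiling changes the behavior of the program being measured.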
Continue reading Profiling with Valgrind/Callgrind
For my research, I had to create a set of smooth deformation fields where I knew which points were moved and by which amount.
I tried to find a script, but I couldn’t find an appropriate one, let alone one in Python. So here I propose my own version, which interpolates a 1D, 2D or 3D deformation field based on some points.
How does it work? It is based on Bookstein’s algorithm. The first step is the computation of the coefficients of the smooth deformation field; these are then used to compute the values of the deformation field on a grid, and this grid is returned.
The function to use is denseDeformationFieldFromSparse(), the arguments being size, the size of the desired grid, points, the locations where the deformation field is known, and displacements, the amount of displacement for each previously given point.
This code is given as is, but feel free to comment so that bugs can be ironed out (if there are bugs). It was tested with 1D, 2D and 3D test cases which can also be found on the gist.
Thanks to Bill Baxter for the distance function that was proposed on the numpy discussion list.
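For readers without the gist at hand, the two steps described above (solve for the coefficients, then evaluate them on a grid) can be sketched as follows. This is not the code of denseDeformationFieldFromSparse(): it is a pure-Python, 2D-only illustration that assumes the thin-plate kernel U(r) = r² log r and handles one displacement component at a time. The names kernel, solve and tps_field_2d are hypothetical.

```python
from math import hypot, log

def kernel(r):
    """2D thin-plate radial basis U(r) = r^2 log r, with U(0) = 0."""
    return r * r * log(r) if r > 0 else 0.0

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def tps_field_2d(size, points, displacements):
    """Interpolate one displacement component on a size[0] x size[1] grid."""
    n = len(points)
    # Step 1: solve [[K, P], [P^T, 0]] [w, a] = [d, 0] for the coefficients.
    system = [[0.0] * (n + 3) for _ in range(n + 3)]
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            system[i][j] = kernel(hypot(xi - xj, yi - yj))
        for k, v in enumerate((1.0, xi, yi)):
            system[i][n + k] = v
            system[n + k][i] = v
    coeffs = solve(system, list(displacements) + [0.0] * 3)
    w, (a0, ax, ay) = coeffs[:n], coeffs[n:]

    # Step 2: evaluate the affine part plus the bending part on the grid.
    def value(x, y):
        bending = sum(wi * kernel(hypot(x - px, y - py))
                      for wi, (px, py) in zip(w, points))
        return a0 + ax * x + ay * y + bending

    return [[value(x, y) for y in range(size[1])] for x in range(size[0])]

# Four fixed corners and a centre point displaced by 1.
points = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 2)]
field = tps_field_2d((5, 5), points, [0.0, 0.0, 0.0, 0.0, 1.0])
print(field[2][2])
```

The sketch needs at least three distinct, non-collinear points for the system to be solvable, and a vector displacement is obtained by calling it once per component.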