Just after the release of ATK SD1, I updated my audio toolkit. I added a few optimizations on overdrive computations, and also for the base filter array exchange.
It’s time for a new release of the toolkit. Much has been done in terms of basic filters, but also to simplify usage (static and shared libraries are compiled, no need to reset the pipeline before calling process…).
I’ve explained in earlier posts how to simulate a simple overdrive circuit. I’ve also explained how I implemented this in QtVST (and yes, I should have added labels on those images!), which was more or less the predecessor of Audio TK.
The main problem with simulating non linear circuits is that it costs a lot. Let’s see if I can improve the timings a little bit.
A long time ago, I started implementing audio filters with a Qt GUI. I also started other pet projects in the same area, but I didn’t have a proper audio support library in C++ for that. Also Qt plugins are no longer an option (for me), I still hope to implement new filters in the future.
Sometimes, it’s so easy to rewrite some existing code because it doesn’t fit exactly your bill.
I just so the example with an All To All communication that was written by hand. The goal was to share how many elements would be sent from one MPI process to another, and these elements were stored on one process in different structure instances, one for each MPI process. So in the end, you had n structures on each of the n MPI processes.
The MPI_Alltoall cannot map directly to this scattered structure, so it sounds fair to assume that using MPI_Isend and MPI_Irecv would be simpler to implement. The issue is that this pattern uses buffers on each process for each other process it will send values to or receive values from. A lot of MPI library allocate their buffer when needed, but will never let go of the memory until the end. So you end up with a memory consumption that doesn’t scale. In my case, when using more than 1000 cores, the MPI library uses more than 1GB per MPI process when it hits these calls, just for these additional hidden buffers. This is just no manageable.
Now, if you use MPI_Alltoall, two things happen:
- there are no additional buffer allocated, so this scales nicely when you increase the number of cores
- it is actually faster than your custom implementation
Now with MPI 3 standard having non-blocking collective operations, there is absolutely no reason to try to outsmart the library when you need a collective operation. It has heuristics when it knows that it is doing a collective call, so let them work. You won’t be smarter if you try, but you will if you use them.
In my case, the code to retrieve all values and store them in an intermediate buffer was smaller that the one with the Isend/Irecv.
It seems that Packt Publishing is on a publishing spree on Machine Learning in Python. After Building Machine Learning Systems In Python for which I was technical reviewer, Packt published Learning Scikit-Learn In Python last November.
I’m please to announce a new version for scikits.optimization. The main focus of this iteration was to finish usual unconstrained optimization algorithms.
- Fixes on the Simplex state implementation
- Added several Quasi-Newton steps (BFGS, rank 1 update…)
The scikit can be installed with pip/easy_install or downloaded from PyPI
It has been a while, too long for sure, since my last update on this scikit. I’m pleased to announce that some algorithms are finally fixed as well as some tests.
- Fixed Polytope/Simplex/Nelder-Mead
- Fixed the Quadratic Hessian helper class
Additional tutorials will be available in the next weeks.
Due to the end of the free lunch, manufacturers started to provide differents processing units and developers started to go parallel. It’s kind of back to the future, as accelerators existed before today (the x87 FPU started as a coprocessor, for instance). If those accelerators were integrated into the CPU, their instruction set were also.
Today’s accelerators are not there yet. The tools are not ready yet (code translators) and usual programming practices may not be adequate. All the ecosystem will evolve, accelerators will change (GPUs are the main trend, but they will be different in a few years), so what you will do today needs to be shaped with these changes in mind. How is it possible to do so? Is it even possible?
Continue reading Thinking of good practices when developing with accelerators