In release 2.2.0, ATK gained new vectorized EQ filters. These filters cannot (yet) be used to filter different bands from the same input signal, but they can be used to filter several channels in the same way.
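To make "filtering several channels in the same way" concrete, here is a minimal sketch of a TDF2 biquad that applies one set of coefficients to four channels at once. It is not ATK's API: `Lanes`, `Coeffs` and `MultiChannelTDF2` are hypothetical names, and a plain `std::array` stands in for a SIMD register so that the example compiles anywhere.

```cpp
#include <array>
#include <cstddef>

// Hypothetical names throughout; this is an illustration, not ATK code.
constexpr std::size_t kLanes = 4;          // one lane per channel
using Lanes = std::array<float, kLanes>;   // stand-in for a SIMD register

struct Coeffs {
  float b0, b1, b2;  // feedforward coefficients
  float a1, a2;      // feedback coefficients (a0 normalized to 1)
};

class MultiChannelTDF2 {
public:
  explicit MultiChannelTDF2(const Coeffs& coeffs) : c_(coeffs) {}

  // Filters one sample per channel. Every lane shares the same coefficients
  // but keeps its own two state variables.
  Lanes process(const Lanes& x) {
    Lanes y{};
    for (std::size_t l = 0; l < kLanes; ++l) {
      y[l] = c_.b0 * x[l] + s1_[l];
      s1_[l] = c_.b1 * x[l] - c_.a1 * y[l] + s2_[l];
      s2_[l] = c_.b2 * x[l] - c_.a2 * y[l];
    }
    return y;
  }

private:
  Coeffs c_;
  Lanes s1_{};  // first state variable, one per channel
  Lanes s2_{};  // second state variable, one per channel
};
```

Filtering different bands from the same input would instead need different coefficients in each lane, which this layout does not provide.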
The first file has five test cases. Four of them use a vectorized DF1 with different inputs; the fifth is the TDF2 equivalent of one of these cases. The second file contains fully vectorized DF1 and TDF2 test cases. It uses a dispatcher to select the best platform for the CPU it runs on.
So the question is: which one of these is the fastest?
I ran the application on Linux through valgrind, after compiling it with gcc 7 and no specific instruction set. Here are the results:
I’ve only selected the process calls, and it is obvious the timings are dominated by two things:
- The conversion of the individual signals to the SIMD signal
- The actual EQ processing
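The first item, the conversion, amounts to packing one sample from each channel into a single SIMD value, and unpacking again after the filter. A minimal sketch of that idea follows. It assumes four channels of equal length, and the names `interleave`, `deinterleave` and `Lanes` are hypothetical rather than taken from ATK.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 4;          // assumed number of channels
using Lanes = std::array<float, kLanes>;   // stand-in for a SIMD register
using Channels = std::array<std::vector<float>, kLanes>;

// Packs separate channel buffers into one buffer of lane groups:
// packed[n][l] holds sample n of channel l.
std::vector<Lanes> interleave(const Channels& channels) {
  const std::size_t size = channels[0].size();
  std::vector<Lanes> packed(size);
  for (std::size_t l = 0; l < kLanes; ++l) {
    assert(channels[l].size() == size);  // all channels must have equal length
    for (std::size_t n = 0; n < size; ++n) {
      packed[n][l] = channels[l][n];
    }
  }
  return packed;
}

// Inverse operation: splits the lane groups back into one buffer per channel.
Channels deinterleave(const std::vector<Lanes>& packed) {
  Channels channels;
  for (auto& channel : channels) {
    channel.resize(packed.size());
  }
  for (std::size_t n = 0; n < packed.size(); ++n) {
    for (std::size_t l = 0; l < kLanes; ++l) {
      channels[l][n] = packed[n][l];
    }
  }
  return channels;
}
```

Both loops only move data, with strided reads or writes, which is one plausible reason this step shows up in a profile next to the filtering itself.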
Let’s first compare the non-SIMD versions. The DF1 versions spend 13.9 million cycles, whereas the TDF2 version spends only 3.6 million, but that is for a single channel, so four channels would take 14.4 million cycles. The DF1 has dedicated SIMD lines that make it faster than the TDF2 version.
On the SIMD side, things are different. The DF1 version is almost twice as slow as the TDF2 version, and the TDF2 version itself is only slightly slower than the non-SIMD version once the conversion times are taken into account (there are probably things I need to optimize there!).
When using AVX2 for the non-SIMD filters, some get faster:
The SIMD filters were not expected to get faster, since their code is still exactly the same. The non-SIMD DF1 is now 20% faster, and the TDF2 stays the same.
The conclusions are simple: the SIMD TDF2 is good, but the framework around it still needs to get better. The non-SIMD TDF2 filter will require some care to get faster. By making this one faster, I may also be able to make the SIMD version faster!
There are more and more SIMD filters in ATK, so let me know what you think of this effort.