I’ve explained in earlier posts how to simulate a simple overdrive circuit. I’ve also explained how I implemented this in QtVST (and yes, I should have added labels on those images!), which was more or less the predecessor of Audio TK.
The main problem with simulating nonlinear circuits is that it is computationally expensive. Let’s see if I can improve the timings a little bit.
The state at version 0.1
In Audio Toolkit 0.1, I implemented the same version as the one available in the QtVST plugin. Let’s check with Valgrind, at 32x oversampling, what actually costs the most:

The obvious culprit is the exponential function, due to the diode behavior. And indeed, in the cost function that the Newton-Raphson algorithm optimizes, I have two calls to the exponential function:
std::pair<DataType, DataType> operator()(DataType x0, DataType x1, DataType y0, DataType y1)
{
  // Diode exponentials for the current estimate y1...
  DataType expdiode_y1_p = std::exp(y1 / vt);
  DataType expdiode_y1_m = 1 / expdiode_y1_p;
  // ...and for the previous output y0
  DataType expdiode_y0_p = std::exp(y0 / vt);
  DataType expdiode_y0_m = 1 / expdiode_y0_p;
  // Diode current at y1 and its derivative
  std::pair<DataType, DataType> diode = std::make_pair(is * (expdiode_y1_p - expdiode_y1_m), is * (expdiode_y1_p + expdiode_y1_m) / vt);
  // Cost function value and its derivative with respect to y1
  return std::make_pair(A * diode.first + (y1 + (x0 - x1 + B * is * (expdiode_y0_p - expdiode_y0_m) - y0)), A * diode.second + 1);
}
Improving the overdrive
Knowing how the Newton-Raphson algorithm works, y0 is actually constant during each optimization, and the next y0 will be the old y1, the old state. So the exponentials for y0 can be cached across iterations, and the exponentials for the old y1 can be reused when the next sample starts. At the beginning of each optimization, y1 also starts equal to y0; I tried caching that case as well, but it wasn’t worth the cost, as the Valgrind profile was almost the same (or worse).
So now the code looks like this:
std::pair<DataType, DataType> operator()(DataType x0, DataType x1, DataType y0, DataType y1)
{
  DataType expdiode_y1_p = std::exp(y1 / vt);
  DataType expdiode_y1_m = 1 / expdiode_y1_p;
  DataType expdiode_y0_p;
  DataType expdiode_y0_m;
  if(y0 == oldy0)
  {
    // Same optimization as before: y0 has not changed
    expdiode_y0_p = oldexpy0;
    expdiode_y0_m = oldinvexpy0;
  }
  else if(y0 == oldy1)
  {
    // New sample: the new y0 is the y1 of the previous sample
    expdiode_y0_p = oldexpy1;
    expdiode_y0_m = oldinvexpy1;
  }
  else
  {
    expdiode_y0_p = std::exp(y0 / vt);
    expdiode_y0_m = 1 / expdiode_y0_p;
  }
  // Update the cached values
  oldy0 = y0;
  oldexpy0 = expdiode_y0_p;
  oldinvexpy0 = expdiode_y0_m;
  oldy1 = y1;
  oldexpy1 = expdiode_y1_p;
  oldinvexpy1 = expdiode_y1_m;
  std::pair<DataType, DataType> diode = std::make_pair(is * (expdiode_y1_p - expdiode_y1_m), is * (expdiode_y1_p + expdiode_y1_m) / vt);
  return std::make_pair(A * diode.first + (y1 + (x0 - x1 + B * is * (expdiode_y0_p - expdiode_y0_m) - y0)), A * diode.second + 1);
}
The resulting Valgrind profile is displayed here:

Can we do better?
So the situation is better, but not yet ideal. The performance improved by roughly 10%, but the exp function still uses one third of the computation time. One way of improving the situation is to use an approximation of the exponential function. The first thing to do is to determine the range over which the function will be needed. The diode maximum voltage is about 0.6 V, meaning that the exponent goes from -26 to +26 (taking Vt into account).
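As a quick sanity check of that bound (the thermal voltage value below is my assumption; only the resulting range is given above):

vt = 0.023  # assumed thermal-voltage-like constant, in volts
print(0.6 / vt)  # ~26, hence exponents roughly in [-26, +26]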
Then, I tried two options to approximate the exponential function: Taylor expansion and Padé approximants. Here are the two functions written in Python, using Horner’s method:
def approx_exp(d, order):
    # Taylor expansion of exp(d) up to the given order, evaluated
    # with Horner's method
    result = 0
    for i in range(order, 0, -1):
        result = (1 + d * result) / i
    return d * result + 1

def pade_approx(d, m, n):
    # Padé approximant of exp(d) of order (m, n), numerator and
    # denominator both evaluated with Horner's method
    num = 0
    for i in range(m, 0, -1):
        num = (m - i + 1) / (m + n - (i - 1.)) / i * (1 + d * num)
    num = (1 + d * num)
    denom = 0
    for i in range(n, 0, -1):
        denom = (n - i + 1) / (m + n - (i - 1.)) / i * (1 - d * denom)
    denom = (1 - d * denom)
    return num / denom
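A quick way to check the accuracy (a hypothetical test harness, not part of the original code), comparing both approximations against math.exp over the range discussed above:

import math

for x in (-26., -1., 0., 1., 26.):
    print(x, math.exp(x), approx_exp(x, 35), pade_approx(x, 20, 20))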
To get accurate positive values, I had to use an order of at least 35 for the Taylor expansion, and (20, 20) for the Padé approximants. The Padé approximants were not efficient, as they require more multiplications and divisions, but even the Taylor expansion required several times more CPU time than the exponential function.
Another solution was to try the Taylor expansion with a handful of coefficients, but instead of going up to 26, I went up to 26/16, and then multiplied the exponential result by itself several times (squaring it four times gives the power of 16). But it wasn’t enough to make a difference; the timings were just a little bit above the ones I had with the plain exponential function. So I’m staying with it for the moment.
Still, if you want to test an approximation of the exponential function, you should only use positive values; if you have a negative exponent, invert the result for the positive value instead: it is less prone to oscillations.
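For reference, here is a minimal Python sketch of this squaring trick combined with the reciprocal trick for negative exponents (the order of 7 is an assumption, not necessarily the order I used in my tests):

def approx_exp_reduced(x, order=7):
    # Evaluate a short Taylor expansion on abs(x)/16 with Horner's
    # method, then square four times, since exp(x) == exp(x/16) ** 16.
    d = abs(x) / 16.
    result = 0.
    for i in range(order, 0, -1):
        result = (1 + d * result) / i
    result = d * result + 1
    for _ in range(4):
        result = result * result
    # Negative exponents: invert the result for the positive value
    return 1 / result if x < 0 else result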
Which oversampling to choose?
Obviously, the only remaining solution I have is reducing the oversampling. But then, the quality of the solution may decrease. So what is the difference?



As the 32x oversampling should have the least aliasing, I used it as the reference for the others. Adding one level of oversampling decreases the error by almost one order of magnitude, which is good. But if you consider that less than 1% error is enough, a 4x oversampling may be what you are looking for, for a 1 kHz signal. Don’t forget that the higher the frequencies in the original signal, the more aliasing you will get.
Conclusion
Yes, we can do better than the plain Newton-Raphson optimization by caching values. Using an exponential approximation doesn’t seem to be efficient enough, even with the proper multiplication tricks, so the only solution left is reducing the quality of the solution. 32x is not perfect, there are still some aliasing effects, as I showed in the QtVST post, but it should be enough for the majority of signals. 4x might also do if you accept an error around the 1% level.
If you’re trying to approximate an exp over a known range, have you tried storing precomputed values E[k] := exp(a*k) for k = 1…N, and, for every x, taking k(x) = int(x/a) and exp(x) = E[k(x)] * exp(x - a*k(x)), approximating the latter term with a lower-degree Taylor expansion?
Of course, every math lib probably already does that, so it’s unlikely you’ll see performance gains, but if you’re willing to store enough E[k] data points and lose enough precision, you could go for exp(x) ≈ E[k] * (1 + x - a*k), which is only 1 div and 1 mult.
No, I didn’t try this approach. Indeed, one div and one mult is less than the 7 or 8 mults I was using in the Taylor expansion. I may try it later; it will depend on the needed size for the LUT!
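For illustration, a minimal Python sketch of the table-based scheme from the comment above (the grid step a and the table range are assumptions; the precision/size trade-off depends on both):

import math

a = 0.125                        # grid step; an assumed value
N = int(26 / a) + 1              # cover the [0, 26] range discussed in the post
E = [math.exp(a * k) for k in range(N + 1)]

def lut_exp(x):
    # exp(x) = E[k] * exp(x - a*k) with k = int(x/a); the residual factor
    # is replaced by its first-order Taylor expansion, 1 + (x - a*k).
    # Negative exponents go through the reciprocal, as advised above.
    negative = x < 0
    if negative:
        x = -x
    k = int(x / a)
    result = E[k] * (1 + (x - a * k))
    return 1 / result if negative else result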
Thanks!