OverviewI was reading the BKM algorithm and I thought it was similar to my previous attempt at making an algorithm for the complex exponential function. However, I tried implementing it and failed miserably. So I altered my previous algorithm with an idea from the BKM algorithm, and I made my own algo that works as quickly, uses a smaller LUT, works on a wider range of values, and has a more straightforward implementation.
Why is the complex exponential function useful?The complex exponential function can be evaluated as
##e^{x+iy}=e^{x}\left(\cos(y)+i \sin(y)\right)##. So for people who actually need a complex exponential function, this is useful, but it can also compute the real exponential function(take y=0), or real sine and cosine, simultaneously (take x=0).
Goals of the algorithmAlthough I only advertise my algorithm as working on the rectangle [-1,1]+[-1,1]i, it actually works on approximately (-1.754365792,1.754365792)+(-1.213427912,1.213427912)i. Feel free to use this to your advantage.
As well, since shifting and adding are easy
citation needed, this is a shift and add algorithm. It takes approximately n iterations for n bits of accuracy, and 8 tables of n/2 floats (versus 18 tables suggested by the BKM algorithm).
The AlgorithmWe will first give simple pseudo code, then proceed to optimize it.
As in my old algorithm and the BKM algorithm, we start by observing that e
z-s=e
ze
-s=e
z/e
s. So then if z-a-b-c-d-...=0, then we have
ez-a-b-c-...=e
z/(eaebec...), so then 1=ez/(eaebec...), so eaebec...=ez.
The trick is finding values for a,b,c,... so that the multiplications are "easy" as in, comprised of a shift and add.
Since we are working with complex numbers, take the complex number d where the real and imaginary parts are in the set {-1,0,1}. Multiplication by those values is even more trivial than a shift and add. What we will do is take ea=1+d2-n. Multiplication by 2-n is a right shift by n.If ea=1+d2-n, then log(ea)=log(1+
d2
-n), so a=log(1+
d2
-n). There are 9 possible values for
d, so you might think, "oh no Zeda, we need 9 tables of complex numbers which is basically 18 tables of floats," but we all know that math+programming=magic
citation not needed. So let's look at how the complex logarithm is defined and call the real part of d, s, and the imaginary part t. So d=s+t*i. By the way, references to "log" imply the natural logarithm in the math world, so log
e or ln() not log
10.
##\log(x+iy) = \log(\sqrt(x^{2}+y^{2}))+i\tan^{-1}(y/x)= \frac{1}{2}\log(x^{2}+y^{2})+i\tan^{-1}(\frac{y}{x})## (side note: atan has infinitely many valid outputs-- just add 2*k*pi for integers k, but we are only interested in k=0, the principle Arg).
So our tables would be with x=1+
s2
-n, y=
t2
-n.
But there are lots of repeated values, so we
actually need tables for:
For our purposes, we will note that to choose s and t, we will use s=sign(iPart(Re(z)*2m+1)), t=sign(iPart(Im(z)*2m+1)) where iPart() returns the integer part of the input, and sign() returns 1 if positive, -1 if negative, and 0 if 0. Without further ado, the algorithm:
1.0→v
1.0→w
For k,0,n-1
sign(x)→s
sign(y)→t
if |x|<2^(1-k)
0→s
if |y|<2^(1-k)
0→t
if s≠0 & t≠0
v→a
if t=0
x+log(1+s2^-k)→x
v+s*v>>k→v
if s=0
x+log(1+4^-k)→x
y+t*atan(2^-k)→y
v-tw>>k→v
w+ta>>k→w
if s=-1
x+log((1-2^-k)^2+4^-k)→x
y+t*atan(1/(2^k-1))→y
v-v>>k-tw>>k→w
w-w>>k+ta>>k→v
if s=1
x+log((1+2^-k)^2+4^-k)→x
y+t*atan(1/(2^k+1))→y
v+v>>k-tw>>k→w
w+w>>k+ta>>k→v
The result is v+iw to n bits of accuracy. It seems complicated, but it is basically just taking care of each case for s and t. As well, we can multiply x, y by 2 each iteration so instead of checking the nth bit, we can check if they are less than 1. Then we would need to multiply each table entry by 2k+1. For further optimization, we note that to 2m bits of accuracy, when k>m-1, log(1+d2-(k+1))=.5*log(1+d2-k) So we can basically reuse the last table value, divide by 2 each iteration, then multiplying by 2 to get our 2-(k+1). But wait, x/2*2=x, so we can directly use the last value in each table!
To clean things up a bit, lets name the tables and use an index to address each value:
For the optimized code to 2m bits of accuracy,1.0→v
1.0→w
For k,0,2m-1
if k<m
k→n
x<<1→x
y<<1→y
sign(x)→s
sign(y)→t
if |x|<1
0→s
if |y|<1
0→t
v→a
if s=0
if t=-1
x+table6[n]→x
y-table7[n]→y
v+w>>k→v
w-a>>k→w
if t=1
x+table6[n]→x
y+table7[n]→y
v-w>>k→w
w+a>>k→v
if s=-1
if t=-1
x+table0[n]→x
y-table2[n]→y
v-v>>k+w>>k→w
w-w>>k-a>>k→v
if t=0
x+table4[n]→x
v-v>>k→w
w-w>>k→v
if t=1
x+table0[n]→x
y+table2[n]→y
v-v>>k-w>>k→w
w-w>>k+a>>k→v
if s=1
if t=-1
x+table1[n]→x
y-table3[n]→y
v+v>>k+w>>k→w
w+w>>k-a>>k→v
if t=0
x+table1[n]→x
v+v>>k→w
w+w>>k→v
if t=1
x+table1[n]→x
y+table3[n]→y
v+v>>k-w>>k→w
w+w>>k+a>>k→v
Summary:For floating point math with floats in base 2, arbitrary shifting is trivial (just change the exponent). So each iteration requires at most 6 trivial shifts, 6 trivial compares, 6 non-trivial additions/subtractions. So for 2m bits of precison, the total speed cost is 12mA+epsilon, where A is the speed of an addition or subtraction, and epsilon is a small, positive number. The total space cost is 8m Floats. Compared to the BKM algorithm, the speed is the same (with just a change in epsilon), but the BKM uses 18m Floats, so we save 10 Floats. We also work on a larger space than the BKM algorithm which works on a trapezoid region of the complex pain, entirely contained in the set (a proper subset) on which this algorithm works.
Future PlansI plan to write this up more formally, including the proofs as to why we can use
m elements for 2m bits of accuracy, why we can just look at the sign and the nth bits of the real and complex component of the input to choose our
d. This last bit was a difficult task for the creator of the BKM algorithm and in their paper, they mentioned that the bounds were theoretical and only as tight as
they could manage. In practice they found that the bounds were loose and they suspected that they could expand their bounds, but they couldn't prove it. I found an easy way to prove this for my algorithm which I suspect will apply to the BKM algorithm since at this point they are nearly identical.
Gosh I hope I don't have typos, formatting issues, or errors in the algorithms I posted. I have been typing and revising this for hours .__.