I made a neat little algorithm today that seems to do an okay job at approximating sine. My original intention was to create a routine for the following with a fixed point format where x is on [0,1)
x(1-x)
However, if 'x' is in 'a', i can just do a*(-a), but
NEG is 2 bytes, 8 cycles, whereas
CPL is 1 byte, 4 cycles. That made me wonder if I could make a fast routine to multiply A by its compliment (is it called 1s compliment?). After working in my notebook with some more abstract approaches (I used 7th degree polynomials) I reduced it to a fairly simple solution that would be much simpler to integrate into an IC or something. However, I decided to experiment a little with the result by seeing what happens if I don't let some bits to propagate into the upper 8 bits (remember, 8.8 fixed point). So basically, I computed the product of 7-th degree polynomials with binary coefficients, and then truncated any terms of degree 7 or lower, then I converted it back to binary and the result is a close approximation of x(1-x). This happens to be a close approximation of sine, too, so after some tweaking, I got it to have the same input/output range as Axe (I recall that Quigibo said he used x(1-x) as an approximation to sine, too).
The following uses 3 iterations as opposed to 8 and is faster than the Axe version, but larger by 8 (?) bytes:
p_Cos:
ld a,l
add a,64
ld l,a
p_Sin:
ld a,l
add a,a
push af
ld d,a
ld h,a
ld l,0
ld bc,037Fh
__SinLoop:
sla h
sbc a,a
xor d
and c
add a,l
ld l,a
rrc d
srl c
srl c
djnz __SinLoop
ld h,b
pop af
ret nc
xor a
sub l
ret z
ld l,a
dec h
ret
;This:
; 34 bytes
; 269 t-states min, else 282, else 294
; avg. 76 t-states faster than Axe
;Axe:
; 27 bytes
; 341+b t-states min, else 354, else 366
If the bits of register 'l' are sabcdefg, this basically returns:
(0aaaaaaa^0bcdefg0)+(000bbbbb^000cdefg)+(00000ccc^00000def)
^ is for XOR logic, + is regular integer addition modulo 256
(s is the sign)
The original algorithm was going to be for computing the natural logarithm
I came up with some new (to me) algorithms for that as well.
EDIT1: Thanks to calc84maniac for pointing out the optimisation using a different order of operations with xor/and ! Saved 1 byte, 12 cycles. This let me shuffle some code to save 8 more cycles.
EDIT2: Modified the last note to reflect the normalized input/output to match that of Axe.