Author Topic: Fixed Point Logarithm (Read 5477 times)

Xeda112358 · « **on:** May 20, 2013, 08:49:55 am »

Hey everyone, I wrote a routine for computing fixed point 8.8 natural logs. The issue is that it is huge and slow, taking about 11000 t-states and being 188 bytes for the routine plus the 8.8 FP division routine.

I am sure there are optimisations to make-- I know a few that will increase the size, but speed it up a tiny bit. But I also wonder if there is an even faster algorithm? Here is a breakdown of what my code does:

First, I use part of a continued fraction. I tried three methods-- a Maclaurin Series (the most computationally expensive), a series I derived yesterday (slightly less computationally expensive), and continued fractions. The continued fractions converges faster than the other two and requires less than half the number of Multiplication/Division calls.
The very first step, is testing if the input is zero, in which case it returns $FFFF
Next, it puts the number in the range of [1,2] by multiplying it by a power of 2 (that power is either negative or positive or zero). It stores the power for later use.
Next it performs the following computations in BASIC-ish pseudocode (for the continued fraction)

Code: [Select]

     X-1→X
     push X
     4*X→X    ;in code, X is in HL, so I just use add hl,hl \ add hl,hl. This doesn't really count as a multiplication routine, then.
     Ans+4     ;In code, this is just inc h \ inc h \ inc h \ inc h
     X/Ans
     pop X
     Ans+3
     X/Ans
     Ans+2
     X/Ans
     Ans+1
     X/Ans

The next step is to take the power of two that the original value was multiplied by, multiply that power by ln(2) and add it to the result. If the power was negative (so it was divided by a power of two), I store ln(2) in 8.16 FP format and add it a number of times to an accumulator , then add it to the result. If the power is positive, I simply subtract the 8.8 approximation of ln(2) from the result the number of times needed.

Now for the actual code:

Code: [Select]

FPLog88:
;Input:
;     HL is the 8.8 Fixed Point input. H is the integer part, L is the fractional part.
;Output:
;     HL is the natural log of the input, in 8.8 Fixed Point format.
     ld a,h
     or l
     dec hl
     ret z
     inc hl
     push hl
     ld b,15
     add hl,hl
     jr c,$+4
     djnz $-3
     ld a,b
     sub 8
     jr nc,$+4
     neg
     ld b,a
     pop hl
     push af
     jr nz,lnx
     jr nc,$+7
     add hl,hl
     djnz $-1
     jr lnx
     sra h
     rr l
     djnz $-4
lnx:
     dec h        ;subtract 1 so that we are doing ln((x-1)+1) = ln(x)
     ld d,h
     ld e,l
     inc h
     call FPDE_Div_HL  ;preserves DE, returns AHL as the 16.8 result
     inc h        ;now we are doing x/(3+Ans)
     inc h
     inc h
     call FPDE_Div_HL
     inc h        ;now we are doing x/(2+Ans)
     inc h
     call FPDE_Div_HL
     inc h        ;now we are doing x/(1+Ans)
     call FPDE_Div_HL  ;now it is computed to pretty decent accuracy
     pop af       ;the power of 2 that we divided the initial input by
     ret z        ;if it was 0, we don't need to add/subtract anything else
     ld b,a
     jr c,SubtLn2
     push hl
     xor a
     ld de,$B172  ;this is approximately ln(2) in 0.16 FP format
     ld h,a
     ld l,a
     add hl,de
     jr nc,$+3
     inc a
     djnz $-4
     pop de
     rl l         ;returns c flag if we need to round up
     ld l,h
     ld h,a
     jr nc,$+3
     inc hl
     add hl,de
     ret
SubtLn2:
     ld de,$00B1
       or a
       sbc hl,de
       djnz $-3
     ret


FPDE_Div_HL:
;Inputs:
;     DE,HL are 8.8 Fixed Point numbers
;Outputs:
;     DE is preserved
;     AHL is the 16.8 Fixed Point result (rounded to the least significant bit)
     di
     push de
     ld b,h
     ld c,l
     ld a,16
     ld hl,0
Loop1:
     sla e
     rl d
     adc hl,hl
     jr nc,$+8
     or a
     sbc hl,bc
     jp incE
     sbc hl,bc
     jr c,$+5
incE:
     inc e
     jr $+3
     add hl,bc
     dec a
     jr nz,Loop1
     ex af,af'
     ld a,8
Loop2:
     ex af,af'
     sla e
     rl d
     rla
     ex af,af'
     add hl,hl
     jr nc,$+8
     or a
     sbc hl,bc
     jp incE_2
     sbc hl,bc
     jr c,$+5
incE_2:
     inc e
     jr $+3
     add hl,bc
     dec a
     jr nz,Loop2
;round
     ex af,af'
     add hl,hl
     jr c,$+6
     sbc hl,de
     jr c,$+9
     inc e
     jr nz,$+6
     inc d
     jr nz,$+3
     inc a
     ex de,hl
     pop de
     ret

On average this is off from the actual value by about 1/400. Since this is smaller than the smallest 8.8 value, it is pretty accurate. The worst case scenario that I have found so far was trying to find ln(1/256) which was off by about 7/512 from the actual value.

Later I will hopefully be working on sin() and cos() among others, but I thought I would share this and see if anybody had better methods.

EDIT: Thought of a fairly simple optimisation. I am going to test another optimisation that may speed it up by as much as 40%
EDIT2: Here is a much smaller and more efficient log₂ routine. It is 71 bytes (so 117 bytes smaller) and takes <4000 t-states:
EDIT3: I found a handful of bugs (errors in translating from hex to assembly) so I went through the code and fixed it. I also optimised a few areas, saving a total of 1 byte and some clock cycles. Also, a legitimate use of 'rr a' instead of 'rra'. I only added that for ambiguity since I only needed to check if A>1. In a later post, I have a size optimised version that averages about 39 t-states slower which is about 1% slower.

Code: [Select]

Log_2_88:
;Inputs:
;     HL is an unsigned 8.8 fixed point number.
;Outputs:
;     HL is the signed 8.8 fixed point value of log base 2 of the input.
;Example:
;     pass HL = 3.0, returns 1.58203125 (actual is ~1.584962501...)
;70 bytes
     ex de,hl
     ld hl,0
     ld a,d
     ld c,8
     or a
     jr z,DE_lessthan_1
     srl d
     jr z,logloop-1
     inc l
     rr e
     jp $-7
DE_lessthan_1:
     ld a,e
     dec hl
     or a
     ret z
     inc l
     dec l
     add a,a
     jr nc,$-2
     ld e,a

     inc d
logloop:
     add hl,hl
     push hl
     ld h,d
     ld l,e
     ld a,e
     ld b,7

     add hl,hl
     rla
     jr nc,$+3
       add hl,de
     djnz $-5

     adc a,0
     add hl,hl
     rla
     jr nc,$+5
       add hl,de
       adc a,0
     ld e,h
     ld d,a
     pop hl
     rr a
     jr z,$+7
       srl d
       rr e
       inc l
     dec c
     jr nz,logloop
     ret

tr1p1ea · « **Reply #1 on:** May 23, 2013, 04:47:35 pm »

Wow nice! Im not sure ive seen anyone write such a routine before?!

utz · « **Reply #2 on:** May 23, 2013, 07:06:23 pm »

Very nice. A minor optimization: in line 109 you can use rla instead of rl a, since you're not checking that zero flag anywhere.

Matrefeytontias · « **Reply #3 on:** May 24, 2013, 01:42:35 am »

I can't really tell, since because of my school level, I even don't know what a logarithm is

but I wonder what'd be the use of your routine ..?

ElementCoder · « **Reply #4 on:** May 24, 2013, 02:18:53 am »

Logarithms are used to find out the power you should raise a certian number to to get another number. The natural logarithm (used here) is a base e logarithm, ln(x) is asking to what power should I raise e to get x. Hope you got it

Streetwalrus · « **Reply #5 on:** May 24, 2013, 07:02:45 am »

In equation form, ln(a) is the solution to e^x=a, which is the definition we got in Terminale. BTW e is Euler's constant which is so that e^x=exp(x) where exp is the exponential function.

Matrefeytontias · « **Reply #6 on:** May 25, 2013, 04:37:10 am »

Okay, I got it, thanks

but I still don't really see what could be the use of this routine in a program.

DrDnar · « **Reply #7 on:** May 25, 2013, 05:06:16 pm »

I'd just create two look-up tables: one for 0<n<1 and another for 1<n<255. Then, branch depending on the MSB of n. Actually, if you compute a reciprocal, you only need one table. Basically, your input really only has 8 significant bits; surely that can be used for an optimization. Your output domain is -5.6<n<5.6.

Edit: Use change-of-base rule: It's trivial to compute n-log-base-2, so just compute that then divide by the hardcoded log-base-2 of e value.

Xeda112358 · « **Reply #8 on:** May 31, 2013, 06:55:30 am »

Quote from: utz on May 23, 2013, 07:06:23 pm

Very nice. A minor optimization: in line 109 you can use rla instead of rl a, since you're not checking that zero flag anywhere.

Oh, oops, I copied that from my calc incorrectly, thanks!

Quote from: DrDnar on May 25, 2013, 05:06:16 pm

I'd just create two look-up tables: one for 0<n<1 and another for 1<n<255. Then, branch depending on the MSB of n. Actually, if you compute a reciprocal, you only need one table. Basically, your input really only has 8 significant bits; surely that can be used for an optimization. Your output domain is -5.6<n<5.6.

Edit: Use change-of-base rule: It's trivial to compute n-log-base-2, so just compute that then divide by the hardcoded log-base-2 of e value.

To your first point, you would only really need to look at the interval [1,2] which is all my code does. Essentially, it computes ln(x) by taking x=y2ⁿ where y is on [1,2], then ln(y2ⁿ) = ln(y)+ln(2ⁿ)= ln(y)+n*ln(2) and I have ln(2) hardcoded.

That being said, what you put in your edit makes a lot of sense. Shortly after this routine, I wrote a log₂ routine that is much faster, and because I would be dividing by a constant log₂(e) to change the base to natural log, the division routine could be optimised a bit, saving ~400 cycles. Also, the log₂ routine is much better because I can extend it to just about any size fixed-point (or floating point) numbers that I want. I'll edit my first post with that.

Xeda112358 · « **Reply #9 on:** June 30, 2013, 06:21:22 pm »

I found a few errors in the log_2 routine (translation issues from hex->assembly). As well, I rearranged the code and in all saved a byte and some t-states

I updated the first post with my preferred version which is slightly speed optimised in the same way as the last version. Here is a cleaner, smaller version and is only slightly slower (about 39 t-states on average):

Code: [Select]

Log_2_88_size:
;Inputs:
;     HL is an unsigned 8.8 fixed point number.
;Outputs:
;     HL is the signed 8.8 fixed point value of log base 2 of the input.
;Example:
;     pass HL = 3.0, returns 1.58203125 (actual is ~1.584962501...)
;averages about 39 t-states slower than original
;62 bytes
     ex de,hl
     ld hl,0
     ld a,d
     ld c,8
     or a
     jr z,DE_lessthan_1
     srl d
     jr z,logloop-1
     inc l
     rr e
     jr $-7
DE_lessthan_1:
     ld a,e
     dec hl
     or a
     ret z
     inc l
     dec l
     add a,a
     jr nc,$-2
     ld e,a

     inc d
logloop:
     add hl,hl
     push hl
     ld h,d
     ld l,e
     ld a,e
     ld b,8

     add hl,hl
     rla
     jr nc,$+5
       add hl,de
       adc a,0
     djnz $-7

     ld e,h
     ld d,a
     pop hl
     rr a           ;this is right >_>
     jr z,$+7
       srl d
       rr e
       inc l
     dec c
     jr nz,logloop
     ret

Also, I used 'rr a' here for the flags to test if a>1 (the c flag is reset before hand, but I could have used sra a or srl a or bit 1,a or bit 1,d).

Author Topic: Fixed Point Logarithm (Read 5477 times)

Xeda112358

Fixed Point Logarithm

tr1p1ea

Re: Fixed Point Logarithm

utz

Re: Fixed Point Logarithm

Matrefeytontias

Re: Fixed Point Logarithm

ElementCoder

Re: Fixed Point Logarithm

Streetwalrus

Re: Fixed Point Logarithm

Matrefeytontias

Re: Fixed Point Logarithm

DrDnar

Re: Fixed Point Logarithm

Xeda112358

Re: Fixed Point Logarithm

Xeda112358

Re: Fixed Point Logarithm