Messages - Xeda112358

526
Basically, a geometric mean routine would be used in the super-cool "Arithmetic-Geometric Mean" class of algorithms. What I was doing was making an efficient alternative to multiplying, then taking the square root. It is not more efficient in TI-BASIC, but it can be much more efficient in other applications. Plus, there is no overflow risk.

So as an example, you could do this on the homescreen:
Code: [Select]
{976,587→L1
1+L1(2)/L1(1:1-Ans^-1+.25Ans:L1{Ans,Ans^-1→L1
Just hit enter one or two more times and voila, you have the square root of 976*587.

527
So I've been developing a way to compute the geometric mean of two numbers (defined by ##\sqrt{xy}##) and my original algorithm had a few snazzy properties:
-It is an add-shift algorithm
-No multiplication needed and so no risk of overflow
-No explicit square root operation performed

I played with the math a bit, and by taking it out of the realm of add-shift routines and allowing division and multiplication, I came up with an algorithm that is relatively easy to do by hand, converges quartically (quadrupling the digits of accuracy at each step), and still has no overflow risk. In practice, on my TI-84+, I get full 14-digit precision in 2 iterations. The next iteration should theoretically get 56 digits, and each iteration requires 2 divisions, 1 inverse, and 1 multiplication (plus a shift and a few adds). Here is an example to show how it works:

Say we wish to find the geometric mean of two numbers, x=209, y=654321. First, for notation, I am just going to refer to the geometric mean as {x,y}, so {209,654321}. The first step is to find the first 'n' so that ##4^{n}x## is bigger than y, so in this case, n=6.
Then ##\{x\cdot 2^{n},y\cdot 2^{-n}\}=\{x,y\}##, but now the numbers are much "closer." We get {13376,10223.765625}. Now factor out the largest power of 2 that fits in both numbers, which is 8192. We have 8192{1.6328125,1.248018265}.

All of that preemptive stuff has set up a new {x,y} where we have y<x<2y, so 1<x/y<2, or 1>y/x>.5. This will allow for some fine-tuned magic in a bit. What we should realize is that if we can get x=y somehow, then we have {x,y}={r,r}=##\sqrt{r^{2}}=r##. So notice that if we choose some number ##a##, we have ##\{xa,y/a\}=\sqrt{xa\cdot y/a}=\sqrt{xy}=\{x,y\}##. So what we need to do is find some ##a## such that xa=y/a, so ##xa^{2}=y##, ##a=\sqrt{y/x}##. However, we want to avoid computing square roots in our computation of square roots, so what we can do is make an estimate for a. As luck would have it, based on our restrictions above, we can make a pretty decent estimate of the square root just by averaging the quotient with 1. In this case, that is ##\frac{\frac{1.248018265}{1.6328125}+1}{2}\approx .8821682725##. So then we get 8192{1.440415382,1.414716788}. Iterating:
8192{1.427566085,1.427450431}
8192{1.427508258,1.427508256}
8192{1.427508257,1.427508257}

And for safety, we could do one more iteration and average the two numbers in case of extra hidden precision, but we get the final result 11694.147638883, which is pretty close to the true value of 11694.147638883.

But that took 5 iterations, each requiring 2 divisions and 1 multiplication. This is not the promised 2 iterations! So here is where we do some magic. At the division step for y/x, we want its square root, right? Why not try a couple of iterations at this step? The square root we want is {1,y/x}, our estimate is still ##\frac{\frac{y}{x}+1}{2}##, and applying it gives ##\{\frac{\frac{y}{x}+1}{2},\frac{y}{x}\cdot\frac{2}{\frac{y}{x}+1}\}##; averaging that pair gives the improved estimate ##\frac{\frac{y}{x}+1}{4}+\frac{\frac{y}{x}}{\frac{y}{x}+1}##. So in pseudo-code, we could do:
Code: [Select]
1+y/x→a
1-1/a+a>>2→a
{xa,y/a}→{x,y}
This is the same as performing two iterations at the cost of 3 divisions, 1 multiplication, compared to 4 divisions, 2 multiplications. So we save 1 division, 1 multiplication.
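
For anyone who wants to play with it outside of TI-BASIC, here is a rough Python sketch of the same iteration (illustration only, not one of the routines here; it uses plain floats and skips the power-of-two scaling steps, so it does not keep the no-overflow property):
Code: [Select]
def geometric_mean(x, y, tol=1e-13):
    """Approximate sqrt(x*y) for positive x, y without an explicit square root."""
    while abs(x - y) > tol * x:
        a = 1 + y / x              # 1 + y/x
        a = 1 - 1 / a + 0.25 * a   # two averaging steps fused into one estimate of sqrt(y/x)
        x, y = x * a, y / a        # {x*a, y/a} has the same geometric mean as {x, y}
    return 0.5 * (x + y)           # average the pair for a little extra precision

# geometric_mean(209, 654321) gives about 11694.147638883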



So, for some working TI-BASIC code:
Code: [Select]
{max(Ans),min(Ans→L1
int(log(L1(1)/L1(2))/log(4
L1{2^-Ans,2^Ans→L1
2^int(log(Ans(1))/log(2→P
L1/Ans→L1
While E-10<abs(max(deltaList(L1    ;engineering E
1+L1(2)/L1(1
1-Ans^-1+.25Ans
L1{Ans,Ans^-1→L1
End
Pmean(L1

528
Math and Science / My Newest Trick For Division
« on: March 25, 2014, 02:03:43 pm »
Okay, so yesterday, I was thinking about the algorithms I use for computing logs and exponentials with my floating point numbers and I realized that I had the core of a fast division algorithm right there. Then I thought, "oh wait, this is just Goldschmidt division." I was wrong, though, now that I am more awake and looking at the algorithm. The biggest difference is that mine has the same time complexity as long division, so I need 64 iterations for my 64 bits of precision versus Goldschmidt's 5.

But there is a but. Those 5 iterations would require 2 full-precision 64-bit multiplications, which I currently have down to about 9000 t-states each. My method would require 3*32 shifts+subtracts. On the surface, it looks like it would require 32 more, but I have tricks and magic on my side.

So how does it work? I decided to look at x/y as a vector {x,y} where I try to drive y towards 1 (I have been doing a lot of CORDIC stuff lately). The speed comes from how I drive y towards 1. In its normal state, I have mantissas as 1.xxxxx, so x and y can be treated as numbers on [1,2). Now as a quick observation, if y>=4/3, then y*3/4 is less than y, but greater than or equal to 1. However, y*3/4 = y-y/4 = y-(y>>2), where ">>2" refers to a right shift of 2. Similarly, I compare to 8/7, 16/15, et cetera.

That comparison step can get annoying, especially if you are using a lookup table of 64-bit numbers. It can also get slow. What I did was look at the binary expansions of these numbers:
4/3 = 1.01010101...
8/7 = 1.001001001...
16/15 = 1.000100010001...
32/31 = 1.000010000100001...

See the pattern? Basically, I can count the leading zeroes after the decimal point; if the result is 63, I can stop (the denominator is 1), else if it is n, the denominator is guaranteed to be at least ##\frac{2^{n+2}}{2^{n+2}-1}##, so multiplying it by ##\frac{2^{n+2}-1}{2^{n+2}}## (a shift and a subtract) cannot push it below 1. From this, a quick pseudo-code example would be:
Code: [Select]
Label:
   2+clz(y)→a
   if a>m : Return
   x-x>>a→x
   y-y>>a→y
   Goto Label
The code assumes clz will count the leading 0s after the first digit (which is always 1) and m is the desired accuracy. A speed up can occur by noticing that if y=1.000...001xxxxx in binary with n 0s, then y-y>>(n+1) isn't going to change the next sequence of n bits after the 1 (because you are subtracting 0s from xs). So if you have gone up to m bits, the next m bits are basically computed for free in y. This leads to a 2-part algorithm:
Code: [Select]
Label1:
   2+clz(y)→a
   if a>m : Goto Label2
   x-x>>a→x
   y-y>>a→y
   Goto Label1
Label2:
For n from m+1 to 2m-1
   if bit n of y is 1: x-x>>n→x
Adding that extra bit of code gets 2m bits of precision.
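
To see the core loop concretely, here is a Python sketch (hypothetical code, not the Z80 version; it uses Python integers as fixed-point values with 32 fractional bits and skips the 2m-bit finishing trick):
Code: [Select]
FRAC = 32                                  # fractional bits; the real routine wants 64

def clz_frac(y):
    """Count the leading zero bits of the fractional part of y (y in [1,2))."""
    frac = y - (1 << FRAC)                 # strip the integer 1 bit
    if frac == 0:
        return FRAC
    n = 0
    while not (frac >> (FRAC - 1 - n)) & 1:
        n += 1
    return n

def fixed_div(x, y):
    """Approximate x/y by driving y toward 1; x, y are fixed-point values in [1,2)."""
    while True:
        a = clz_frac(y) + 2
        if a > FRAC:                       # y is 1 to within the working precision
            return x / (1 << FRAC)
        x -= x >> a                        # x *= (2^a - 1)/2^a
        y -= y >> a                        # y *= (2^a - 1)/2^a, and stays >= 1

# fixed_div(int(1.5 * 2**FRAC), int(1.2 * 2**FRAC)) is about 1.25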

I just thought I would share :) This sucks on the Z80 because there isn't a CLZ instruction (though that can be simulated at decent speed), but those arbitrary shifts are what really kill it.

529
ASM / Re: [z80] Floating Point Routines
« on: March 19, 2014, 01:11:59 pm »
I have a new version for the 80-bit floats and I made an example routine to compute the square root using Newton's method (basically .5(Ans+X/Ans) 5 times). Since there isn't already a square root routine and this gets about 64-bits of precision, I have left it in, but it is slow.
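
For reference, the iteration in question looks like this in Python (a sketch, not the actual 80-bit routine; I am assuming it is applied to a mantissa in [1,2), with the exponent handled separately):
Code: [Select]
def newton_sqrt(m, iterations=5):
    """Newton's method for sqrt(m) with m in [1,2): x <- 0.5*(x + m/x).
    Five iterations starting from x = m give roughly 64 bits on this range."""
    x = m
    for _ in range(iterations):
        x = 0.5 * (x + m / x)
    return x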


Attached is the current source, and it is updated in the Google Docs thing. For a summary of timings:
Code: [Select]

Routine          Clock cycles    Ops per sec at 6MHz
Add/Sub            1200 cc           5000
Multiplication    13000 cc            461
Division          19000 cc            315
Sqrt             108000 cc             55

530
ASM / Re: [z80] Floating Point Routines
« on: March 19, 2014, 07:40:17 am »
At a hefty 1631 bytes and a change of syntax, I have managed to get division under 20 000 t-states and multiplication at about 13000 t-states.


The change of syntax is basically this: the exponent is stored with a bias so that 16384 represents 0, the sign is bit 15, and together these comprise the first two bytes, in little-endian. The following 8 bytes are the mantissa and are also little-endian (it was big-endian before).


I also changed how the multiplication and division routines are done. For the division at each step, instead of doing 8 8x8->16 multiplications, it is faster to do 4 8x16->24 multiplications. For the multiplication, instead of doing 16 16x16->32 multiplications, I do 32 8x16->24 multiplications.
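
To make the counting concrete, here is a Python sketch (not the Z80 code) of a 64x64 multiply split into 8x16->24 partial products; there are 8*4 = 32 of them, matching the count above:
Code: [Select]
def mul64(a, b):
    """Multiply two 64-bit integers using 8x16->24-bit partial products."""
    a_bytes = [(a >> (8 * i)) & 0xFF for i in range(8)]      # 8 byte chunks of a
    b_words = [(b >> (16 * i)) & 0xFFFF for i in range(4)]   # 4 word chunks of b
    result = 0
    for i, ab in enumerate(a_bytes):
        for j, bw in enumerate(b_words):
            result += (ab * bw) << (8 * i + 16 * j)          # 32 partial products
    return result                                            # full 128-bit product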


I should also note that I have these timings without code to check for overflow, zero, infinities, or NAN, but I changed exponent storage to make these easier to do. Now if I can get addition and subtraction in this new format, I will be tempted to write some really slow routines for exponentials, logs, and trig functions. Then they will be available until I have time to actually design much faster algorithms.


Also, I have been trying to think of a way to make a "geometric mean" function. Basically, the geometric mean of 'a' and 'b' is sqrt(ab). I could just multiply and perform the square root, but for the past few months I have been having the idea of fusing the two operations into one. The idea would be that since an NxN multiplication returns a number of size 2N, I can try to get 2 bits at a time of the multiplication. A restoring square root takes in the upper two bits of the radicand at each iteration, so I can feed the bits directly from the multiplication into the square root algorithm without worrying about storing a 128-bit integer, and while getting full precision to 64 bits (instead of multiplying to get a 64-bit approximation and taking the square root of that).
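
For reference, this is the kind of restoring square root I mean, sketched in Python (hypothetical code; it reads the radicand two bits at a time from an integer, whereas the plan is to stream those bits in from the multiplication):
Code: [Select]
def isqrt_restoring(n, bits=64):
    """Restoring square root: consume two bits of n per iteration, MSB first."""
    root, rem = 0, 0
    for i in reversed(range(0, bits, 2)):
        rem = (rem << 2) | ((n >> i) & 3)   # take in the next two bits
        trial = (root << 2) | 1             # (2*root+1)^2 - (2*root)^2 = 4*root + 1
        root <<= 1
        if rem >= trial:                    # restoring step
            rem -= trial
            root |= 1
    return root                             # floor(sqrt(n)) for n < 2**bits

# isqrt_restoring(25) == 5, isqrt_restoring(26) == 5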


The geometric mean would be useful when combined with the arithmetic mean (a+b)/2, especially if I can get it to be streamlined and fast. There is a class of algorithms that use the Arithmetic-Geometric Mean. This provides the asymptotically fastest known method of computing things like natural log, arctangent, and others.

531
ASM / Re: [z80] Floating Point Routines
« on: March 18, 2014, 10:45:35 am »
Cool, thanks :) I finally got all of the small bugs out and it currently takes 25053 t-states to perform pi/e and 22483 t-states for e/pi. However, I have the number format slightly different from my other 80-bit floats. After experimenting earlier this year, it is actually a little friendlier to store the exponent+sign in little-endian mode and have 16384 represent the exponent 0.

Floating point is basically like scientific notation. So in base 10, 314.1593=3.141593*10^2. The way it would be stored in floating point is a sign bit (0 for positive), followed by the exponent (2), followed by the string "3141593". Now if we want to square this, we can multiply 3141593*3141593 and add the exponents, and XOR the signs. So you get +:4:9869606577649 and this tells us that the number is +98696.06577649. In practice, we truncate and round the digits, so to 6 digits, +:4:986961.

In binary, it is the same idea. If I keep the numbers normalized (keeping the mantissa on [1,2)), then division of 3/5, for example, is dividing +:1:11000000 / +:2:10100000. We can subtract exponents, XOR the signs, and then we perform 11000000/10100000 to whatever precision we need. Integer division would give us 0, but if we go for 8 more iterations of the same division algorithm, we would get 8 bits of the decimal. Rounding, we get +:-1:10011010 which corresponds to .10011010 = 154/256 = .6015625.
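
To make the decimal example concrete, here is that bookkeeping as a Python sketch (hypothetical code, not part of the package; it uses the plain +:exponent:mantissa notation from above, with the mantissa kept as a 7-digit integer like 3141593):
Code: [Select]
def float_mul(a, b, digits=7):
    """Multiply (sign, exp, mantissa) floats; mantissa m is a `digits`-digit
    integer representing m / 10**(digits-1), a value in [1,10)."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    sign = sa ^ sb                        # XOR the signs
    exp = ea + eb                         # add the exponents
    prod = ma * mb                        # exact integer product of the mantissas
    if prod >= 10 ** (2 * digits - 1):    # product is >= 10, so bump the exponent
        exp += 1
        scale = 10 ** digits
    else:
        scale = 10 ** (digits - 1)
    man = (prod + scale // 2) // scale    # round back to `digits` digits
    if man == 10 ** digits:               # rounding overflowed into an extra digit
        man //= 10
        exp += 1
    return (sign, exp, man)

# float_mul((0, 2, 3141593), (0, 2, 3141593)) -> (0, 4, 9869607), i.e. +98696.07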

Here is my code, currently. I am pretty sure it isn't nearly as optimized as it could be and it is nearly 900 bytes:
Code: [Select]
Div_Sub:
;DE/C, DE <C*256, C>127
   ld a,d
   sla e \ rla \ jr c,$+5 \ cp c \ jr c,$+4 \ sub c \ inc e
   sla e \ rla \ jr c,$+5 \ cp c \ jr c,$+4 \ sub c \ inc e
   sla e \ rla \ jr c,$+5 \ cp c \ jr c,$+4 \ sub c \ inc e
   sla e \ rla \ jr c,$+5 \ cp c \ jr c,$+4 \ sub c \ inc e
   sla e \ rla \ jr c,$+5 \ cp c \ jr c,$+4 \ sub c \ inc e
   sla e \ rla \ jr c,$+5 \ cp c \ jr c,$+4 \ sub c \ inc e
   sla e \ rla \ jr c,$+5 \ cp c \ jr c,$+4 \ sub c \ inc e
   sla e \ adc a,a \ jr c,$+5 \ ret p \ cp c \ ret c \ inc e \ ret
FloatDiv_80:
; 1 bit sign + 15 bits signed exponent (16384 is exp = 0)   (little endian)
; 64 bits mantissa, (big endian)
;Inputs:
;  HL points to dividend
;  DE points to divisor
   ex de,hl
   call LoadFPOPs
   ld hl,(fpOP1)
   ld de,(fpOP2)
   ld a,h
   xor d
   push af
   res 7,d
   res 7,h
   sbc hl,de
   ld bc,16384
   add hl,bc
   pop af
   and $80
   or h
   ld h,a
   ld (fpOP3),hl
;Now perform the division of fpOP2/fpOP1
;The algo works like this:
;  Take the first byte of fpOP2, compare against that of fpOP1
;  If it is bigger, since fpOP1 should have bit 7 set (normalized numbers),
;    it divides at most once. So the first byte is 1, subtract fpOP2-fpOP1->fpOP2
;  After this, we repeatedly compare the upper two bytes of fpOP2 to the first byte
;    of fpOP1. This is to estimate how many times fpOP1 can be divided into fpOP2.
;  This is just a guestimate, but each digit is an overestimate by at most 1!
;
;  Example with smaller numbers. Take 8AD176/AC0980
;   First digit = 0   ('digits' are 8-bit ints, so on [0,255])
;   Now 8AD1/AC = CE, so 8AD176.00 - AC0980*0.CE    = 8AD176-8A6FA5 = 61D1
;   Now 61D1/AC = 91, so 61D1.0000 - AC0980*.0091   = 61D1.0-6171.6180 = 5F.9E80
;   Now 5F9E/AC = 8E, so 5F.9E80   - AC0980*.00008E = 5F.9E8000-5F.6D4500 = .313B
;  In this case, there were no overestimates. We would have known if the subtraction step
;  yielded a negative output. To adjust this, decrement the new digit by 1 and add AC0980 to the int.
;  So the example gives 8AD176/AC0980 = 0.CE918E, or in base 10, 9097590/11274624=.806908488274

;fpOP1+2 has denom
;fpOP2+2 has num
   ld de,fpOP2-2
   ld hl,fpOP2+2
   ldi \ ldi \ ldi
   ldi \ ldi \ ldi
   ldi \ ldi \ ldi
   ldi \ ldi \ ldi
denom = fpOP1+2
numer = fpOP2-2
outp  = numer-1
   ld hl,denom
   ld de,numer
   call cp_64b
   ld hl,numer-1
   ld (hl),0
   jr c,noadjust
      inc (hl)
      ex de,hl
      inc de
      ld hl,denom
      call sub_64b
      ex de,hl \ dec hl
noadjust:
   inc hl
   ld de,numer+8
   call div_sub_1
   call div_sub_1
   call div_sub_1
   call div_sub_1
   call div_sub_1
   call div_sub_1
   call div_sub_1
   call div_sub_1
   ld de,801Eh
   ld hl,800Bh
   ld a,(hl)
   rra
   jr nc,directcopy
      inc hl \ ld a,(hl) \ rra \ ld (de),a \ inc de
      inc hl \ ld a,(hl) \ rra \ ld (de),a \ inc de
      inc hl \ ld a,(hl) \ rra \ ld (de),a \ inc de
      inc hl \ ld a,(hl) \ rra \ ld (de),a \ inc de
      inc hl \ ld a,(hl) \ rra \ ld (de),a \ inc de
      inc hl \ ld a,(hl) \ rra \ ld (de),a \ inc de
      inc hl \ ld a,(hl) \ rra \ ld (de),a \ inc de
      inc hl \ ld a,(hl) \ rra \ ld (de),a \ ret
directcopy:
   inc hl
   ldi
   ldi
   ldi
   ldi
   ldi
   ldi
   ldi
   ldi
   ld hl,(fpOP3) \ dec hl \ ld (fpOP3),hl \ ret
div_sub_1:
   ld bc,(denom)
   ld a,(hl)
   inc hl
   push hl
   ld l,(hl)
   ld h,a
   ex de,hl
   call Div_Sub
   ld c,e
   ex de,hl
   call fused_mul_sub
   ld hl,9
   add hl,de
   ex de,hl
   pop hl
   ret
fused_mul_sub:
;multiply denominator*E and subtract from numerator
   xor a
   ld hl,(denom+6) \ ld b,a \ ld l,b
   sla h \ jr nc,$+3 \ ld l,c
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   ld a,(de) \ sub l \ ld (de),a \ dec de
   ld a,h \ adc a,b

   ld hl,(denom+5) \ ld l,b
   sla h \ jr nc,$+3 \ ld l,c
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add a,l \ jr nc,$+3 \ inc h \ ld l,a
   ld a,(de) \ sub l \ ld (de),a \ ld a,h \ adc a,b \ dec de

   ld hl,(denom+4) \ ld l,b
   sla h \ jr nc,$+3 \ ld l,c
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add a,l \ jr nc,$+3 \ inc h \ ld l,a
   ld a,(de) \ sub l \ ld (de),a \ ld a,h \ adc a,b \ dec de

   ld hl,(denom+3) \ ld l,b
   sla h \ jr nc,$+3 \ ld l,c
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add a,l \ jr nc,$+3 \ inc h \ ld l,a
   ld a,(de) \ sub l \ ld (de),a \ ld a,h \ adc a,b \ dec de

   ld hl,(denom+2) \ ld l,b
   sla h \ jr nc,$+3 \ ld l,c
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add a,l \ jr nc,$+3 \ inc h \ ld l,a
   ld a,(de) \ sub l \ ld (de),a \ ld a,h \ adc a,b \ dec de

   ld hl,(denom+1) \ ld l,b
   sla h \ jr nc,$+3 \ ld l,c
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add a,l \ jr nc,$+3 \ inc h \ ld l,a
   ld a,(de) \ sub l \ ld (de),a \ ld a,h \ adc a,b \ dec de

   ld hl,(denom) \ ld l,b
   sla h \ jr nc,$+3 \ ld l,c
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add a,l \ jr nc,$+3 \ inc h \ ld l,a
   ld a,(de) \ sub l \ ld (de),a \ ld a,h \ adc a,b \ dec de

   ld hl,(denom-1) \ ld l,b
   sla h \ jr nc,$+3 \ ld l,c
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add hl,hl \ jr nc,$+3 \ add hl,bc
   add a,l \ jr nc,$+3 \ inc h \ ld l,a
   ld a,(de) \ sub l \ ld (de),a \ ld a,h \ dec de
   ld l,a
   ld a,(de)
   sbc a,l
;if c flag is set, overestimate
   ld a,c \ ld (de),a
   ret nc
   ld hl,8
   add hl,de
   ex de,hl
   ld hl,denom+7
   ld a,(de) \ add a,(hl) \ ld (de),a \ dec hl \ dec de
   ld a,(de) \ adc a,(hl) \ ld (de),a \ dec hl \ dec de
   ld a,(de) \ adc a,(hl) \ ld (de),a \ dec hl \ dec de
   ld a,(de) \ adc a,(hl) \ ld (de),a \ dec hl \ dec de
   ld a,(de) \ adc a,(hl) \ ld (de),a \ dec hl \ dec de
   ld a,(de) \ adc a,(hl) \ ld (de),a \ dec hl \ dec de
   ld a,(de) \ adc a,(hl) \ ld (de),a \ dec hl \ dec de
   ld a,(de) \ adc a,(hl) \ ld (de),a \ dec de
   ex de,hl \ dec (hl) \ ex de,hl
   ret



;num+7 - hl

sub_64b:
;(de)-(hl), big endian 64-bit.
   ld bc,7
   add hl,bc
   ex de,hl
   add hl,bc
   ex de,hl
   ld a,(de) \ sub (hl) \ ld (de),a \ dec de \ dec hl
   ld a,(de) \ sbc a,(hl) \ ld (de),a \ dec de \ dec hl
   ld a,(de) \ sbc a,(hl) \ ld (de),a \ dec de \ dec hl
   ld a,(de) \ sbc a,(hl) \ ld (de),a \ dec de \ dec hl
   ld a,(de) \ sbc a,(hl) \ ld (de),a \ dec de \ dec hl
   ld a,(de) \ sbc a,(hl) \ ld (de),a \ dec de \ dec hl
   ld a,(de) \ sbc a,(hl) \ ld (de),a \ dec de \ dec hl
   ld a,(de) \ sbc a,(hl) \ ld (de),a \ ret
cp_64b:
;compares (de) to (hl), big endian 64-bit ints
   ld a,(de) \ cp (hl) \ ret nz \ inc de \ inc hl
   ld a,(de) \ cp (hl) \ ret nz \ inc de \ inc hl
   ld a,(de) \ cp (hl) \ ret nz \ inc de \ inc hl
   ld a,(de) \ cp (hl) \ ret nz \ inc de \ inc hl
   ld a,(de) \ cp (hl) \ ret nz \ inc de \ inc hl
   ld a,(de) \ cp (hl) \ ret nz \ inc de \ inc hl
   ld a,(de) \ cp (hl) \ ret nz \ inc de \ inc hl
   ld a,(de) \ cp (hl) \ ret

LoadFPOPs:
;HL points to the first
;DE points to the second
   push de
   ld de,fpOP1
   xor a
   ldi
   ldi
   ldi
   ldi
   ldi
   ldi
   ldi
   ldi
   ldi
   ldi
   ld (de),a \ inc de
   ld (de),a \ inc de
   ld (de),a \ inc de
   ld (de),a \ inc de
   pop hl
   ldi
   ldi
   ldi
   ldi
   ldi
   ldi
   ldi
   ldi
   ldi
   ldi
   ld (de),a \ inc de
   ld (de),a \ inc de
   ld (de),a \ inc de
   ld (de),a \ inc de
   ret
.echo "Size:",$-Div_Sub
It is so large because I unrolled much of it. As an example, using e and pi:
Code: [Select]
   ld hl,float_e
   ld de,float_pi
   jp FloatDiv_80
;e/pi =0.dd816a76547ca9910802972996d4e3
float_pi:
  .dw 16384+1 \ .db $c9,$0f,$da,$a2,$21,$68,$c2,$34    ;pi, not rounded up
float_e:
  .dw 16384+1 \  .db $ad,$f8,$54,$58,$a2,$bb,$4a,$9A    ;e, not rounded up

532
TI Z80 / Re: SpriteFont
« on: March 17, 2014, 09:53:32 am »
Yes, it was a lot easier than I expected (changing two bytes).

533
TI Z80 / SpriteFont
« on: March 17, 2014, 09:38:44 am »
I threw this together after Art_of_camelot mentioned that it could be useful. It is a lightweight font hook program that does the following:
  • During program execution, you can change the homescreen font.
  • Upon exiting, the previous font hook is restored
Fonts must be appvars without a header (so Omnicalc fonts won't work, for example), but they can be archived or in RAM. The "commands" are simple. A string in Ans contains the appvar name. Lowercase won't work very easily, as I don't include a tokens→ASCII routine, so use uppercase letters and numbers. For example:
Code: [Select]
"FONT":Asm(prgmSPRTFNT
If you want to swap fonts put a plus sign in front:
Code: [Select]

"+FONT":Asm(prgmSPRTFNT
If you want to disable the font hook, either pass a non-string or "-":
Code: [Select]
"-":Asm(prgmSPRTFNT


Notes- If you don't use "+" to swap fonts, the hook won't be able to load in the default font after exiting.
If you use the assembly program to allow you to use Delvar / Archive / Unarchive on a program, displaying text before setting the mode back to "program mode" will cause the font to be disabled.


EDIT: And the code:
Code: [Select]

#include    "ti83plus.inc"
#define     progStart   $9D95
.org        progStart-2
.db         $BB,$6D
name = n-SpriteFont+9872h
adjust = a-SpriteFont+9872h


SetUpFont:
bcall(_RclAns)
sub 4
jr z,Parse
deacthook1:
call Is_Hook_Active
ret nz
jp deacthook-SpriteFont+9872h
Parse:
ex de,hl
ld c,(hl)
inc hl
ld b,(hl)
inc hl
ld a,(hl)
cp 71h
jr z,deacthook1
cp 70h ;+ sign
jr nz,LoadHook
call Is_Hook_Active
inc hl
jr nz,LoadHook
ld de,name+1
ldir
xor a
ld (de),a
ret
LoadHook:
push hl
    push bc
ld hl,(fontHookPtr)
ld a,(fontHookPtr+2)
ld (prevhook),hl
ld (prevpage),a


ld hl,SpriteFont
ld de,9872h ;appbackupscreen
ld bc,SpriteFontEnd-SpriteFont
ldir
    pop bc
pop hl
ld de,name+1
ldir
xor a
ld (de),a




ld hl,9872h
ld a,l
bcall(4FE4h)         ;Sets the font hook
ret
Is_Hook_Active:
ld a,(9872h)
sub 83h
ret nz
bit 5,(iy+35h)
ret nz
push hl
ld hl,(fonthookPtr)
ld de,9872h
sbc hl,de
pop hl
ret nz
SpriteFont:
.db 83h
dec a
jr z,$+5
exithook:
xor a
inc a
ret
bit 1,(iy+8)
jr nz,locate_replace
    push hl
deacthook:
.db 21h
prevhook:
.dw 0
.db 3Eh
prevpage:
.db 0
bcall(4FE4h)         ;Sets the font hook
    pop hl
jr exithook
locate_replace:


;B is the char
ld (restoreBC-SpriteFont+9872h),bc
ld l,b
ld h,a
in a,(6)
ld (restorePage-SpriteFont+9872h),a
ld c,l
ld b,h
add hl,hl
add hl,bc
add hl,hl
add hl,bc
push hl
ld hl,name
rst 20h
bcall(_ChkFindSym)
pop hl
jr c,deacthook
ex de,hl
inc hl
inc hl
add hl,de
ld a,b
or a
jr z,ram_font+2
ld b,0
add hl,bc
ld c,10
add hl,bc
jp p,ram_font-SpriteFont+9872h
inc a \ ld h,40h
ram_font:
out (6),a
ld de,lFont_record
ld b,7


ld a,(hl) \ ld (de),a
inc e \ inc l
call z,adjust
djnz $-7


ld hl,lFont_record
.db 3Eh
restorePage:
.db 0
out (6),a
.db 1
restoreBC:
.dw 0
xor a
ret
a:
inc h
ret po
in a,(6)
inc a
out (6),a
ld h,40h
ret
n:
.db 15h,0,0,0,0,0,0,0,0
SpriteFontEnd:

534
ASM / Re: [z80] Floating Point Routines
« on: March 16, 2014, 01:40:43 pm »
A while ago, I thought of a way to possibly make division faster, but I never had the time or focus to actually write it out in assembly. I was working on it today after satisfying myself with the math involved and it doesn't work yet (there is a bug somewhere that seems to be incrementing the wrong byte in the result on occasion). However, the fix for that should not make it much slower. My goal was to get division below 45 000 t-states and from current tests, it will probably be more like 25 000 t-states. This is nearly twice as fast as the OS division and a little faster than the OS' multiplication and this is for 19 digits of accuracy.

My worry when I thought of the algorithm was that it seemed like it would be pretty close to the speed of long division and might only be faster for very large or very small numbers. It turns out that on the z80, this method blows away long division when using a floating-point storage format.

Here is the way it works-- If my numbers are always normalised (this means the upper bit is always set, unless the number is 0), then I might want to compute something like the following in hexadecimal: 8AD176/AC0980 (in base 10, 9097590/11274624). What I do is make an estimate for the first byte. Normally on the z80, we look at the bits because handling too many bytes is difficult. However, my algorithm isn't quite long division. The speed-up comes from how we choose the first chunk of 8 bits. I just take the upper 16 bits of the numerator, the upper 8 bits of the denominator, and divide those to get my estimate. The nice part is that this will only be off by as much as 1 (and if it is, it was an overestimate). Then whatever the estimate was, multiply that by the denominator (like in base 10) and subtract. So for the example:
Code: [Select]
First digit = 0 ('digits' are 8-bit ints, so on [0,255])
Now 8AD1/AC = CE, so 8AD176.00 - AC0980*0.CE    = 8AD176-8A6FA5 = 61D1
Now 61D1/AC = 91, so 61D1.0000 - AC0980*.0091   = 61D1.0-6171.6180 = 5F.9E80
Now 5F9E/AC = 8E, so 5F.9E80   - AC0980*.00008E = 5F.9E8000-5F.6D4500 = .313B
In this case, there were no overestimates. We would have known if the subtraction step yielded a negative output. To adjust this, decrement the new digit by 1 and add AC0980 to the int. So the example gives 8AD176/AC0980 = 0.CE918E, or in base 10, 9097590/11274624=.806908488274

That multiplication step is just multiplying an 8-bit number by a 64-bit number in my routine, and I combine this with subtraction (so it is a fused multiply-subtract routine) and it only needs to be performed 8 times. Hopefully when I get out of work later, I won't be too tired to keep working on this :)
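
Here is the same byte-at-a-time scheme as a Python sketch (hypothetical code, not the Z80 routine); it reproduces the worked example above:
Code: [Select]
def divide_bytes(num, den, frac_digits=8, width=8):
    """Divide two width-byte integers with den normalized (top byte >= 0x80).
    Returns the integer digit followed by frac_digits base-256 digits."""
    shift = 8 * (width - 1)
    den_top = den >> shift                        # top byte of the denominator
    digits = []
    rem = num
    if rem >= den:                                # integer part is 0 or 1
        rem -= den
        digits.append(1)
    else:
        digits.append(0)
    for _ in range(frac_digits):
        rem <<= 8                                 # bring the remainder up one byte
        q = min((rem >> shift) // den_top, 255)   # 16-by-8 estimate of the digit
        rem -= q * den                            # fused multiply-subtract
        while rem < 0:                            # estimate was a little high: fix up
            q -= 1
            rem += den
        digits.append(q)
    return digits

# divide_bytes(0x8AD176, 0xAC0980, frac_digits=3, width=3) -> [0, 0xCE, 0x91, 0x8E]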

535
TI Z80 / Re: Factorials in TI-BASIC
« on: March 15, 2014, 01:06:42 pm »
I once had it on the graphscreen. I ended up putting it on the homescreen letting the user scroll through it a few lines at a time.

536
Math and Science / Computing Logarithms
« on: March 06, 2014, 12:45:55 pm »
One of the cooler ways of computing logarithms that I know of generates 1 bit per iteration using, in programming terms, a lookup table and add/shift operations. It works using some simple rules about logs, in particular:
##log(xy)=log(x)+log(y)##



So for example:
##log(x)=log(x\frac{3}{4}\frac{4}{3})=log(x\frac{3}{4})+log(\frac{4}{3})##


log(1)=0 for any base, so if we multiply x by enough stuff to get it close to 1, we end up with 0+stuff and "stuff" can be in a precomputed table. Multiplying by 3/4, for example, is easy. Division by a power of 2 can be done with bit shifts, so to translate to code, x-x>>2. If we only multiply by ##\frac{2^{n}-1}{2^{n}}## then the process is an add and shift, and we just need to add ##log(\frac{2^{n}}{2^{n}-1})## to the accumulator. Let's do some practice to find log(23):


First, get 23 into the range [1,2). If we divide by ##2^{4}##, then we get 1.4375, so:
##log(23)=4log(2)+log(1.4375)##
Since 1.4375>4/3, if we multiply by 3/4, we will get a smaller number that is still bigger than 1. So:
##4log(2)+log(1.078125)+log(4/3)##
In turn, 1.078125<8/7, so we skip that.
1.078125>16/15, 1.078125*15/16=1.0107421875
##4log(2)+log(1.0107421875)+log(4/3)+log(16/15)##

Continuing this process up to 1024/1023 for 3 digits of accuracy, we get:

##4log(2)+log(1.000887081...)+log(4/3)+log(16/15)+log(128/127)+log(512/511)##
##log(23)\approx 4log(2)+log(4/3)+log(16/15)+log(128/127)+log(512/511)##


For comparison, ##log_{10}(23)\approx1.361727836##, and  ##4log_{10}(2)+log_{10}(4/3)+log_{10}(16/15)+log_{10}(128/127)+log_{10}(512/511)\approx 1.361342752##. We only wanted 3 digits of accuracy, but this is actually accurate to within 1/2596.
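
For reference, the whole procedure fits in a few lines of Python (a sketch for illustration only; on a calculator the table would be precomputed and the multiplications really would be shifts and subtracts):
Code: [Select]
import math

# Precomputed table of log10(2^k/(2^k - 1)) for k = 2..32.
LOG_TABLE = [math.log10(2**k / (2**k - 1)) for k in range(2, 33)]

def log10_shift_add(x):
    """Approximate log10(x), x > 0, with the shift-and-add scheme above."""
    e = math.floor(math.log2(x))          # pull out the power of 2 first
    x /= 2.0 ** e
    result = e * math.log10(2)
    for k in range(2, 33):
        # Multiply by (2^k - 1)/2^k (i.e. x - x>>k) while x stays at least 1.
        while x >= 2**k / (2**k - 1):
            x -= x / 2**k
            result += LOG_TABLE[k - 2]
    return result

# log10_shift_add(23) -> 1.3617278..., matching the example above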

This is fast, but can we make it faster? (hint: You know the answer to this.)


Basically, we are comparing to 4/3, 4/3 (yes, twice), 8/7, 16/15, ... but if we are computing logs in a program, especially at the assembly level, we will be working with bits. Allow me to rewrite these numbers in binary:


4/3=1.0101010101...
8/7=1.0010010010...
16/15=1.000100010001...


Do you see the pattern? On top of that, we can really just count the leading zeroes after the decimal. If there are 5, we can multiply by 63/64 and at the worst case, it will be an overestimate by 1 shift (for example, 1.000001 is not bigger than 64/63, so we would actually need x*127/128 = x-x>>7 instead of x-x>>6). This lets us skip any unnecessary steps for comparing and we don't need an LUT of these values. As a new example, let's compute log(1.890625). In binary, this is log(1.111001). There are no leading zeroes, so multiply by 3/4:
1.111001-.01111001=1.01101011
[Aside: this is a case where we have to multiply by 3/4 and then do it again]


Now we jump into the main iteration. There is 1 leading 0, so multiplying by 3/4, we still get a number bigger than 1:
1.01101011-.0101101011=1.0001000001
Now we have 3 leading zeroes, but it happens to be smaller than 16/15 (notice it is "3 zeros, 1, 5 zeroes, 1") so we multiply by 31/32:
1.0001000001
-.000010001000001
--------------------------
1.000001111011111

Now there are 5 leading zeroes, so we will stop here and estimate log(4/3)+log(4/3)+log(32/31)+log(64/63) which is accurate to two decimal places.


I promised faster, though. That was an improvement, but not by as much as the following. Notice that multiplying by 15/16 is the same as shifting x 4 bits to the right and then subtracting that from x, and we do this when x has 3 leading zeroes after the (binary) point, so x looks like 1.0001abcdefgh. Since x is on [1,2), the shifted copy has 3 zeroes after the point followed by a 1, and because the first 3 fractional digits of x are zeroes too, the shifted copy has another 3 zeroes after that 1. So it looks like this:
Code: [Select]
1.0001abcdefgh
-.00010001abcdefgh
But if we subtract them, we see that we will get about 1.0000abc.... The abc are left alone if d=1 (no carry from subtraction) otherwise the bit propagates. Still, if a is set, we can expect that the next iteration is going to require multiplying by 31/32, but that won't likely change b or c, so then the next iteration depends on what b and c are.

Basically, what I am saying is this: If we have the number log(1.01abcdefg), we would multiply by 3/4 and if we stop here, we could get a better approximation by adding a*log(8/7). If we stop at log(1.000001abcdefgh), though, there are 5 leading zeroes, so we can add to the end of this for a better approximation, a*log(128/127)+b*log(256/255)+c*log(512/511)+d*log(1024/1023)+e*log(2048/2047). Going back to an example in the previous section, we had x=1.000001111011111 with the approximation log(4/3)+log(4/3)+log(32/31)+log(64/63). If we want to guess and get pretty close to double the accuracy, we can add 1*log(128/127)+1*log(256/255)+1*log(512/511)+0*log(1024/1023)+1*log(2048/2047). Indeed, we get 4 digits of added accuracy (.2766723863 estimated vs. .2766053963... actual).


We can apply this recursively to double the digits of accuracy at each iteration, but it usually overestimates, especially early on. This is easy to detect and correct when you subtract x-x>>n→x for each bit, so you can adjust there. Ideally, this is useful for squeezing as much extra precision as you can out of the last iteration. If I wanted to write a logarithm routine to 64 bits of accuracy, I could wait until 32 or more bits were completed; from there I would have a pretty safe estimate for at least the next 31 bits (and most likely the 64th bit) without all the computation of x-x>>n.




537
Site Feedback and Questions / Re: Omni Feature Request
« on: March 05, 2014, 03:52:27 pm »
This will be awesome in case I get into a tutorializing mood, so thanks much for considering it!

538
Site Feedback and Questions / Omni Feature Request
« on: March 03, 2014, 09:51:19 am »
Hi. Some of you might not know me, but I was wondering if we could have [math][/math] / [itex][/itex] / [latex][/latex]-like tags. They make equations look a lot nicer than "237/101-4sqrt(2)/(x+sqrt(2))".




But really, I do like to post in our Math section and some formulas start looking pretty ugly in linear format.

539
Math and Science / Re: nth derivative
« on: March 03, 2014, 08:22:41 am »
That is probably the easiest method. Plus, for the Taylor Series, you need the previous terms anyways.

540
Official Contest / Re: TI-Concours 2014
« on: March 01, 2014, 08:39:10 pm »
My factorial program takes about 30 seconds to compute 135! and that's before displaying it :/ Also, it actually can compute up to 3248!, but it cannot display it, yet. That would take a really long time.
