Author Topic: [z80] Floating Point Routines (Read 59449 times)

Xeda112358 · « **Reply #45 on:** January 05, 2019, 12:58:36 pm »

As an update the B-G algorithm is implemented, as well as inverse trig and inverse hyperbolic functions, and natural logarithm. There have been a bunch of bug fixes, and optimizations.

My initial implementation of the 64-bit square root had an issue that was too daunting to track down. Instead, I just made a quick patch to make it work, requiring a 32-bit square operation. I finally decided to rework the code a bit and fixed the issue, allowing me to get rid of that 32-bit square and replace it with a 16-bit square (as originally intended).

xsqrt is now under 6600 clock cycles in average !

EDIT:
I made some more accurate timings, including taking into account the time it takes to make a bcall.

Code: [Select]

          TI-OS       z80float  dif         %
ln    = 131547.46cc   ~165000   +33452.54   125.43%   much slower :(
atan  = 173317.82cc   ~174000   +682.18     100.39%   slightly slower
atanh = 175320.91cc   ~174000   -1320.91     99.25%   slightly faster
sqrt  =  77699.51cc   6540.79   -71158.72     8.42%   Way faster!
mul   =  30229.53cc   9928.23   -20301.30    32.84%   over 3 times faster
add   =   1737.99cc   2094.31   +356.32     120.50%   slower :(

Also, have a recent screenshot:

Keep in mind that I'm displaying one extra digit beyond the accuracy of these floats, so the last digit is useless. In the next digit, error is off by less than 2.0 !

Xeda112358 · « **Reply #46 on:** January 14, 2019, 02:40:17 pm »

x-post

I have been focusing on the single-precision floats this past week or so. I rewrote or re-worked a lot of routines. I got rid of most of the tables by switching to a polynomial approximation for the 2^x routine (thanks to the Sollya program!) and using the B-G algorithm to compute lnSingle. It turned out to be faster this way, anyways.

I implemented sine, cosine, and tangent, the first two, again, using minimax polynomial approximation. I optimized the square-root routine (much faster but a few bytes bigger). I re-implemented the B-G algorithm using math optimizations I came up with a few months ago. I opted for two B-G implementations-- one for lnSingle which requires only 1 iteration for single precision, and one for the inverse trig and hyperbolic functions which needs 2 iterations. For anybody looking to save on size, you can just use the second B-G routine for natural logarithm. It will be a little slower, but it'll work just fine (maybe even give you an extra half-bit of precision

).

I included the Python program that I use for converting numbers to my single precision format. You can use it to convert a single float or a bunch of them. I also included a Python tool I made for computing more efficient coefficients in the B-G algorithm, but that'll only be useful to me and maybe a handful of other people. It's there on the off chance somebody stumbles across my project looking for a B-G implementation.

The single precision floats are largely complete in that I can't think of any other functions that I want to add. There is still work to be done on range reduction and verification, as well as bug fixes and more extensive testing.

Here is a current screenshot of some of the routines and their outputs:

The current list of single-precision routines:

Code: [Select]

Basic arithmetic:
  absSingle     |x| -> z       Computes the absolute value
  addSingle     x+y -> z
  ameanSingle   (x+y)/2 -> z.  Arithmetic mean of two numbers.
  cmpSingle     cmp(x,y)       Compare two numbers. Output is in the flags register!
  rsubSingle    y-x -> z
  subSingle     x-y -> z
  divSingle     x/y -> z
  invSingle     1/x -> z
  mulSingle     x*y -> z
  negSingle     -x  -> z
  sqrtSingle    sqrt(x*y) -> z
  geomeanSingle sqrt(x*y) -> z

Logs, Exponentials, Powers
  expSingle    e^x -> z
  pow2Single   2^x -> z
  pow10Single  10^x-> z
  powSingle    y^x -> z
  lgSingle     log2(x)  -> z
  lnSingle     ln(x)    -> z
  log10Single  log10(x) -> z
  logSingle    log_y(x) -> z

Trig, Hyperbolic, and their Inverses
  acoshSingle   acosh(x) -> z
  acosSingle    acos(x)  -> z
  asinhSingle   asinh(x) -> z
  asinSingle    asin(x)  -> z
  atanhSingle   atanh(x) -> z
  atanSingle    atan(x)  -> z
  coshSingle    cosh(x)  -> z
  cosSingle     cos(x)   -> z
  sinhSingle    sinh(x)  -> z
  sinSingle     sin(x)   -> z
  tanhSingle    tanh(x)  -> z
  tanSingle     tan(x)   -> z

Special-Purpose    Used by various internal functions, or optimized for special cases
  bg2iSingle     1/BG(x,y) -> z   Fewer iterations, but enough to be suitable for ln(x). Kind of a special-purpose routine
  bgiSingle      1/BG(x,y) -> z   More iterations, general-purpose, needed for the inverse trig and hyperbolics
  div255Single   x/255 -> z
  div85Single    x/85  -> z
  div51Single    x/51  -> z
  div17Single    x/17  -> z
  div15Single    x/15  -> z
  div5Single     x/5   -> z
  div3Single     x/3   -> z
  mul10Single    x*10  -> z
  mulSingle_p375         x*0.375  -> z      Used in bg2iSingle.  x*(3/8)
  mulSingle_p34375       x*0.34375-> z      Used in bgiSingle.   x*(11/32)
  mulSingle_p041015625   x*0.041015625-> z  Used in bgiSingle.   x*(21/512)

Miscellaneous and Utility
  randSingle    rand   -> z
  single2str    str(x) -> z           Convert a single to a null-terminated string, with formatting
  single2TI     tifloat(x) -> z       Converts a single to a TI-float. Useful for interacting with the TI-OS
  ti2single     single(tifloat x)->z  Converts a TI-float to a single. Useful for interacting with the TI-OS
  single2char   Honestly, I forgot what it does, but I use it in some string routines. probably converts to a uint8
  pushpop       pushes the main registers to the stack and sets up a routine so that when your code exits, it restores registers. Replaces manually surrounding code with push...pop

Xeda112358 · « **Reply #47 on:** January 22, 2019, 04:38:28 pm »

Update:
For the extended-precision floats, I added:

Code: [Select]

xcmp     for comparing two numbers
xneg     -x -> z
xabs     |x|-> z
xinv     1/x -> z   Observed a bug in 1/pi !
xpow     x^y -> z
xpow2    2^x
xpow10   10^x
xlog     log_y(x)   It's failing miserably
xlg      log2(x)
xlog10   log10(x)   Observed a bug in log10(pi)

I made the str->single routine better (it had been quickly thrown together and failed on many/most cases). Now it appears that digits get swapped in some cases!

I have to look into this.

To Do:
Look into the string->single routine and figure out what is wrong
I still have to look into the bugs observed in the single-precision Mandelbrot set program
Look into the errors in xinv, xlog, and xlog10 (these might all be related, or maybe I accidentally a byte in the built-in constants).
Have to make xsin, xcos, xtan, xsinh, xcosh, xtanh, xtoTI (x-float to TI float), TItox (TI float to x-float), and strtox (string --> x-float).
For all of the trig routines, I still need to apply range-reduction :|

Once these are done, it's just finding and fixing bugs and optimizing, and the project is as complete as what I wanted to do. BCD floats were a cute idea, but I'm a bit more realistic now

Xeda112358 · « **Reply #48 on:** February 21, 2020, 12:50:43 am »

Wow, it has been over a year since I last updated this thread. I followed up on that todo list a while ago, but I'm posting to announce a new addition to the library: 24-bit floats! I basically wrote all of these in the last 48 hours and I'm pretty burned-out:
https://github.com/Zeda/z80float/tree/master/f24

These floats are a fantastic balance of speed and usefulness on the Z80, and I think all of the routines fit into about 2500 bytes, so they are compact, too.

Xeda112358 · « **Reply #49 on:** May 06, 2021, 07:59:08 pm »

Wow, it has been over a year since I last updated this thread!

I've added support for IEEE-754 binary32 floats here!
What is really nice about this is that compilers like fasmg have built-in support for these floats, and there are a plethora of existing tools to work with them.
I've rearranged the project a bit and I am working on making the code more agnostic (so not relying on spasm's unique features). I've added a lot of conversion routines, especially for f32 and f24 floats.

I took an unplanned path with the f32 floats and ended up making addition, subtraction, multiplication, division, and square roots work entirely within registers and the stack, so no external RAM was required. I haven't taken the time to do a speed analysis of these, but I'm thinking they are a little slower than the single routines. I expect that multiplication and division are slower, and hopefully addition and subtraction are faster.

Author Topic: [z80] Floating Point Routines (Read 59449 times)

Xeda112358

Re: [z80] Floating Point Routines

Xeda112358

Re: [z80] Floating Point Routines

Xeda112358

Re: [z80] Floating Point Routines

Xeda112358

Re: [z80] Floating Point Routines

Xeda112358

Re: [z80] Floating Point Routines