Author Topic: [z80] Floating Point Routines  (Read 56125 times)

0 Members and 1 Guest are viewing this topic.

Offline Streetwalrus

  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 3821
  • Rating: +80/-8
    • View Profile
Re: [z80] Floating Point Routines
« Reply #30 on: April 12, 2014, 05:48:16 pm »
No I think it goes beyond 16bits, and if it uses division then the decimal part is needed for accuracy. Gonna check that later and tell you more.

Offline TIfanx1999

  • ಠ_ಠ ( ͡° ͜ʖ ͡°)
  • CoT Emeritus
  • LV13 Extreme Addict (Next: 9001)
  • *
  • Posts: 6173
  • Rating: +191/-9
    • View Profile
Re: [z80] Floating Point Routines
« Reply #31 on: April 12, 2014, 07:17:54 pm »
Hmm, isn't 16-bit math sufficient, then? Also, in good news, I actually shaved off close to 18000 more clock cycles from the square root routine, putting it at a little over 3 times faster than TI's. I am back to working on the exponential and logarithm routine, but they are table based (using a single 64-element LUT with 9 bytes to each element). From this I will build the Float->Str routine.

Awesome Xeda! :D

Offline chickendude

  • LV8 Addict (Next: 1000)
  • ********
  • Posts: 817
  • Rating: +90/-1
  • Pro-Riot Squad
    • View Profile
Re: [z80] Floating Point Routines
« Reply #32 on: April 26, 2014, 04:59:12 pm »
Xeda, i was just looking through the 24-bit division routine and saw this line:
Code: [Select]
or a \ sbc hl,de \ jr c,$+7 \ set 7,b \ jp $+4 \ add hl,de \ srl d \ rr eI was just wondering if instead of all those "jp $+4"s, if you tried using "set 7,b \ .db $38 ;jr c,... \ add hl,de \ ; ..." you might be able to save two bytes and 3 t-states (x 15 repetitions). Since the carry will never be set there, it'll just skip the add hl,de which it will read as part of the jr. When the condition is false, jr is actually faster (and, of course, smaller) than a jp.
« Last Edit: April 26, 2014, 07:49:50 pm by chickendude »

Offline calc84maniac

  • eZ80 Guru
  • Coder Of Tomorrow
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2912
  • Rating: +471/-17
    • View Profile
    • TI-Boy CE
Re: [z80] Floating Point Routines
« Reply #33 on: April 26, 2014, 05:01:41 pm »
Xeda, i was just looking through the 24-bit division routine and saw this line:
Code: [Select]
or a \ sbc hl,de \ jr c,$+7 \ set 7,b \ jp $+4 \ add hl,de \ srl d \ rr eI was just wondering if instead of all those "jp $+4"s, if you tried using "set 7,b \ .db $56 ;jr c,... \ add hl,de \ ; ..." you might be able to save a byte and 3 t-states (x 15 repetitions). Since the carry will never be set there, it'll just skip the add hl,de which it will read as part of the jr. When the condition is false, jr is actually faster (and, of course, smaller) than a jp.

That sounds like a good idea. However, remember that jr c is $38 and not $56 ;)

Also, that actually saves two bytes, so even better.
"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman

Offline chickendude

  • LV8 Addict (Next: 1000)
  • ********
  • Posts: 817
  • Rating: +90/-1
  • Pro-Riot Squad
    • View Profile
Re: [z80] Floating Point Routines
« Reply #34 on: April 26, 2014, 07:49:07 pm »
Oops, i was looking at the decimal version  :-X Thanks for catching that (and that it actually saves two bytes), though i'm pretty sure it wouldn't have been an issue for Xeda ;)

Offline Xeda112358

  • they/them
  • Moderator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4704
  • Rating: +719/-6
  • Calc-u-lator, do doo doo do do do.
    • View Profile
Re: [z80] Floating Point Routines
« Reply #35 on: April 30, 2014, 09:24:13 am »
Oh, that's a really good idea! I actually have to rewrite that division, though (it doesn't return accurate results because it isn't keeping track of all of the remainder term). But even so, i wonder how many more returns can use a similar optimization? Excellent!

Offline Xeda112358

  • they/them
  • Moderator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4704
  • Rating: +719/-6
  • Calc-u-lator, do doo doo do do do.
    • View Profile
Re: [z80] Floating Point Routines
« Reply #36 on: November 24, 2015, 11:49:48 am »
Necro update:

I have some Single Precision Floating Point Routines that seem to be working. They have been tested, but not rigorously tested. The routines currently available seem to be working as expected including being able to compute and display ##\log_{2} \left(\pi^{-32}\right)## to 7 digits of accuracy. The single->string routine is not as complete as it could be, but it is working.

Syntax is consistent with HL pointing to the first argument, DE to a second argument if needed, and BC pointing to where the result is output. Only cmpSingle is different, because it does not store an output. Instead, it returns the result of the comparison in the flags register. With that exception, no float routines modify the main or shadow registers.

Now for an example of use on a TI-OS, let's compute and display ##\log_{2} \left(\pi^{-32}\right)##:
Code: [Select]
    ld hl,const_pi  ;pi is the first arg
    ld d,h \ ld e,l ;pi is also the second
    ld bc,scrap
    call mulSingle  ;pi*pi = pi^2
    ld h,b \ ld l,c ;Gonna square the result
    ld d,b \ ld e,c ;BC points to the result of the previous multiply, now HL and DE do, too.
    call mulSingle  ;= pi^4
    call mulSingle  ;= pi^8
    call mulSingle  ;= pi^16
    call mulSingle  ;= pi^32
    call invSingle  ;= 1/pi^32 = pi^-32
    call lgSingle   ;= lg(pi^-32)
    call single2string
    bcall(_PutS)

From the readme, included routines are:
Code: [Select]
absSingle
  func: |x| -> z
  mem:  None
addSingle
  func: x+y -> z
  mem:  6 bytes
  Note: special cases not done
subSingle
  func: x-y -> z
  mem:  10 bytes
  Note: special cases not done
rsubSingle
  func: -x+y -> z
  mem:  10 bytes
  Note: special cases not done
invSingle
  func: 1/x -> z
  mem:  5 bytes
divSingle
  func: x/y -> z
  mem:  5 bytes
cmpSingle
  func: compare x to y, no output
        return z flag if x=y (error is up to the last 2 bits)
        return c flag if x<y
        return nc if x>=y
  mem:  None
single2string
  func: string(x) -> z
  mem:  44 bytes
mulSingle
  func: x*y -> z
  mem:  6 bytes
negSingle
  func: -x -> z
  mem:  None

Offline Xeda112358

  • they/them
  • Moderator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4704
  • Rating: +719/-6
  • Calc-u-lator, do doo doo do do do.
    • View Profile
Re: [z80] Floating Point Routines
« Reply #37 on: November 25, 2015, 01:33:01 pm »
I haven't update the public download, but I finished:
log10(x)
ln(x)
logy(x)
2x
ex
yx

And I'm working on making a much more general and better (asymptotically faster, less RAM used) routine for converting floats to base 10 strings (and TI floats).

Short Term Plan:
After the better float->string, I plan to implement a string->single and tifloat->single. These should be fairly easy to implement (I have done it several times before with success).
After that, a very simple math parser and I/O. This will be difficult-ish (6 on a scale of 1 to 10).

Longterm Schedule:
Update the exponential and log routine with a complex algorithm. Medium difficulty using the existing framework. This will supply most of the trig routines, too.
Add complex support to the math parser.
Extend these routines to the 80-bit floats.
Go nuts because now I can pump out tons of routines since I have the building blocks for most of math.



Offline Xeda112358

  • they/them
  • Moderator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4704
  • Rating: +719/-6
  • Calc-u-lator, do doo doo do do do.
    • View Profile
Re: [z80] Floating Point Routines
« Reply #38 on: November 26, 2015, 12:35:06 pm »
I have not added extra-precision calculation, but I really need to. The loss of accuracy gets built up in those bottom bits meaning a lose 1 decimal digit of accuracy (so we only have 6 digits of accuracy). TI uses 4 extra digits of precision in their intermediate calculations which is why they manage to keep much of the error out.

Anyways, I added a randSingle function, hyperbolic functions (sinh, cosh, tanh), a faked sqrt (using logs and exponents, until I find or rewrite lost code). I also overhauled the single2str function as planned, and I made a few other routines including a single->TI Float routine.

I've updated the public download, but I'll also upload to this post :)

Have a screenshot!

Offline Xeda112358

  • they/them
  • Moderator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4704
  • Rating: +719/-6
  • Calc-u-lator, do doo doo do do do.
    • View Profile
Re: [z80] Floating Point Routines
« Reply #39 on: December 05, 2015, 06:55:20 pm »
Bugs fixed:
Spoiler For Special Formats:
Basically I had ZERO with an exponent of $FF and INF/NAN with exponents of $00. However, a calculation resulting in 'ZERO' was usually only indicated by an exponent shift from $01 to $00, and INF was indicated by a shift from $FF to $00 so I could rely on the z flag. The problem is, an addition could technically result in a value with exponent of $FF and not register as an overflow. Then when the result was fed into another routine, it would be flagged as ZERO instead of INF !

Now all special values have an exponent reading $00.
Spoiler For single2str:
there was an issue where when |x|<.1, the output string would be partially mangled. This was fixed. There also may have been an un-noticed off-by-one error when returning negative exponents, but I finally noticed and fixed it.
New and Updated Routines
Spoiler For str2Single:
Converts a TI-ASCII string into a float.
Spoiler For geomeanSingle:
Computes the geometric mean of two numbers.
New Bugs:
Spoiler For str2single:
This does not yet handle the engineering E.
This crashes on some inputs, possibly if the strings are too large.


Offline joijoi

  • LV0 Newcomer (Next: 5)
  • Posts: 1
  • Rating: +0/-0
    • View Profile
Re: [z80] Floating Point Routines
« Reply #40 on: December 24, 2015, 01:05:39 am »
Hello,
are you still working on the 80bits version of these routines?
Thanks in advances for any answer that you could give

Offline Xeda112358

  • they/them
  • Moderator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4704
  • Rating: +719/-6
  • Calc-u-lator, do doo doo do do do.
    • View Profile
Re: [z80] Floating Point Routines
« Reply #41 on: December 28, 2015, 12:13:44 am »
At the moment, I am not working on the 80-bit versions. However, I am not finished working with them. Life is busy again, so it will probably be a while yet before I work on floating point routines again.

Offline Xeda112358

  • they/them
  • Moderator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4704
  • Rating: +719/-6
  • Calc-u-lator, do doo doo do do do.
    • View Profile
Re: [z80] Floating Point Routines
« Reply #42 on: December 10, 2018, 11:45:55 pm »
As an update and backup to Omni, I have been working on some of the extended precision routines. At this moment, I've rewritten multiplication and square roots, and I am about to work on division.

The square root routine is now averaging 9183.443cc, a full three times faster than the previous implementation ! (and over 9 times faster than the TI-OS float algorithm).

The multiplication routine is now averaging 9925.527cc, a more humble 8.5% improvement. (still ~3.5 times faster than the TI-OS float multiplication). A good chunk of the speed improvement comes from a slightly faster 16-bit multiply routine from Runer112, which also has much nicer/more useful output registers.


I had an issue with the new square root algorithm, so for now I have patched it up so that it works, but it's about 1000cc slower than necessary. In the process of implementing this new algorithm, I rewrote some faster division routines and I might be able to get division down below 13000cc, a 30+% speed improvement (3 times faster than the OS routine).

Offline Xeda112358

  • they/them
  • Moderator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4704
  • Rating: +719/-6
  • Calc-u-lator, do doo doo do do do.
    • View Profile
Re: [z80] Floating Point Routines
« Reply #43 on: December 12, 2018, 10:37:30 pm »
Okay, with a lot of discussion with Runer112 over IRC/Discord, and a lot of coding, here is an update!
First, it seems like there might be an accuracy issue with the lower bits of xdiv, but that could just be an issue from truncating the input.

xsqrt is averaging ~9183.443cc
xmul is averaging ~9925.527cc
xdiv is averaging ~11107.370cc

Comparing to the old set of routines from a few years ago, division is now almost exactly 40% faster, multiplication is still about 8.5% faster, and square roots are three times faster.

Comparing to the OS routines, multiply is about 3.59 times faster, divide is about 3.65 times faster, and square root is about 9.45 times faster.

Pretty much the one thing the OS has better is converting floats to strings, which the OS, using BCD floats, is wayyyyy faster for. I project that converting to a string could take on average 50000cc with my method, versus maybe 600cc for a TI float (assuming no special formatting).

EDIT: I'm still not good with GitHub, so I'm going to try to get this to work, but I make no promises. z80float

Offline Xeda112358

  • they/them
  • Moderator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4704
  • Rating: +719/-6
  • Calc-u-lator, do doo doo do do do.
    • View Profile
Re: [z80] Floating Point Routines
« Reply #44 on: December 15, 2018, 11:52:40 pm »
So I added in addition/subtraction and float->str. I haven't calculated timing for the add/sub, or float->str, but I would guess about 1500cc and 90000cc are reasonable guesses. The conversion introduces error in the last digits, so I should only return about 16 digits. At the moment, the only formatting that truncates to 16 digits max is when the exponent is too high in magnitude (ex. 1.234567890123456e-9).

Anyways, you can git the source on GitHub, but here are some ugly screenshots attached with evident rounding issues :P


EDIT: It turns out I computed the 10^-(2^k) table with lower precision. Now that it is fixed, numbers are being displayed with a bit better precision. I am currently working on implementing the B-G algorithm.