Show Posts

226

ASM / Re: [z80] 32 bit by 16 bits division and 32 bit square root

« on: March 17, 2019, 04:22:38 pm »

Okay, when I get home I might have a chance to optimize this more. And yes, that's a good place for it. I have been working on-and-off on that page for a few months to revamp it. If it is locked due to a pending draft, let me know and I'll move the draft to another page altogether.

Also, this might be useful: https://github.com/Zeda/z80float/tree/master/extended/sqrt
The sqrt32 and sqrt64 are special-purpose routines (the top two bits shouldn't both be 0), but it can give you an idea. Also the division folder might inspire optimizations in this routine. It has a 32/16 routine but it is also special-purpose (top bit of denominator is always set).

227

ASM / Re: [z80] 32 bit by 16 bits division and 32 bit square root

« on: March 17, 2019, 10:40:19 am »

Thanks! I am on mobile, so not much help, but a few quick optimizations I see are changing those Inc/dec ix to use ixl instead (the bottom bit is 0 and 1 at those points, so no worry of overflow) and you don't need the `or a` before the sbc since the add before it is always pushing out a 0 bit (so leaving carry reset).

Do you prefer speed optimizations or size optimizations, btw?

Thanks for the work, I actually visit this thread semi-frequently!

228

Math and Science / The General Class of Karatsuba-Like Multiplication

« on: March 16, 2019, 11:51:33 am »

In playing with ideas to improve multiplication in my float routines, I recreated an algorithm that I soon learned was a known, but rarely discussed generalization of the Karatsuba algorithm!

For a long time, I thought of the Karatsuba algorithm as a special case of Toom-Cook multiplication. If you aren't familiar with that, I discuss it here (see the spoiler!). To be fair, it is a special case, but there is actually another way to arrive to the Karatsuba algorithm, and honestly it is a bit easier to implement and recognize.

To arrive at the algorithm, we start with polynomial multiplication. I'll start off with multiplying two cubic polynomials and that should give us enough terms to get a good view of what's going on:
##
\begin{array}{l}
f(x)g(x) &=& (a_{0}+a_{1}x+a_{2}x^{2}+a_{3}x^{3})(b_{0}+b_{1}x+b_{2}x^{2}+b_{3}x^{3})\\

&=&a_{0}b_{0}\\
&&+x(a_{0}b_{1}+a_{1}b_{0})\\
&&+x^{2}(a_{0}b_{2}+a_{1}b_{1}+a_{2}b_{0})\\
&&+x^{3}(a_{0}b_{3}+a_{1}b_{2}+a_{2}b_{1}+a_{3}b_{0})\\
&&+x^{4}(a_{1}b_{3}+a_{2}b_{2}+a_{3}b_{1})\\
&&+x^{5}(a_{2}b_{3}+a_{3}b_{2})\\
&&+x^{6}(a_{3}b_{3})\\
\end{array}
##

There is a lot of symmetry going on there in the form of ##a_{n}b_{m}+\dots+a_{m}b_{n}##. My observation was that ##a_{n}b_{m}+a_{m}b_{n} = (a_{n}+a_{m})(b_{n}+b_{m})-a_{n}b_{n}-a_{m}b_{m}##.On the surface, this looks like we are trading two multiplications for three (more work), but we actually get to reuse some of those terms in other calculations. So let's reorganize:

##
\begin{array}{l}
f(x)g(x) &=& (a_{0}+a_{1}x+a_{2}x^{2}+a_{3}x^{3})(b_{0}+b_{1}x+b_{2}x^{2}+b_{3}x^{3})\\

&=&a_{0}b_{0}\\
&&+x((a_{0}b_{1}+a_{1}b_{0}))\\
&&+x^{2}((a_{0}b_{2}+a_{2}b_{0})+a_{1}b_{1})\\
&&+x^{3}((a_{0}b_{3}+a_{3}b_{0})+(a_{1}b_{2}+a_{2}b_{1}))\\
&&+x^{4}((a_{1}b_{3}+a_{3}b_{1})+a_{2}b_{2})\\
&&+x^{5}((a_{2}b_{3}+a_{3}b_{2}))\\
&&+x^{6}(a_{3}b_{3})\\
\end{array}
##

Now let's define some auxiliary calculations and rewrite the product:
##
z_{0} = a_{0}b_{0}\\
z_{1} = a_{1}b_{1}\\
z_{2} = a_{2}b_{2}\\
z_{3} = a_{3}b_{3}\\

z_{4} = (a_{0}+a_{1})(b_{0}+b_{1})\\
z_{5} = (a_{0}+a_{2})(b_{0}+b_{2})\\
z_{6} = (a_{0}+a_{3})(b_{0}+b_{3})\\
z_{7} = (a_{1}+a_{2})(b_{1}+b_{2})\\
z_{8} = (a_{1}+a_{3})(b_{1}+b_{3})\\
z_{9} = (a_{2}+a_{3})(b_{2}+b_{3})\\

\begin{array}{l}
f(x)g(x) &=& (a_{0}+a_{1}x+a_{2}x^{2}+a_{3}x^{3})(b_{0}+b_{1}x+b_{2}x^{2}+b_{3}x^{3})\\

&=&z_{0}\\
&&+x(z_{4}-z_{0}-z_{1})\\
&&+x^{2}(z_{5}-z_{0}-z_{2}+z_{1})\\
&&+x^{3}(z_{6}-z_{0}-z_{3}+z_{7}-z_{1}-z_{2})\\
&&+x^{4}(z_{8}-z_{1}-z_{3}+z_{2})\\
&&+x^{5}(z_{9}-z_{2}-z_{3})\\
&&+x^{6}z_{3}\\
\end{array}
##

So with just this trick, we went from 16 multiplies to 10, at the cost of 12 extra additions/subtractions. In general, to multiply two n-th degree polynomials, we need ##\frac{(n+1)(n+2)}{2}## multiplies instead of ##(n+1)^{2}##, so almost half as many multiplies are needed. These are both ##O(n^{2})## operations, but the trick is to apply recursion. For example, if we want to multiply two degree-15 polynomials, we can break it up into 4 cubic polynomials and treat the smaller polynomials as coefficients. Then we need to multiply 10 pairs of cubic polynomials, resulting in 100 total multiplies. That's a lot, but it's less than 256 with the standard method, or even 136 with the algorithm above. If we carry this out another step, we can multiply two 63-degree polynomials with 1000 multiplies instead of 4096 or 2080. In fact, for a ##(4^{n}-1)-th## degree polynomial, we need ##10^{n}## multiplies, making this algorithm ##O(n^{log_{4}(10)})=O(n^{1.66096\dots})##. This is a non-trivial, better-than-##O(n^{2})## multiplication algorithm.

Using our generalized technique, let's multiply two quadratic polynomials:

Spoiler For Spoiler:

And the Karatsuba algorithm that we've all come to know and love:

Spoiler For Spoiler:

But is Karatsuba the best of this class of algorithm?

Spoiler For Yes:

A neat application of the non-recursive algorithm discussed above can be used to multiply a number by it's complement, which is another approach to how I came up with this Sine Approximation Algorithm.

Spoiler For Spoiler:

So let's assume that all of our coefficients are either 0 or 1. The complement of 0 is 1, and vice versa. This is useful in programming, since inverting the bits of a number is basically doing -x-1.

Let's define some operations. Where x and y are binary (0 or 1), then

~x means the complement of x, and can be rewritten (1-x).
x|y is bitwise-OR and can be rewritten as x+y-xy.
x&y is bitwise-AND and can be rewritten as x*y or xy for brevity.
x^y is bitwise-XOR and can be rewritten as x+y-2xy.

Observe, then, that:

x~x is always 0.
x&x=x.
x~y+y~x = x-xy+y-yx = x+y-2xy = x^y.

So if we want to multiply, say, a 4-bit number by its complement, we have:

Code: [Select]

z0 = a0~a0=0
z1 = a1~a1=0
z2 = a2~a2=0
z3 = a3~a3=0

z4 = (a0+a1)(~a0+~a1) = a0~a1 + a1~a0 = a0^a1
z5 = (a0+a2)(~a0+~a2) = a0~a2 + a2~a0 = a0^a2
z6 = (a0+a3)(~a0+~a3) = a0~a3 + a3~a0 = a0^a3
z7 = (a1+a2)(~a1+~a2) = a1~a2 + a2~a1 = a1^a2
z8 = (a1+a3)(~a1+~a3) = a1~a3 + a3~a1 = a1^a3
z9 = (a2+a3)(~a2+~a3) = a2~a3 + a3~a2 = a2^a3
f(x)g(x) = (a0+a1x+a2x^2+a3x^3)(~a0+~a1x+~a2x^2+~a3x^3)

=z0                      |=0
 +x(z4-z0-z1)            | +x(a0^a1)
 +x^2(z5-z0-z2+z1)       | +x^2(a0^a2)
 +x^3(z6-z0-z3+z7-z1-z2) | +x^3(a0^a3+a1^a2)
 +x^4(z8-z1-z3+z2)       | +x^4(a1^a3)
 +x^5(z9-z2-z3)          | +x^5(a2^a3)
 +x^6z3                  | +0

229

TI Z80 / [BASIC] bigpi- Arbitrary Precision Pi Calculator

« on: March 14, 2019, 12:22:41 pm »

Happy Pi day!

Here is an arbitrary precision calculator for the digits of pi. It is written in TI-BASIC, so it is slooooow, but it works!

You basically pass in how many digits you want and it returns them in 5-digit chunks in L1. Keep in mind, if a chunk only has 4 digits, that means the top digit is 0 !

230

Grammer / Re: Grammer 2-The APP

« on: March 12, 2019, 02:36:33 pm »

I've been fixing more bugs added at least one new command, and modified a few more. Of note:

▶Dec has been added to convert text to a float. This is particularly needed for user input (for example, with the Input command ). I struggled with this, because I want to add in an int-->float conversion, but this seemed more important. I'm still not sure what token to use (or if I should just modify the ▶Dec token for both).
I documented a bunch of the available float commands. Many of them have been available, but are still incomplete so I didn't want to document them.
e and π can now act as floating-point constants. Of course, π is also used to prefix hexadecimal numbers, so this only works when π is not followed by a hexadecimal digit.
I absorbed the Grampkg appvar into the main app for now. The appvar was made as a way to extend Grammer as available space got small. Now that I am using two app pages, this isn't necessary. The way it is set up, the appvar code is literally just sitting inside the app, so it still interfaces like the original appvar, just faster to locate.

Attached is Grammer 2.50.2.0 as well as a screeny of a program using the ▶Dec command.

231

TI Z80 / Re: Issues converting image for use with zstart

« on: March 11, 2019, 09:27:51 pm »

I can't test it, but maybe the BMP itself has a header that level8 can't process? I know I had that issue in an unrelated project (I thought I was converting it to a low density format, but it was actually refusing to

)

232

Grammer / Re: Help with Grammer

« on: March 03, 2019, 06:56:32 pm »

A suggestion from Eeems:

Code: [Select]

Zeda: Would the LCD fix solve joshuarpl's issue with Grammer? (https://www.ticalc.org/archives/files/fileinfo/366/36608.html)

Try this first. Some batches of calcs have particularly slow LCDs, so this would fix that issue.

233

TI Calculators / Re: Help with TI-Boy

« on: February 27, 2019, 09:45:11 am »

Hmm, the ROM size shouldn't be a problem then, and I don't think the software version is an issue. How about TI-Boy version and did you convert the ROM using the same converter for that version ? (I think the later versions of TI-Boy and earlier versions expect different file formats on-calc)

234

Grammer / Re: Grammer 2-The APP

« on: February 20, 2019, 03:54:37 pm »

I fixed some bugs including one with the rectangle routine, line clipping, and floating point conversion (to and from strings). There was a serious bug in the str->float routine that caused it to overwrite the source code when the integer part had an even number of digits :|

235

TI Z80 / Re: A Calculator for the TI-83+/84+

« on: February 20, 2019, 10:40:14 am »

Since I'm using the z80float library, addition and subtraction are slightly slower on average (about 350cc, which translates to 20%), multiplication and division are about three times faster. The real bottleneck is in converting to and from strings which is significantly slower than the TI-OS since they use a BCD format.

Eventually, I want to make a fairly simple programming language and my plan is to tokenize/precompile so that the bottleneck is only in displaying strings (converting from a string to an integer or a float will be done once). Since I had this idea in mind as I worked on this, evaluation from plaintext first compiles to a bytecode and then runs it through a parser. If I wanted it to be a non-programmable calculator I could combine both into one step that uses less memory (and slightly less CPU time).

236

TI Z80 / A Calculator for the TI-83+/84+

« on: February 19, 2019, 01:05:18 am »

I have been working on a side project for a little while that will revolutionize the 83+/84+ monochrome series: A basic calculator program.

That's right, your TI-83+ is no longer just for games. If you want to add or subtract-- or even multiply and divide-- this program can do that for you!

Be warned, some of the math is a little buggy still. You'll need the Floatlib app located here, but otherwise just run it as an assembly program. I made a custom input routine, so it might feel a little clunky, but it should be pretty straight-forward (except [DEL] actually acts as a backspace). Exit with [ON].

237

TI Z80 / Re: Floatlib

« on: February 19, 2019, 12:55:54 am »

Some more routines were finally added in, especially the string -> float extended-precision routine. I still need to implement some if the trig routines, but I only have 565 bytes of space left :| I might need to look into ways to reduce the size of the built-in documentation. Here is the compiled app, the source is up on GitHub.

238

ASM / Re: Miscellaneous ASM Questions

« on: February 16, 2019, 09:33:05 pm »

Yes, bcall(_fillAppBaseTable) should be sufficient.

239

ASM / Re: Miscellaneous ASM Questions

« on: February 16, 2019, 06:44:13 pm »

Quote from: Sue Doenim on February 16, 2019, 05:07:27 pm

1) What RAM pages can I use within programs without having any bad consequences?

Even numbered pages are execution protected by default, so executing code here will crash your calc. That said, you can definitely store data. On the calcs with USBs, page 83h has some data that should be preserved or restored in the first 128 bytes. Just remember that any OS routines that interact with memory or variables expect the right pages to be there.

Ram Pages (WikiTI)

Quote from: Sue Doenim on February 16, 2019, 05:07:27 pm

2) When an appvar is less than 16kb and is archived, will it all be on the same ROM page?

No, you can't expect this. Variables won't cross sector boundaries (groups of four RAM pages), but they may cross pages.

Quote from: Sue Doenim on February 16, 2019, 05:07:27 pm

3) Can you interface with headphones beyond simply sending them signals with the ring and the tip? If so, how?

I don't know what you mean by this, sorry :| If you want to use headphones, you have to interact directly with port 0.

240

TI Calculators / Re: Help with TI-Boy

« on: February 14, 2019, 11:43:54 am »

Out of curiosity, how big is the ROM? I know many ROMs are just too big or require too much RAM.

Messages - Xeda112358

ASM / Re: [z80] 32 bit by 16 bits division and 32 bit square root

ASM / Re: [z80] 32 bit by 16 bits division and 32 bit square root

Math and Science / The General Class of Karatsuba-Like Multiplication

TI Z80 / [BASIC] bigpi- Arbitrary Precision Pi Calculator

Grammer / Re: Grammer 2-The APP

TI Z80 / Re: Issues converting image for use with zstart

Grammer / Re: Help with Grammer

TI Calculators / Re: Help with TI-Boy

Grammer / Re: Grammer 2-The APP

TI Z80 / Re: A Calculator for the TI-83+/84+

TI Z80 / A Calculator for the TI-83+/84+

TI Z80 / Re: Floatlib

ASM / Re: Miscellaneous ASM Questions

ASM / Re: Miscellaneous ASM Questions

TI Calculators / Re: Help with TI-Boy