Thanks! I'm on mobile, so not much help, but a few quick optimizations I see: change those inc/dec ix to inc/dec ixl instead (the bottom bit is 0 and 1 at those points, so there's no worry of overflow), and you don't need the `or a` before the sbc, since the add before it always pushes out a 0 bit (leaving carry reset).
Do you prefer speed optimizations or size optimizations, btw?
Thanks for the work, I actually visit this thread semi-frequently!
In playing with ideas to improve multiplication in my float routines, I recreated an algorithm that I soon learned was a known, but rarely discussed generalization of the Karatsuba algorithm!
For a long time, I thought of the Karatsuba algorithm as a special case of Toom-Cook multiplication. If you aren't familiar with that, I discuss it here (see the spoiler!). To be fair, it is a special case, but there is actually another way to arrive at the Karatsuba algorithm, and honestly it is a bit easier to implement and recognize.
To arrive at the algorithm, we start with polynomial multiplication. I'll start off with multiplying two cubic polynomials and that should give us enough terms to get a good view of what's going on: ## \begin{array}{l} f(x)g(x) &=& (a_{0}+a_{1}x+a_{2}x^{2}+a_{3}x^{3})(b_{0}+b_{1}x+b_{2}x^{2}+b_{3}x^{3})\\ &=&a_{0}b_{0}\\ &&+x(a_{0}b_{1}+a_{1}b_{0})\\ &&+x^{2}(a_{0}b_{2}+a_{1}b_{1}+a_{2}b_{0})\\ &&+x^{3}(a_{0}b_{3}+a_{1}b_{2}+a_{2}b_{1}+a_{3}b_{0})\\ &&+x^{4}(a_{1}b_{3}+a_{2}b_{2}+a_{3}b_{1})\\ &&+x^{5}(a_{2}b_{3}+a_{3}b_{2})\\ &&+x^{6}(a_{3}b_{3})\\ \end{array} ##
There is a lot of symmetry going on there in the form of ##a_{n}b_{m}+\dots+a_{m}b_{n}##. My observation was that ##a_{n}b_{m}+a_{m}b_{n} = (a_{n}+a_{m})(b_{n}+b_{m})-a_{n}b_{n}-a_{m}b_{m}##. On the surface, this looks like we are trading two multiplications for three (more work), but we actually get to reuse some of those terms in other calculations. So let's reorganize, writing ##z_{i}=a_{i}b_{i}## and ##z_{i,j}=(a_{i}+a_{j})(b_{i}+b_{j})##: ## \begin{array}{l} f(x)g(x) &=&z_{0}\\ &&+x(z_{0,1}-z_{0}-z_{1})\\ &&+x^{2}(z_{0,2}-z_{0}-z_{2}+z_{1})\\ &&+x^{3}(z_{0,3}-z_{0}-z_{3}+z_{1,2}-z_{1}-z_{2})\\ &&+x^{4}(z_{1,3}-z_{1}-z_{3}+z_{2})\\ &&+x^{5}(z_{2,3}-z_{2}-z_{3})\\ &&+x^{6}(z_{3})\\ \end{array} ##
So with just this trick, we went from 16 multiplies to 10, at the cost of 12 extra additions/subtractions. In general, to multiply two n-th degree polynomials, we need ##\frac{(n+1)(n+2)}{2}## multiplies instead of ##(n+1)^{2}##, so almost half as many multiplies are needed. These are both ##O(n^{2})## operations, but the trick is to apply recursion. For example, if we want to multiply two degree-15 polynomials, we can break it up into 4 cubic polynomials and treat the smaller polynomials as coefficients. Then we need to multiply 10 pairs of cubic polynomials, resulting in 100 total multiplies. That's a lot, but it's less than 256 with the standard method, or even 136 with the algorithm above. If we carry this out another step, we can multiply two 63-degree polynomials with 1000 multiplies instead of 4096 or 2080. In fact, for ##(4^{n}-1)##-th degree polynomials, we need ##10^{n}## multiplies, making this algorithm ##O(n^{log_{4}(10)})=O(n^{1.66096\dots})##. This is a non-trivial, better-than-##O(n^{2})## multiplication algorithm.
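To make the bookkeeping concrete, here is a quick Python sketch of the one-level trick (function names are mine, not from the original): it computes the diagonal products and the pairwise-sum products, then recovers each cross term as ##z_{i,j}-z_{i}-z_{j}##.

```python
def poly_mul_naive(a, b):
    """Schoolbook polynomial multiplication: (n+1)^2 coefficient multiplies."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_mul_pairwise(a, b):
    """One level of the generalized-Karatsuba trick (assumes len(a) == len(b)):
    z_i = a_i*b_i on the diagonal, z_ij = (a_i+a_j)(b_i+b_j) for i < j, and
    each cross term recovered as a_i*b_j + a_j*b_i = z_ij - z_i - z_j.
    That's n + n(n-1)/2 = n(n+1)/2 multiplies total."""
    n = len(a)
    z = [ai * bi for ai, bi in zip(a, b)]   # n diagonal multiplies
    out = [0] * (2 * n - 1)
    for i in range(n):
        out[2 * i] += z[i]
    for i in range(n):
        for j in range(i + 1, n):           # n(n-1)/2 pair multiplies
            zij = (a[i] + a[j]) * (b[i] + b[j])
            out[i + j] += zij - z[i] - z[j]
    return out

a, b = [1, 2, 3, 4], [5, 6, 7, 8]           # two cubics: 10 multiplies, not 16
assert poly_mul_pairwise(a, b) == poly_mul_naive(a, b)
```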
Using our generalized technique, let's multiply two quadratic polynomials. Define ##z_{0}=a_{0}b_{0}##, ##z_{1}=a_{1}b_{1}##, ##z_{2}=a_{2}b_{2}##, ##z_{3}=(a_{0}+a_{1})(b_{0}+b_{1})##, ##z_{4}=(a_{0}+a_{2})(b_{0}+b_{2})##, and ##z_{5}=(a_{1}+a_{2})(b_{1}+b_{2})##. Then: ## \begin{array}{l} f(x)g(x) &=& (a_{0}+a_{1}x+a_{2}x^{2})(b_{0}+b_{1}x+b_{2}x^{2})\\ &=&z_{0}\\ &&+x(z_{3}-z_{0}-z_{1})\\ &&+x^{2}(z_{4}-z_{0}-z_{2}+z_{1})\\ &&+x^{3}(z_{5}-z_{1}-z_{2})\\ &&+x^{4}(z_{2})\\ \end{array} ## And by applying this recursively we get an ##O(n^{log_{3}(6)})=O(n^{1.63092\dots})## algorithm.
And the Karatsuba algorithm that we've all come to know and love:
## \begin{array}{l} f(x)g(x) &=& (a_{0}+a_{1}x)(b_{0}+b_{1}x)\\ &=&z_{0}+x(z_{2}-z_{0}-z_{1})+x^{2}(z_{1})\\ \end{array} ## where ##z_{0}=a_{0}b_{0}##, ##z_{1}=a_{1}b_{1}##, and ##z_{2}=(a_{0}+a_{1})(b_{0}+b_{1})##. And by applying this recursively we get an ##O(n^{log_{2}(3)})=O(n^{1.58496\dots})## algorithm.
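For reference, here's a minimal recursive Karatsuba on integers in Python, using the same ##z_{0}##, ##z_{1}##, ##z_{2}## naming as above (this is the textbook scheme, not calculator code):

```python
def karatsuba(x, y):
    """Recursive Karatsuba for nonnegative integers."""
    if x < 10 or y < 10:                  # base case: a single-digit operand
        return x * y
    m = max(len(str(x)), len(str(y))) // 2
    p = 10 ** m
    x1, x0 = divmod(x, p)                 # x = x1*p + x0
    y1, y0 = divmod(y, p)                 # y = y1*p + y0
    z0 = karatsuba(x0, y0)
    z1 = karatsuba(x1, y1)
    z2 = karatsuba(x0 + x1, y0 + y1)      # (a0+a1)(b0+b1)
    return z1 * p * p + (z2 - z0 - z1) * p + z0

assert karatsuba(12345678, 87654321) == 12345678 * 87654321
```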
But is Karatsuba the best of this class of algorithm?
Spoiler For Yes:
Recursively multiplying ##(k-1)##-th degree polynomials (##k## coefficients each), we get an algorithm that is ## O(n^{log_{k}(k(k+1)/2)})=O(n^{log_{k}(k)+log_{k}(k+1)-log_{k}(2)}) =O(n^{1+log_{k}(k+1)-log_{k}(2)}) =O(n^{2+log_{k}(1+1/k)-log_{k}(2)}) ## and the integer ##k>1## that minimizes the exponent is 2.
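A quick sanity check of that claim, evaluating the exponent ##log_{k}(k(k+1)/2)## for small ##k## in Python:

```python
from math import log

# Exponent of the recursive scheme for k coefficients per operand.
for k in range(2, 8):
    print(k, log(k * (k + 1) // 2, k))
# 2 1.5849625007211563   <- Karatsuba, the minimum
# 3 1.6309297535714573
# 4 1.6609640474436813
# ... (strictly increasing from here)
```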
A neat application of the non-recursive algorithm discussed above is multiplying a number by its complement, which is another route to how I came up with this Sine Approximation Algorithm.
Spoiler For Spoiler:
So let's assume that all of our coefficients are either 0 or 1. The complement of 0 is 1, and vice versa. This is useful in programming, since inverting the bits of a number is basically doing -x-1.
Let's define some operations. Where x and y are binary (0 or 1), then
~x means the complement of x, and can be rewritten (1-x).
x|y is bitwise-OR and can be rewritten as x+y-xy.
x&y is bitwise-AND and can be rewritten as x*y or xy for brevity.
x^y is bitwise-XOR and can be rewritten as x+y-2xy.
Observe, then, that:
x~x is always 0.
x&x=x.
x~y+y~x = x-xy+y-yx = x+y-2xy = x^y.
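These identities are easy to verify exhaustively; here's a short Python check I wrote for illustration:

```python
# Check every identity above over x, y in {0, 1}.
for x in (0, 1):
    assert x * (1 - x) == 0                           # x~x = 0
    assert x * x == x                                 # x&x = x
    for y in (0, 1):
        assert (x | y) == x + y - x * y               # OR
        assert (x & y) == x * y                       # AND
        assert (x ^ y) == x + y - 2 * x * y           # XOR
        assert x * (1 - y) + y * (1 - x) == (x ^ y)   # x~y + y~x = x^y
```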
So if we want to multiply, say, a 4-bit number by its complement, write x = a0+2a1+4a2+8a3 and ~x = ~a0+2~a1+4~a2+8~a3. Expanding the product, every diagonal term a_i~a_i vanishes and every cross pair collapses to an XOR (by the identities above), so we have:

x*~x = 2(a0^a1) + 4(a0^a2) + 8((a0^a3)+(a1^a2)) + 16(a1^a3) + 32(a2^a3)
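And a brute-force check of that 4-bit identity in Python (variable names are mine):

```python
# x*~x = sum over i<j of 2^(i+j) * (a_i XOR a_j),
# where ~x is the 4-bit complement, i.e. 15 - x.
for x in range(16):
    a = [(x >> i) & 1 for i in range(4)]          # the bits of x
    rhs = sum((1 << (i + j)) * (a[i] ^ a[j])
              for i in range(4) for j in range(i + 1, 4))
    assert x * (15 - x) == rhs
```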
Here is an arbitrary precision calculator for the digits of pi. It is written in TI-BASIC, so it is slooooow, but it works!
You basically pass in how many digits you want and it returns them in 5-digit chunks in L1. Keep in mind, if a chunk only has 4 digits, that means its top digit is 0!
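For illustration, here's how I'd stitch the chunks back together off-calc in Python; the chunk values below are made up, not actual program output:

```python
# Each L1 entry holds 5 digits, so pad with leading zeros when an entry
# has fewer -- that's the "top digit is 0" case mentioned above.
chunks = [14159, 26535, 89793, 2384]   # the last entry really means "02384"
digits = "".join(str(c).zfill(5) for c in chunks)
print(digits)  # 14159265358979302384
```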
I've been fixing more bugs, added at least one new command, and modified a few more. Of note:
▶Dec has been added to convert text to a float. This is particularly needed for user input (for example, with the Input command). I struggled with this, because I also want to add an int-->float conversion, but this seemed more important. I'm still not sure what token to use for that (or if I should just modify the ▶Dec token to handle both).
I documented a bunch of the available float commands. Many of them have been available for a while, but the ones that are still incomplete I left undocumented.
e and π can now act as floating-point constants. Of course, π is also used to prefix hexadecimal numbers, so this only works when π is not followed by a hexadecimal digit.
I absorbed the Grampkg appvar into the main app for now. The appvar was made as a way to extend Grammer as available space got small. Now that I am using two app pages, this isn't necessary. The way it is set up, the appvar code is literally just sitting inside the app, so it still interfaces like the original appvar, just faster to locate.
Attached is Grammer 2.50.2.0 as well as a screeny of a program using the ▶Dec command.
I can't test it, but maybe the BMP itself has a header that level8 can't process? I know I had that issue in an unrelated project (I thought I was converting it to a low-density format, but it was actually refusing to).
Zeda: Would the LCD fix solve joshuarpl's issue with Grammer? (https://www.ticalc.org/archives/files/fileinfo/366/36608.html)

Try this first. Some batches of calcs have particularly slow LCDs, so this would fix that issue.
Hmm, the ROM size shouldn't be a problem then, and I don't think the software version is an issue. Which TI-Boy version are you using, and did you convert the ROM using the converter for that same version? (I think the later and earlier versions of TI-Boy expect different file formats on-calc.)
I fixed some bugs in the rectangle routine, line clipping, and floating-point conversion (to and from strings). There was a serious bug in the str->float routine that caused it to overwrite the source code when the integer part had an even number of digits :|
Since I'm using the z80float library, addition and subtraction are slightly slower on average (about 350cc, which translates to roughly 20%), but multiplication and division are about three times faster. The real bottleneck is converting to and from strings, which is significantly slower than the TI-OS since the OS floats use a BCD format.
Eventually, I want to make a fairly simple programming language, and my plan is to tokenize/precompile so that the bottleneck is only in displaying strings (converting from a string to an integer or a float will be done just once). Since I had this idea in mind as I worked on this, evaluation from plaintext first compiles to a bytecode and then runs that through the parser. If I wanted a non-programmable calculator, I could combine both into one step that uses less memory (and slightly less CPU time).
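As a toy illustration of the tokenize-once idea (all names hypothetical, nothing here is Grammer's actual internals):

```python
# "Compile" step: convert numeric literals to floats exactly once,
# so the execution loop never re-parses strings.
source = ["3.14", "*", "2"]
compiled = [tok if tok in "+-*/" else float(tok) for tok in source]

def run(code):
    """Evaluate a flat left-to-right [value, op, value, ...] sequence."""
    acc = code[0]
    for op, val in zip(code[1::2], code[2::2]):
        if op == "*": acc *= val
        elif op == "+": acc += val
        elif op == "-": acc -= val
        elif op == "/": acc /= val
    return acc

print(run(compiled))  # 6.28 -- no string conversion in the hot path
```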
I have been working on a side project for a little while that will revolutionize the 83+/84+ monochrome series: A basic calculator program.
That's right, your TI-83+ is no longer just for games. If you want to add or subtract-- or even multiply and divide-- this program can do that for you!
Be warned, some of the math is still a little buggy. You'll need the Floatlib app located here, but otherwise just run it as an assembly program. I made a custom input routine, so it might feel a little clunky, but it should be pretty straightforward (except [DEL] actually acts as a backspace). Exit with [ON].
Some more routines were finally added in, especially the string -> float extended-precision routine. I still need to implement some of the trig routines, but I only have 565 bytes of space left :| I might need to look into ways to reduce the size of the built-in documentation. Here is the compiled app; the source is up on GitHub.
1) What RAM pages can I use within programs without having any bad consequences?
Even-numbered pages are execution-protected by default, so executing code there will crash your calc. That said, you can definitely store data on them. On the calcs with USB ports, page 83h has some data in its first 128 bytes that should be preserved or restored. Just remember that any OS routines that interact with memory or variables expect the right pages to be swapped in.