Messages - Xeda112358
511
« on: April 30, 2014, 09:24:13 am »
Oh, that's a really good idea! I actually have to rewrite that division, though (it doesn't return accurate results because it isn't keeping track of all of the remainder term). But even so, I wonder how many more routines can use a similar optimization? Excellent!
512
« on: April 24, 2014, 07:22:41 pm »
Dang, those drawing commands make it seem like they really want to help programmers who want to work with 3D. This is awesome!
513
« on: April 24, 2014, 02:35:32 pm »
Haha, thanks! It was also leafy's birthday, too. I turned 22, which means I am officially old, right?
514
« on: April 14, 2014, 08:00:49 pm »
I have been too busy with life and the floating point library thing, but I do hope to finish this.
515
« on: April 12, 2014, 05:42:20 pm »
Hmm, isn't 16-bit math sufficient, then? Also, in good news, I actually shaved off close to 18000 more clock cycles from the square root routine, putting it at a little over 3 times faster than TI's. I am back to working on the exponential and logarithm routines, which are table based (using a single 64-element LUT with 9 bytes to each element). From these I will build the Float->Str routine.
516
« on: April 11, 2014, 10:31:42 am »
For that, what kind of floating point precision would you need? 80-bit floats are pretty wild, but you may have seen that I have a bunch of 24-bit floats and I can make 40-bit floats that are really fast, too (like, cut all the times in 4ths). In fact, the multiplication routine uses a divide-and-conquer algorithm that has an intermediate step of computing 32-bit multiplications (and a 40-bit float would have 32-bit multiplication).
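As a rough illustration of the divide-and-conquer idea mentioned above, here is a minimal Python sketch that builds a 64-bit multiplication out of 32-bit multiplications. The post does not say how the actual routine splits its operands (schoolbook or Karatsuba, for instance), so the four-multiply schoolbook split and the names `mul32` and `mul64` are assumptions made only for this example.

```python
MASK32 = (1 << 32) - 1


def mul32(a, b):
    """Hypothetical stand-in for a 32x32 -> 64-bit multiply primitive."""
    assert 0 <= a <= MASK32 and 0 <= b <= MASK32
    return a * b


def mul64(a, b):
    """64x64 -> 128-bit product assembled from four 32-bit multiplies."""
    a_hi, a_lo = a >> 32, a & MASK32
    b_hi, b_lo = b >> 32, b & MASK32

    low = mul32(a_lo, b_lo)                        # weight 2^0
    mid = mul32(a_hi, b_lo) + mul32(a_lo, b_hi)    # weight 2^32
    high = mul32(a_hi, b_hi)                       # weight 2^64

    return (high << 64) + (mid << 32) + low


if __name__ == "__main__":
    x, y = 0x123456789ABCDEF0, 0x0FEDCBA987654321
    print(hex(mul64(x, y)))
```

A 40-bit float with a 32-bit mantissa would only need the `mul32` step, which is consistent with the post's point that the smaller format comes almost for free.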
517
« on: April 11, 2014, 09:47:18 am »
Well, here are timings I got from WabbitEmu for the OS (86825 ccs) and mine (46831 ccs). So it isn't quite twice as fast, but it is almost. I am also working on a routine to cut out another 16000 or so, so then it will be almost 3 times faster. For the timings I have:
Args used: 1.570796326794897 and 57.29577951308232 (for example, 57.29577951308232/1.570796326794897)
Operation      TI-OS    Float80   Diff      Ratio    Analysis
add/subtract   2758     3166      +408      1.1479   Add/sub is a bit slower, possibly noticeably.
multiply       35587    10851     -24736    0.3049   Multiplication is significantly faster. Noticeable.
divide         40521    18538     -21983    0.4575   Division is significantly faster. Noticeable.
square root    86825    46831     -39994    0.5394   Square roots are significantly faster. Noticeable.
Notes: TI-Floats are approximately 47 bits of precision. Float80 uses 64 bits of precision (that is 14 digits versus 19).
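For anyone wanting to redo the comparison with their own measurements, the diff and ratio figures above are simply Float80 minus TI-OS and Float80 divided by TI-OS. A small Python sketch, using the clock-cycle counts quoted in this post (the `compare` helper is just a name for this example):

```python
# Clock-cycle counts quoted in the post: (TI-OS, Float80)
timings = {
    "add/subtract": (2758, 3166),
    "multiply": (35587, 10851),
    "divide": (40521, 18538),
    "square root": (86825, 46831),
}


def compare(ti_os, float80):
    """Return (difference in cycles, Float80/TI-OS ratio rounded to 4 places)."""
    return float80 - ti_os, round(float80 / ti_os, 4)


if __name__ == "__main__":
    for name, (ti_os, float80) in timings.items():
        diff, ratio = compare(ti_os, float80)
        print(f"{name:12} {diff:+7d} {ratio:.4f}")
```

A ratio below 1 means the Float80 routine takes fewer cycles than the OS routine.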
518
« on: April 11, 2014, 09:17:37 am »
Since the previous upload has been downloaded already, I guess I will upload this revised version. Last night in bed I was thinking about better ways to do the square root and I realized my method was not in fact as fast as it could be. See, I did the following computation in my head:
##x_{n}## is a sequence converging to ##\sqrt{x}## and is defined by: ##x_{n+1}=\frac{x_{n}+\frac{x}{x_{n}}}{2}## (Basically, it averages an overestimate and underestimate of the square root.)
So say at iteration n, I have the error being ##k=x_{n}-\sqrt{x}##. Then the next iteration of the sequence has the error:
##x_{n+1}-\sqrt{x}=\frac{x_{n}+\frac{x}{x_{n}}}{2}-\sqrt{x}##
##=\frac{x_{n}+\frac{x}{x_{n}}-2\sqrt{x}}{2}##
##=\frac{x_{n}-\sqrt{x}+\frac{x}{x_{n}}-\sqrt{x}}{2}##
##=\frac{k+\frac{x}{x_{n}}-\sqrt{x}}{2}##
##=\frac{k+\frac{x}{\sqrt{x}+k}-\sqrt{x}}{2}##
##=\frac{k+\frac{x-x-k\sqrt{x}}{\sqrt{x}+k}}{2}##
##=\frac{k-\frac{k\sqrt{x}}{\sqrt{x}+k}}{2}##
##=\frac{k-\frac{k(x_{n}-k)}{\sqrt{x}+k}}{2}##
##=\frac{k-\frac{k x_{n}-k^{2}}{\sqrt{x}+k}}{2}##
##=\frac{k-\frac{k x_{n}-k^{2}}{x_{n}}}{2}##
##=\frac{k-k+\frac{k^{2}}{x_{n}}}{2}##
##=\frac{\frac{k^{2}}{x_{n}}}{2}=\frac{k^{2}}{2x_{n}}##
What does this even mean? Suppose I had m bits of accuracy. Then the new accuracy, as long as x is on [1,4), as my values are (so that ##x_{n}## is on [1,2)), is at least 2m+1 bits of accuracy (up to 2m+2). Basically, if I could get 8 bits of accuracy initially and cheaply, the next iteration would yield at least 17 bits, then 35, then 71. What I decided to do, then, was just use a simple and fast 16-bit square root routine that I wrote to get an 8-bit estimate. However, when I went to look in my folder, I found that I had already written a 32-bit square root routine that returned a 16-bit result.
This means that I successfully removed 3 iterations of Newton's method since last night, bringing the total to just 2 iterations. Now it is down from 108 000 t-states to just 46 000, putting it much faster than TI's.
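To see the error formula in action, here is a small Python sketch using exact rational arithmetic. It is not the Z80 routine, only a check of the math: `newton_step` is the averaging iteration from this post, while `bits_of_accuracy`, the 200-bit reference precision, and the way the 16-bit seed is produced (a truncated integer square root) are assumptions made for the demonstration.

```python
from fractions import Fraction
from math import isqrt, log2


def newton_step(x, xn):
    """One iteration: average xn with x/xn."""
    return (xn + x / xn) / 2


def bits_of_accuracy(x, estimate, precision_bits=200):
    """-log2(|estimate - sqrt(x)|), measured against a high-precision reference root."""
    scale = 1 << (2 * precision_bits)
    reference = Fraction(isqrt(int(x * scale)), 1 << precision_bits)
    err = abs(estimate - reference)
    return float("inf") if err == 0 else -log2(err)


def seed_16bit(x):
    """Hypothetical 16-bit seed: sqrt(x) truncated to 16 fractional bits."""
    return Fraction(isqrt(int(x * (1 << 32))), 1 << 16)


if __name__ == "__main__":
    x = Fraction(3)              # any value on [1, 4)
    xn = seed_16bit(x)
    print("seed:", bits_of_accuracy(x, xn))
    for i in range(2):
        xn = newton_step(x, xn)
        print("iteration", i + 1, ":", bits_of_accuracy(x, xn))
```

Starting from a 16-bit seed, two iterations are enough to pass 64 bits, which matches the count given above.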
519
« on: April 10, 2014, 10:40:05 pm »
Hmm, why don't you simplify the math to A*16 and B*16 respectively? It would be faster and smaller, I think, unless Axe auto-simplifies? (Multiplication by a power of 2 is a pretty cheap operation.)
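To illustrate why multiplying by a power of 2 is cheap, here is a Python sketch of ×16 done as a shift and as four doublings on a 16-bit value. On the Z80 each doubling is a single `add hl,hl`. Whether Axe emits code like this for `*16` is not something this post confirms, and the function names are made up for the example.

```python
MASK16 = 0xFFFF  # 16-bit registers wrap around


def times16_shift(a):
    """Multiply by 16 as a left shift by 4 bits."""
    return (a << 4) & MASK16


def times16_doublings(a):
    """Multiply by 16 as four doublings (one add per doubling)."""
    for _ in range(4):
        a = (a + a) & MASK16
    return a


if __name__ == "__main__":
    for value in (3, 0x0123, 0x1234):
        print(value, times16_shift(value), times16_doublings(value))
```

Both versions agree with a general `a * 16` (modulo 16 bits) without ever running a multiplication loop.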
520
« on: April 10, 2014, 10:33:46 pm »
I added in formats and full support for signed zero, signed infinity, and NaN. I also found that Newton's method may actually be my best option for the square root algorithm, but I did mix in my own algorithm, too. The one I came up with converges linearly, but its iterations are faster than an iteration of Newton's method. I used one iteration of my method, at a cost of about 2000 cycles, to give a better approximation than 2 iterations of Newton's method, so I got to remove one Newton's method iteration and the already optimized first iteration. The result was over 20 000 clock cycles saved, putting it about 700 clock cycles slower than TI's (but if I only had to compute 47 bits of precision, it would be over 22 000 t-states faster for the same accuracy). Anyways, I updated the Google Docs thing, and I also attached the files. I reorganized things by splitting up the main routines (add/sub, multiply, divide, square root) into their own files, and then I just #include them.
521
« on: April 10, 2014, 10:23:59 pm »
Hmm, I emailed that I had to forfeit... I don't have the time this year (graduating, work, papers, homework, presentations). I have no clue how I got a 30 in the Axe category. I think I only sent one program and it was a dud.
522
« on: April 04, 2014, 10:08:32 pm »
There are some parts of running an application that are a lot easier, and other parts that are more difficult. The problem is that when an App exits, it actually shuts down all contexts. This means that when it exits, it does not return to the program. A program, by contrast, is called in such a way that when it exits, it is just the OS exiting a subroutine. I actually think I wrote a routine to run an app, but it got a little complicated.
So there is a question: What is the app that you want to run? A lot of apps have a jump table that will allow you to call certain subroutines, or even the app itself, without having to exit with the typical app exit procedure.
And a few notes: If you do run an app, you have to find a way to delete the copied Asm program from RAM. When an assembly program is run, it is actually copied to 9D95h (start of user RAM) and executed. When the program is done, the OS deletes the copy. If you run an app from within a program, the copy won't be deleted and you will have some eaten RAM. So the process that you will need to follow is for your program to locate the app, swap in its page, and copy code to safe RAM that deletes the copy and jumps to the App.
For some information: http://wikiti.brandonw.net/index.php?title=83Plus:BCALLs:4C51
523
« on: April 04, 2014, 07:55:38 am »
The fonts themselves can be in RAM or archive, but the hook itself is stored in RAM and with it is the name of the fontset. It does make it easy to switch fonts as needed.
524
« on: April 01, 2014, 12:15:15 am »
Awesome, I have always wanted this partnership. A revolutionary day indeed!
525
« on: March 31, 2014, 10:50:33 pm »
Actually, yes it could, as long as you had two (not necessarily integer) factors. That is what makes it cool. If you wanted to find the geometric mean of 9E99 and 9.99E99, the multiplication would overflow the calculator, but using the BASIC code given, it returns 9.482088378E99 properly.
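The BASIC code referred to is not quoted in this post, so the following Python sketch only shows the general idea under an assumption: that the overflow is avoided by taking the square root of each factor before multiplying. Python floats overflow near 1.8E308 rather than 1E100, so the example scales the same two numbers up to E307 to trigger the same problem.

```python
import math


def geometric_mean_naive(a, b):
    """sqrt(a*b): the product overflows to inf when it exceeds the float range."""
    return math.sqrt(a * b)


def geometric_mean_safe(a, b):
    """sqrt(a)*sqrt(b): each intermediate stays well inside the float range."""
    return math.sqrt(a) * math.sqrt(b)


if __name__ == "__main__":
    a, b = 9e307, 9.99e307   # scaled-up analogue of 9E99 and 9.99E99
    print(geometric_mean_naive(a, b))
    print(geometric_mean_safe(a, b))
```

The trade-off is two square roots instead of one, in exchange for never forming the oversized product.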