Messages - Xeda112358
Pages: 1 ... 46 47 [48] 49 50 ... 317
706
« on: October 29, 2013, 11:53:45 am »
Yep, with the 83+/84+/SE models, there is a delay that is usually not constant. Even on my newer calculators that often write immediately (so have a much tinier delay), the delay occasionally bumps from <12 clock cycles up to around 40. This is sporadic, but it seems to happen every few hundred writes. There are times when you can be fairly certain that you have enough of a delay, though, and for some graphics routines I like to interleave the algorithm with the LCD updates (so essentially compute the new value of the byte, then write the updated byte to RAM and then to the LCD). If the operations take a long time to compute, this can save up to about 35000 t-states at 6MHz.
As an example, say it takes 100 t-states to transform a byte into its updated value. It takes about 45000 t-states to update the LCD on a typical calculator at 6MHz (longer at 15MHz). So to change the whole LCD buffer and then update the LCD, it would take about 76800+45000 t-states per frame, or about 120000 t-states (50FPS). By interleaving the routines, you can use the transformation time of 100 t-states as a long enough delay between LCD writes and shave off about 35000 t-states of wasted cycles. This updates the LCD about every 87000 t-states, or about 69FPS. Another advantage is more consistent timings for LCD updates. I usually only do this when I know that the byte conversions take much longer than the typical needed delay (I interleave routines in some of the fire animation, plasma animation, some cellular automata programs and a few others that I have written to boost performance).
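As a rough check of the arithmetic in this post, here is a small Python sketch. The 768-byte buffer size is an assumption inferred from the 76800 figure (768 bytes at 100 t-states each), and the constant names are made up for the example.

```python
# Back-of-the-envelope frame timing using the approximate figures from the post.
CPU_HZ = 6_000_000        # 6MHz
BUFFER_BYTES = 768        # assumption: inferred from 76800 / 100
TRANSFORM_T = 100         # t-states to compute one updated byte (example figure)
LCD_UPDATE_T = 45_000     # t-states for a separate LCD update (approximate)
SAVED_T = 35_000          # t-states of waiting absorbed by interleaving (approximate)


def fps(t_states_per_frame):
    """Frames per second at CPU_HZ for a given per-frame cost."""
    return CPU_HZ / t_states_per_frame


transform_total = BUFFER_BYTES * TRANSFORM_T   # 76800
separate = transform_total + LCD_UPDATE_T      # 121800
interleaved = separate - SAVED_T               # 86800

print("separate:   ", separate, "t-states,", round(fps(separate), 1), "FPS")
print("interleaved:", interleaved, "t-states,", round(fps(interleaved), 1), "FPS")
```

The rounded figures in the post (120000 and 87000 t-states) are these totals to two significant digits.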
707
« on: October 29, 2013, 10:47:42 am »
DJ_Omnimaga is basically an honored grandparent in the community. Seriously, outside the internet, if I want to figure out how to do something related to college, I am more likely to ask the retired professors that hang out at the coffee shop than I am to ask the dean or something. If we append '_Omnimaga' to all regular user names, then you will seem like a normal member.
708
« on: October 29, 2013, 09:34:36 am »
A while ago, I was trying to think of a way to detect whether certain libraries are required for a given program. For BatLib, you would need to check for dim() commands whose first argument is a non-negative number; for FileSyst, you would check for first arguments that are negative or strings. For Celtic 3, you would need to check for 'misused' det() commands and imag(), and you can then check if Celtic 3 or DoorsCS7 is on the calc. For Omnicalc, you can check for real(nn,), and for xLIB you can check for real(nn) as well, making sure nn is within the appropriate range. For xLIB, you can check for xLIB, Celtic 3, or DoorsCS7. Check the sum() commands for DoorsCS7.
I wrote a file system application last year that, when running a program, looks at the file extension and opens the appropriate app or program. For example, running DonkeyKong.ion would attempt to open an ION compatible shell (ION, MirageOS, DoorsCS7) in order to run the program. However, this requires the user to manually add the file extension, so I wanted to make a super complicated routine to analyse source code, figure out the best file extension, and extract image icon data.
Also, welcome to Omni!
709
« on: October 28, 2013, 03:37:07 pm »
Ah, yes, good catch. I modified it a little more and it is actually in a better situation.
710
« on: October 28, 2013, 01:05:47 pm »
Well, remember that (9x+3x^3)/(9+6x^2) is not the end of the modification (see the first post for the best fit when 9 is kept as the first coefficient). To arrive at the estimate of (9x+2x^3)/(9+5x^2), you can observe that atan(1)=pi/4. Since 22/7 is a good approximation of pi, 11/14 is a good approximation of atan(1). From here, you can guess the coefficients so that at x=1, (9+a)/(9+b)=11/14, and you get a=2, b=5. Using this, the biggest error is around .00065. I am trying to get a formula of the form (ax+bx^3)/(a+cx^2+dx^4) using the ratio of 355/452 to approximate atan(1).
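The error claim can be spot-checked numerically. This is only a sketch: the function names are invented for the example, and the error is sampled on an evenly spaced grid, so it is an estimate of the true maximum and not a proof.

```python
import math


def atan_approx(x, a=2, b=5):
    """Rational estimate (9x + a*x^3) / (9 + b*x^2)."""
    return (9 * x + a * x**3) / (9 + b * x * x)


def max_abs_error(f, lo=-1.0, hi=1.0, steps=20001):
    """Largest |f(x) - atan(x)| over an evenly spaced grid on [lo, hi]."""
    worst = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        worst = max(worst, abs(f(x) - math.atan(x)))
    return worst


# At x = 1 the estimate collapses to (9 + a) / (9 + b); a=2, b=5 gives 11/14.
at_one = atan_approx(1.0)
tuned = max_abs_error(atan_approx)
untuned = max_abs_error(lambda x: atan_approx(x, a=3, b=6))

print("value at x=1:          ", at_one)
print("max error, a=2, b=5:   ", tuned)
print("max error, a=3, b=6:   ", untuned)
print("within 10 bits (1/1024):", tuned < 1 / 1024)
```

The a=3, b=6 case is the unadjusted (9x+3x^3)/(9+6x^2), included only to show how much the adjustment helps on [-1,1].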
711
« on: October 28, 2013, 10:36:46 am »
That's okay, I plan to *try* to extend the precision much further than just on [-1,1], and I also want to try to make closer approximations. Currently, this is better than my CORDIC implementations for 24-bit floats.
712
« on: October 28, 2013, 10:09:23 am »
Actually, series expansions like Taylor series are really not good for arctangent (and ln()). Instead, I used a few terms for the continued fraction of arctangent and then I adjusted values to account for the error in the rest of the terms not included. So basically:
x/(1+x^2/(3+4x^2))
(next term in the continued fraction divides by 5+..., so I use x^2 instead of 4x^2)
x/(1+x^2/(3+x^2))
=x/((3+2x^2)/(3+x^2))
=x(3+x^2)/(3+2x^2)
=(9x+3x^3)/(9+6x^2)=f(x)
From there, I leave the constants the same (9 and 9) because those should be the same in the numerator and denominator (as x→0, this makes f(x)→x which is best for series expansion of arctangent around 0). Then I just adjusted the other two coefficients. It is possible that there are even better approximations using different numbers, too.
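The simplification steps above can be checked with exact rational arithmetic. This is a minimal sketch using Python's fractions module; `nested` and `simplified` are names chosen for the example.

```python
from fractions import Fraction


def nested(x):
    """x / (1 + x^2 / (3 + x^2)): the truncated fraction with 4x^2 replaced by x^2."""
    return x / (1 + x * x / (3 + x * x))


def simplified(x):
    """(9x + 3x^3) / (9 + 6x^2)."""
    return (9 * x + 3 * x**3) / (9 + 6 * x * x)


# Exact comparison at rational sample points in [-1, 1].
samples = [Fraction(n, 16) for n in range(-16, 17)]
all_equal = all(nested(x) == simplified(x) for x in samples)
print("forms agree on all samples:", all_equal)

# f(x)/x approaches 1 as x approaches 0, matching the leading term of atan(x).
tiny = Fraction(1, 1000)
print("f(x)/x near 0:", float(simplified(tiny) / tiny))
```

Because the two forms are the same rational function, agreement at 33 distinct points is already more than enough to confirm the algebra.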
713
« on: October 28, 2013, 09:38:46 am »
EDIT: To understand the method, see this document.
For simple formulas:
arctan(x) : (9x+2x^3)/(9+5x^2) (at least 10 bits on [-1,1]; use atan(x)=pi/2-atan(1/x) for x>0 to extend to the whole real line)
ln(x+1) : x(8+x)/(8+5x) (at least 9 bits on [0,1])
A while ago, I mentioned a nice formula that I derived for estimating arctangent to within 10 bits of accuracy on [-1,1] (so within 1/1024 of the actual arctan function). I really like it because it is so simple and short, and requires few math operations for such accuracy:
(9x+2x^3)/(9+5x^2)
You can get away with using the following operations:
y=x*x
a=(y<<1)+9
b=(y<<2)+y+9
return a*x/b

multiply : 2
divide : 1
add : 3
sub : 0
shift : 3
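The shift-and-add steps only make sense on integers, so one way to try them out is in fixed point. The sketch below assumes 16 fractional bits and a floor division, neither of which comes from the post; the extra right shift after the first multiply is just the fixed-point rescale.

```python
import math

FRAC_BITS = 16            # assumption: 16 fractional bits, chosen only for this sketch
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0
NINE = 9 * ONE            # fixed-point representation of 9.0


def atan_fixed(x_fp):
    """(9x + 2x^3) / (9 + 5x^2) for x_fp = x * 2^FRAC_BITS with |x| <= 1."""
    y = (x_fp * x_fp) >> FRAC_BITS   # y = x*x       (multiply 1, then rescale)
    a = (y << 1) + NINE              # a = 2y + 9    (shift by 1, one add)
    b = (y << 2) + y + NINE          # b = 5y + 9    (shift by 2, two adds)
    return (a * x_fp) // b           # a*x/b         (multiply 2, divide 1)


# Sample the error against math.atan across [-1, 1].
worst = max(
    abs(atan_fixed(x_fp) / ONE - math.atan(x_fp / ONE))
    for x_fp in range(-ONE, ONE + 1, 64)
)
print("worst sampled error:", worst)
print("within 1/1024:", worst < 1 / 1024)
```

With fewer fractional bits the rounding error of the final division starts to eat into the 1/1024 budget, so the choice of scale matters in a real routine.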
714
« on: October 24, 2013, 08:48:06 pm »
I have updated the 24-bit floating point routines. Most of the routines have easy input/output. If there are two inputs, it is BHL and CDE. If it is one input, it is AHL or BHL. Outputs for the floating point routines are AHL or BHL. There are also a handful of extra routines, such as:
BC_Times_DE (returns 32-bit result, worst case is less than 700 t-states)
normalise24 to renormalise 24-bit floats
SetInf to return a float as infinity (keeping the original sign)
FloatToUInt to convert a float to an unsigned integer
FloatToSInt to convert a float to a signed integer
SqrtHL_prec16 returns the square root of HL to 16 bits of accuracy.
The actual floating point routines are:
Float24Mul
Float24Div
Float24Add
Float24Sub
Float24Lg
Float24Sqrt
715
« on: October 24, 2013, 10:38:02 am »
For 73 bytes (10 bytes more), you can save 110 t-states on the worst case. It involves some unrolling:
SqrtHL:
;input: HL
;Output: A
;73 bytes
;639 t-states worst case
    xor a
    ld b,4
    ld e,l
    ld l,h
    ld h,a
sqrt16loop:
    add hl,hl
    add hl,hl
    add a,a
    ld c,a
    sub h
    jr nc,$+6
    cpl
    ld h,a
    inc c
    inc c
    ld a,c
    djnz sqrt16loop
    ld l,e
    ld b,2
sqrt16loop2:
    add hl,hl
    add hl,hl
    add a,a
    ld c,a
    sub h
    jr nc,$+6
    cpl
    ld h,a
    inc c
    inc c
    ld a,c
    djnz sqrt16loop2

    add a,a \ ld c,a
    add hl,hl
    add hl,hl
    jr nc,$+6
    sub h \ jp $+6
    sub h
    jr nc,$+6
    inc c \ inc c
    cpl
    ld h,a

;b=0
;c is the result
;l has two more bits to rotate into h
    ld a,l
    ld l,h
    ld h,b
    add a,a \ adc hl,hl
    add a,a \ adc hl,hl
    ld a,c
;sll (undocumented) shifts in a 1, so BC=2*C+1, as in the fully unrolled version
;with sla, the final compare is off by one (an input of 0 would return 1)
    sll c \ rl b
    sbc hl,bc
    ret c
    inc a
    ret
But really, if you are going to unroll that far, you should just unroll the whole thing and go up to 108 bytes and down to 543 t-states worst case:
SqrtHL:
;input: HL
;Output: A
;108 bytes
;543 t-states worst case
;Average is about 509 t-states
    xor a
    ld b,a

    ld e,l
    ld l,h
    ld h,a

    add hl,hl
    add hl,hl
    cp h
    jr nc,$+5
    dec h
    ld a,4

    add hl,hl
    add hl,hl
    ld c,a
    sub h
    jr nc,$+6
    cpl
    ld h,a
    inc c
    inc c

    ld a,c
    add hl,hl
    add hl,hl
    add a,a
    ld c,a
    sub h
    jr nc,$+6
    cpl
    ld h,a
    inc c
    inc c

    ld a,c
    add hl,hl
    add hl,hl
    add a,a
    ld c,a
    sub h
    jr nc,$+6
    cpl
    ld h,a
    inc c
    inc c

    ld a,c
    ld l,e

    add hl,hl
    add hl,hl
    add a,a
    ld c,a
    sub h
    jr nc,$+6
    cpl
    ld h,a
    inc c
    inc c

    ld a,c
    add hl,hl
    add hl,hl
    add a,a
    ld c,a
    sub h
    jr nc,$+6
    cpl
    ld h,a
    inc c
    inc c

    ld a,c
    add a,a \ ld c,a
    add hl,hl
    add hl,hl
    jr nc,$+6
    sub h \ jp $+6
    sub h
    jr nc,$+6
    inc c \ inc c
    cpl
    ld h,a

    ld a,l
    ld l,h
    add a,a
    ld h,a
    adc hl,hl
    adc hl,hl
    ld a,c
    sll c \ rl b
    sbc hl,bc
    ret c
    inc a
    ret
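For checking outputs against a reference, the digit-by-digit (two input bits per step) method that these routines appear to follow can be modelled in a few lines of Python. This is a sketch of the general technique, not a register-level transcription of the assembly, and `sqrt16` is a name chosen for the example.

```python
import math


def sqrt16(n):
    """Floor square root of a 16-bit value, consuming two input bits per step."""
    assert 0 <= n <= 0xFFFF
    root = 0   # partial root, gains one bit per step
    rem = 0    # running remainder
    for shift in range(14, -1, -2):
        rem = (rem << 2) | ((n >> shift) & 3)   # bring down the next two bits
        trial = (root << 2) | 1                 # 4*root + 1
        root <<= 1
        if rem >= trial:                        # the next root bit is 1
            rem -= trial
            root |= 1
    return root


# Exhaustive comparison with the standard library over every 16-bit input.
mismatches = [n for n in range(0x10000) if sqrt16(n) != math.isqrt(n)]
print("mismatches:", len(mismatches))
```

An exhaustive table like this is small enough (65536 entries) to dump and compare against an emulator run of the assembly routine.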
716
« on: October 24, 2013, 08:33:44 am »
Yes. Also, 'GOOGLE.ORG'
717
« on: October 23, 2013, 11:48:30 am »
Here is a link that might be useful for opcodes: http://ourl.ca/8996/290840
To get the amount of free RAM, you could use this:
EFE542EF9247EF5641EFBF4AC9
However, if you are using this in an Axe program, you can probably just use Asm(EFE542).
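To see what the hex string contains, here is a deliberately naive Python sketch that splits it into system calls. It relies on two Z80 facts: EF is rst 28h, which the TI-OS bcall macro follows with a little-endian 16-bit address, and C9 is ret. It recognises nothing else, and `decode` is a name made up for the example. If I am reading ti83plus.inc correctly, the four addresses are _MemChk, _SetXXXXOP2, _OP2ToOP1 and _StoAns, but treat those names as an assumption and check them against the include file.

```python
HEX = "EFE542EF9247EF5641EFBF4AC9"


def decode(hex_string):
    """Split opcodes into ('bcall', address), ('ret',) or ('byte', value) tuples."""
    data = bytes.fromhex(hex_string)
    out = []
    i = 0
    while i < len(data):
        if data[i] == 0xEF and i + 2 < len(data):
            # rst 28h followed by a little-endian 16-bit address
            address = data[i + 1] | (data[i + 2] << 8)
            out.append(("bcall", address))
            i += 3
        elif data[i] == 0xC9:
            out.append(("ret",))
            i += 1
        else:
            out.append(("byte", data[i]))
            i += 1
    return out


for item in decode(HEX):
    if item[0] == "bcall":
        print("bcall", format(item[1], "04X") + "h")
    else:
        print(*item)
```

Decoded this way, Asm(EFE542) is just the first call on its own, without the steps that store the result to Ans.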
718
« on: October 22, 2013, 03:12:45 pm »
I think I figured it out. After looking back through my pseudo-code for the algorithm, I noticed that for n bits, I needed a variable with n bits, n+1 bits, and n+2 bits. I fixed it by taking the last two iterations as special cases. It bloats the code a little more, but it works:
SqrtHL5:
;input: HL
;Output: A
;63 bytes
;749 t-states worst case
    ld bc,600h
    ld d,c
    ld e,c
sqrt16loop:
    add hl,hl \ rl e
    add hl,hl \ rl e
    rl c
    ld a,c
    rla
    sub e
    jr nc,$+5
    inc c
    cpl
    ld e,a
    djnz sqrt16loop

    sla c \ ld a,c \ add a,a
    rl h \ rl e
    rl h \ rl e
    jr nc,$+6
    sub e \ jp $+6
    sub e
    jr nc,$+5
    inc c
    cpl
    ld e,a
    ld l,c
    ld a,l
    add hl,hl \ rl e \ rl d
    add hl,hl \ rl e \ rl d
    sbc hl,de
    rla
    ret
Also, <750 t-states as promised (assuming I computed it correctly).
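One possible reading of the n, n+1, n+2 remark is that it refers to the widths of the root, the remainder, and the shifted remainder in the digit-by-digit method. The Python sketch below measures those widths over every 16-bit input; which quantities the post actually had in mind is my assumption, and `widths` is a name invented for the example.

```python
def widths(bits=16):
    """Bit widths of the largest root, remainder, and shifted remainder seen."""
    max_root = max_rem = max_shifted = 0
    for n in range(1 << bits):
        root = rem = 0
        for shift in range(bits - 2, -1, -2):
            rem = (rem << 2) | ((n >> shift) & 3)   # remainder with two new bits
            max_shifted = max(max_shifted, rem)
            trial = (root << 2) | 1                 # 4*root + 1
            root <<= 1
            if rem >= trial:
                rem -= trial
                root |= 1
            max_rem = max(max_rem, rem)             # remainder after the step
        max_root = max(max_root, root)
    return max_root.bit_length(), max_rem.bit_length(), max_shifted.bit_length()


root_bits, rem_bits, shifted_bits = widths(16)
print(root_bits, rem_bits, shifted_bits)
```

For a 16-bit input (n = 8 result bits) this gives widths of n, n+1 and n+2, which would explain why a remainder kept in a single 8-bit register overflows in the last iterations.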
719
« on: October 22, 2013, 12:16:29 pm »
I was very excited to have a fairly fast 16-bit square root routine (300 cycles faster than my previous best), but after testing the routine, it appears to have problems for some large inputs. The first issue starts after the 14-bit range, so I imagine it is overflow. Looking at my routine, I foresaw the overflow issue in the last iteration so I moved it outside of the loop (while optimising it). I will be trying to fix this, but I thought there might be a chance somebody else spies the problem first.
sqrtHL:
;input is HL
;output is A
;734 t-states worst case
;39 bytes
    ld bc,700h
    ld d,c
    ld e,c
sqrt16loop:
    add hl,hl \ rl e
    add hl,hl \ rl e
    rl c
    ld a,c
    rla
    sub e
    jr nc,$+5
    inc c
    cpl
    ld e,a
    djnz sqrt16loop

    ld l,c
    ld a,l
    add hl,hl \ rl e \ rl d
    add hl,hl \ rl e \ rl d
    sbc hl,de
    rla
    ret
720
« on: October 21, 2013, 03:04:15 pm »
Yeah, I dunno about joining that team, but you are free to use any of these routines. They haven't yet been rigorously optimised (obviously, if I managed to make a routine twice as fast there is huge room for improvement). My hope was that these would actually be used for an OS, whether it is an OS I write or an OS others write. I also planned to use these routines if I ever finish Grammer 3, or Grammer 4 (an OS).
I might make a concept programming language using these 80-bit and 24-bit floats, or I might incorporate them into FileSyst somehow.