I've noticed that when I try to sign in to Omni via the home page ( https://www.omnimaga.org/index.php ), it just hangs with a "Loading..." status bar at the top and never actually logs in. However, when I navigate to any other page and try to log in at the top, it works fine. It has happened to me a few times in the past, but I so rarely need to log in that I attributed it to my internet connection.
uint16 varA = 0
Disp "print(uint8(99)+uint32(1))"
Disp uint8(99)+uint32(1)
Pause
Disp "Hello World!"
Lbl loop
varA+uint8(1) -> varA
Disp varA
Pause
GotoIf(varA!=uint16(3),loop)

Mostly I was testing variable storage, mixed-type arithmetic, and conditional goto.
EDIT: As an update, I'm just working on type conversions, which are used in most mixed-type operations. For example, if I want to add a uint8 to a float, I can convert the uint8 to a float and then add them together as a normal float+float operation. I've added in converting from a string to uint8, int8, uint16, or int16, but more importantly, I've been working on converting more data types to strings. In this screenshot, I am displaying an 8-bit Gaussian integer (I foresee use in graphics), displaying a pixel (for now I am just displaying its (x,y) coordinates), and finally there is a 16.16 fixed-point number displayed.
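To illustrate the convert-then-operate approach in ordinary Python (this is only a sketch of the idea, not the interpreter's actual code; the function names are made up):

def to_float(value, typename):
    # hypothetical type tags; each integer type just widens to a float
    if typename in ("uint8", "int8", "uint16", "int16", "uint32"):
        return float(value)
    if typename == "float":
        return value
    raise TypeError("no conversion from " + typename + " to float")

def add_mixed(a, atype, b, btype):
    # uint8 + float (or any mix) becomes an ordinary float + float operation
    return to_float(a, atype) + to_float(b, btype)

print(add_mixed(99, "uint8", 1.5, "float"))   # 100.5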
I'm glad I could help! And I'll admit, I am a little biased toward Grammer. Realistically, it is kind of a toss-up. For intensive games, Axe can really outperform Grammer, if only because you can directly compile your own assembly into it. But for things like RPGs, Grammer is probably better since it has support for tilemapping, menus, and a nice Input routine.
Grammer has the advantage that the routines are in the app, so I could optimize for speed instead of smaller code, and I could afford to add in more higher-level functions than Axe. But Axe is faasst.
It takes a little longer for BASIC to process a direct string versus, say, Str3. So the first order of business is putting any strings used in the loop into a String var!
You only actually draw anything new when the player moves or when [MODE] is pressed. What I like to do is wait for user input with a Repeat loop, this way we aren't needlessly reupdating the screen. In this case:
Repeat Ans
real(8
End
Ans->|N

Furthermore, I like to draw sprites with XOR logic. Erasing them is then a matter of redrawing them with XOR logic. As well, the identity(5) command allows you to update the LCD. First I'll draw with XOR logic and update the LCD at the same time, then re-XOR without updating the LCD. That code looks like:
:"X-TEST "3F7F71EBE1E2FEE1E1D25252818181FF->Str1 identity(8,"0000000000000000",0,0,0,0 Repeat |N=15 identity(5,Str1,X,Y,1,16,3,0,1 If theta identity(5,"0040FC40",X+8,Y+8,1,4,3,0,1 0->theta identity(5,Str1,X,Y,1,16,3,0,0 Repeat Ans real(8 End Ans->|N max(0,min(88,X-max(Ans={2,7,5})+max(Ans={3,6,8->X max(0,min(48,Y-max(|N={4,7,8})+max(|N={1,5,6->Y |N=54->theta If Ans identity(5,"0040FC40",X+8,Y+8,1,4,3,0,0 End If you are looking for speed, I highly suggest Axe or Grammer. Axe compiles source code to a stand-alone assembly program and is very fast. However, it generally has larger file sizes due to needing to include all of the routines (unless you use Axe Fusion, but that requires Axe to be on the user's calc). Grammer, like BASIC, is not a compiled language. It is comparable in speed to Axe, but programs are usually smaller. On the other hand, the user must have the Grammer app on their calc.
I included source code for this BASIC version, as well as the Axe and Grammer versions.
Personally, I really like both. I like Axe because it produces very high quality code in a stand-alone program. Grammer is better for debugging and offers more powerful resources. For example, in Axe, if there is a bug, you need to change the source, recompile the program, and then run it. With Grammer, like BASIC, you just press [ON], change the source, and it is ready to run again. Grammer also includes menus, Input, popups, and other useful routines.
I think Axe and Grammer are easy enough to learn-- you just have to learn how pointers work. You can get a program up and running in minutes. However, to use Grammer like a pro, it can be a steep learning curve.
I updated the routine for converting a float to a string. Overall, the new code is 14 bytes larger, but faster, a lot more accurate, and less buggy (It actually hasn't crashed on any of the inputs I've used!) The only weird thing is that it can return "-NaN". The value which causes this is currently invalid and will not be returned by any of the other routines, so you have to manually create the number. In the future, I might make "-NaN" represent a second kind of NaN.
Interestingly, of all of the routines, single2str is the largest at 576 bytes. The next runner-up is the routine for 2^x, which includes all of the core routines for x^y, e^x, and 10^x as well, at 467 bytes.
In the screenshot, I am running the program ATANAGM. It takes a string as input, then prints the result with up to 7 digits. It uses an AGM-like algorithm to compute the arctangent, so I'm honestly surprised at the accuracy. Normally AGM is run in double the desired precision as error accumulates, whereas my code is just using all single precision operations.
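The post doesn't include the routine itself, but one classical AGM-style iteration for arctangent is Borchardt's algorithm, whose common limit is sin(t)/t with t = arctan(x). A minimal Python sketch of that idea (not necessarily the exact method ATANAGM uses):

import math

def atan_borchardt(x, iterations=30):
    if x == 0:
        return 0.0
    r = math.sqrt(1.0 + x * x)
    a = 1.0 / r                    # cos(arctan(x))
    b = 1.0
    for _ in range(iterations):    # error shrinks by roughly a factor of 4 per step
        a = (a + b) / 2.0
        b = math.sqrt(a * b)
    # the common limit b ~ sin(t)/t, and sin(arctan(x)) = x/r, so t = x/(r*b)
    return x / (r * b)

print(atan_borchardt(1.0), math.atan(1.0))   # both ~0.7853981...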
New examples: ftos.8xp just displays the results of e^pi, e-pi, e/pi, and e*pi. mset.8xp draws the Mandelbrot Set.
Spoiler For New Float -> String Algorithm:
This is an idea that I have had for a loooong time that I finally got up the nerve to implement. In fact, this idea was one of the first candidates for my float library. It was daunting, but well worth it and easy to generalize to larger precisions.
I want to display ##x## as a base-10 string. My starting goal is to get it into a format like ##x=10^{n}y##, where ##n## is an integer. I'm not going to be too picky about ##n##, so long as ##y## is smallish in magnitude. For example, if ##y\in[.1,10]##, then it is a lot easier to convert ##y## to a string and append an e##n## to that string. It would be like a pseudo scientific notation. From there, I can work on the formatting and making it look pretty.
So the first thing I do is get an approximate value for ##n##. My floats are already in a format of ##x=2^{e}\cdot m##, so ##n\approx e\cdot \log_{10}2##. Since ##e## is an 8-bit signed integer in this case, and we are only looking for an integer result, we can approximate this further by taking ##\log_{10}2## to 8 bits of precision. Finally, ##n\approx \lfloor \frac{77e}{256}\rfloor##. In assembly, this translates to "Multiply by 77 and take the overflow."
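A quick Python check (my own, not part of the original post) that the 8-bit constant 77/256 is close enough to log10(2) over the whole signed 8-bit exponent range:

import math

def approx_n(e):
    return (77 * e) >> 8          # ">> 8" keeps the floor for negative e too

mismatches = [e for e in range(-128, 128)
              if approx_n(e) != math.floor(e * math.log10(2))]
print(mismatches)   # only a handful of exponents come out off by one, which is fine here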
Now that I have ##n##, I want to compute ##y##. This is done as ##y=\frac{x}{10^{n}}##. Luckily, this sort of exponentiation is easy. For example, ##10^{13}=10^{1+4+8}=10^{1}\cdot 10^{4}\cdot 10^{8}##. In this case, ##0\le n\le 38##, so if we precompute a table for ##10^{1},10^{2},10^{4},10^{8},10^{16},10^{32}##, then we will only ever need at most 5 multiplications to compute ##10^{n}##. With this, we compute ##y##, which should satisfy ##y\in [.5,20]##. From there, we proceed to convert y to a string and perform any fancy formatting our hearts desire.
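A rough Python sketch of this step (illustrative only; the helper names are assumptions, and the real code works on the library's own float format rather than Python doubles):

import math

POW10 = [10.0 ** (1 << k) for k in range(6)]   # 10^1, 10^2, 10^4, ..., 10^32

def pow10(n):
    # binary exponentiation from the table; valid for 0 <= n <= 63,
    # and the library only needs up to 38
    result = 1.0
    for k in range(6):
        if n & (1 << k):
            result *= POW10[k]
    return result

def pseudo_sci(x):
    m, e = math.frexp(x)                       # x = m * 2^e with m in [0.5, 1); only e is needed
    n = (77 * e) >> 8                          # the estimate from above
    y = x / pow10(n) if n >= 0 else x * pow10(-n)
    return y, n                                # format as "<y>e<n>" afterwards

print(pseudo_sci(6.022e23))                    # roughly (6.022, 23)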
Applying this algorithm to the 80-bit floats will make it a lot faster to convert. It won't be as fast as converting a BCD float like TI's, but it will be more on the order of 40,000cc instead of 400,000cc.
I have two neat routines, pushpop and diRestore. If you want to preserve HL, DE, BC, and AF, you can just put a call to pushpop at the start of your routine. If you want interrupts to be restored on exiting your routine, you can call diRestore at the top of your code.
They both mess with the stack to insert a return routine. For example, when your code calls diRestore, a later ret will first return to the code that restores the interrupt status, and only then return to the calling routine.
pushpop:            ;26 bytes, adds 229cc to the calling routine
    ex (sp),hl
    push de
    push bc
    push af
    push hl
    ld hl,pushpopret
    ex (sp),hl
    push hl
    push af
    ld hl,12
    add hl,sp
    ld a,(hl)
    inc hl
    ld h,(hl)
    ld l,a
    pop af
    ret
pushpopret:
    pop af
    pop bc
    pop de
    pop hl
    ret
diRestore:
    ex (sp),hl
    push hl
    push af
    ld hl,restoreei
    ld a,r
    jp pe,+_
    dec hl
    dec hl
_:
    di
    inc sp
    inc sp
    inc sp
    inc sp
    ex (sp),hl
    dec sp
    dec sp
    dec sp
    dec sp
    pop af
    ret
restoredi:
    di
    ret
restoreei:
    ei
    ret

Edit: calc84maniac brought up that those 'inc sp' instructions might cause some issues if interrupts fire in between them. The code is modified to disable interrupts first, then mess with the stack.
Honestly, I had/have no idea. Something got corrupted, but whether or not it is software I could not tell you. Judging from some IRC conversation, it might be a hardware issue :|
Haha, I have done something like that in the past. I just put a ton of my math and graphics routines into an App with a jump table to access them. Kind of like Floatlib. Really, I just need to make a Batlib 2 with my new coding techniques and methods and routines and ideas.
EDIT: Some updates can be found later in this thread, here. Hey all, I implemented the heapsort algorithm with some features inspired by Sean McLaughlin's implementation. Heapsort operates in O(n*log(n)) time, so it is fast even as array sizes increase. The cool perk to Sean's implementation that I wanted to include in mine is the ability to have a callback function that performs comparisons. This is fantastic if your array contains pointers to, say, an array of strings. The callback function can take the inputs (in my case, HL and DE), perform a string comparison, and return the result to the heapsort routine. Heapsort will then know whether or not to swap the elements.
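For illustration, here is a small Python sketch of heapsort with a comparison callback (not a translation of the assembly further down; the names are made up):

def heapsort(arr, compare):
    n = len(arr)

    def sift_down(parent, end):
        while True:
            child = 2 * parent + 1
            if child >= end:
                return
            # pick the larger of the two children
            if child + 1 < end and compare(arr[child], arr[child + 1]) < 0:
                child += 1
            if compare(arr[parent], arr[child]) >= 0:
                return
            arr[parent], arr[child] = arr[child], arr[parent]
            parent = child

    for i in range(n // 2 - 1, -1, -1):   # heapify
        sift_down(i, n)
    for end in range(n - 1, 0, -1):       # repeatedly move the max to the end
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(0, end)

# e.g. sort an array of "pointers" (here, indices into a string table):
names = ["PRGM2", "APPVAR", "PRGM1"]
idx = [0, 1, 2]
heapsort(idx, lambda a, b: (names[a] > names[b]) - (names[a] < names[b]))
# idx is now [1, 2, 0], i.e. alphabetical order of the referenced names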
As an example, I created a VAT sorting routine. It creates an array of pointers to the VAT entries of named variables (like Programs, Appvars, etc.), and then uses a fancy callback function that compares the names of the VAT entries. Returned is an alphabetically sorted array of pointers.
As well, I made a very fast routine to sort TI lists. The callback routine for this takes indices, constructs pointers to the actual list data, compares them, then returns the result. Since the OS routine uses an O(n^2) algorithm, my program can perform way faster on larger lists. I ran a few tests on random lists and came up with:
* at 100 elements, the OS routine and my routine were pretty close in speed.
* at 400 elements, my routine sorted it in ~1.3 seconds, versus ~9.5 for the OS routine.
* at 999 elements, my routine sorted it in ~3.0 seconds, versus ~55.5 for the OS.
#ifndef scrap
#define scrap 8000h
.echo "Warning, 'scrap' not defined, defining scrap=8000h"
#endif
#ifndef SMC
arraybase= scrap
arraylen = scrap+2
#endif
heapflags= 33
curchild = 0

heapsort:
;    HL points to the array data
;    BC is the size of the array. NOT GREATER THAN 32767
;    IX points to the routine that compares the values
#ifdef fast
    call heapify
    ld hl,(arraylen)
#else
    push bc
    call heapify
    pop hl
#endif
_:
    dec hl
    ld (arraylen),hl
#ifndef fast
    push hl
#endif
    ld de,(arraybase)
    add hl,hl
    inc de
    add hl,de
#ifdef fast
    ld a,(de)
    ldd
    inc hl
    ld (hl),a
    dec hl
    ld a,(de)
    ldd
    inc hl
    ld (hl),a
#else
    call swap
#endif
    ld bc,1
    call propogate
#ifdef fast
    ld hl,(arraylen)
#else
    pop hl
#endif
    ld a,h
    or l
    jr nz,-_
    ret

heapify:
;Inputs:
;    HL points to the array data
;    BC is the size of the array. NOT GREATER THAN 32767
;    IX points to the routine that compares the values
    ld (arraybase),hl
    ld (arraylen),bc
    srl b
    rr c
_:
    push bc
    call propogate
    pop bc
    dec bc
    ld a,b
    or c
    jr nz,-_
    ret

propogate:
;BC-1 is the parent index
;2BC is the child1 index
    res curchild,(iy+heapflags)
proppost:
    sla c
    rl b
    ld d,b
    ld e,c
#ifdef SMC
arraylen=$+1
    ld hl,0
#else
    ld hl,(arraylen)
#endif
    sbc hl,de
    add hl,de
    ret c          ;no children
;compare the two children
#ifdef SMC
arraybase=$+1
    ld hl,0
#else
    ld hl,(arraybase)
#endif
    add hl,de
    add hl,de
    inc hl
    ld d,(hl)
    dec hl
    ld e,(hl)
    dec hl
    push hl
    ld a,(hl)
    dec hl
    ld l,(hl)
    ld h,a
;HL holds the value of child0
;DE holds the value of child1
    jr z,+_
    call callix
    jr nc,+_
    ex de,hl
    pop de
    inc de
    inc de
    set curchild,(iy+heapflags)
    push de
_:
;{stack} points to the child
;HL is the value of the child
;BC points to the parent
;now compare the child and parent
    ex de,hl
    ld hl,(arraybase)
    add hl,bc
    push hl
    dec hl
    ld a,(hl)
    dec hl
    ld l,(hl)
    ld h,a
    call callix
    pop hl
    pop de
    ret nc
    dec hl
    call swap
;values swapped, now set parent=child
;BC is the index of child1
    bit curchild,(iy+heapflags)
    jp z,proppost
    inc bc
    jp propogate

swap:
;HL points to the top of one word
;DE points to the top of another
;Must preserve BC
#ifdef fast
    ld a,(de)
    ldd
    inc hl
    ld (hl),a
    dec hl
    ld a,(de)
    ldd
    inc hl
    ld (hl),a
    inc c
    inc bc
#else
    call +_
_:
    ld a,(de)
    ldd
    inc hl
    ld (hl),a
    dec hl
    inc bc
#endif
    ret
callix:
    jp (ix)

Even when optimizing for speed over size, my code is smaller, but I'm fairly sure there are more optimizations to be found!
Oh, absolutely. And for programs that have hard-coded splash screens, this can reduce the size of the overall program (since the decompression routine is so lightweight, the gains from compression are not outweighed by the size of the decompression code).
Decompression uses 4 bytes of additional RAM. I haven't analyzed average-case performance, but best- and worst-case performance were pretty easy. Note that using LDIR to copy 768 bytes, along with a RET, is 16138cc.
* Best case (speed): uncompressible data => 16477cc, which is 2.1% slower.
* Worst case: a run of 4 bytes, followed by 153 repetitions of [match of 4 bytes, run of 1 byte] => 93336cc, which takes about 5.78 times as long as an LDIR.
If I had to guess, average case might be somewhere closer to taking 4 times as long as a direct copy.
Edit: I analysed the example image and that took 46482cc for decompression (2.88 times as long as an LDIR). Attached is a copy of the compression program that returns how long decompression should take in clock cycles.
Hey all, I have been playing with compression. I made a Python program that compresses data, such as a TI Image, as well as a calculator program that can decompress such data to the graph screen.
While the Python program is general-purpose, the Z80 program assumes that the input is a compressed image, so you can mess things up! The neat thing is that in my example, the calc program (LZD) and the compressed data, along with their file headers, all came out to less than the size of the original image!
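As a toy illustration of the run/match structure mentioned in the worst-case analysis above (this is NOT the actual LZD format, just a made-up one to show the idea):

def decompress(tokens):
    # each token is a run of literal bytes followed by an optional
    # back-reference (distance, length) into the already-decoded output
    out = bytearray()
    for literals, distance, length in tokens:
        out += literals                      # copy the literal run
        for _ in range(length):              # byte-by-byte so overlapping
            out.append(out[-distance])       # matches work, like an LDIR copy
    return bytes(out)

# "ABCABCABCD" as a 3-byte run, a 6-byte overlapping match, then a 1-byte run:
tokens = [(b"ABC", 0, 0), (b"", 3, 6), (b"D", 0, 0)]
assert decompress(tokens) == b"ABCABCABCD"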
Hi all, I realized that I have a lot of unshared results that I've worked out over the years! (Then again, I have a lot of shared results.) Today's deals with squaring 3-digit numbers. If you aren't familiar with the Toom-Cook algorithm, read the spoiler! It's cool stuff. Otherwise, continue down.
Spoiler For Prerequisite:
The Toom-Cook algorithm basically generalizes multiplying integers as multiplying polynomials. For example, if we want to multiply 47*81, we can think of it as multiplying (7+4x)*(1+8x) and evaluating at x=10 (since our numbers were in base 10). This seems like a lot of unnecessary complexity, and it is for such small numbers, but let's look at a few key ideas.
If you are given 3 points on a plane, you can construct a unique quadratic function that passes through each point. The key here is that it is unique, meaning there are no other degree-2 polynomials that pass through each point-- there is only one, and that one must exist. On the other hand, given 4 points, there might be a parabola that passes through each point, but it is not a guarantee and is in fact unlikely. As well, given three points, there exist infinitely many cubic polynomials (or quartic or quintic, etc.) that pass through each point. In general, if you have n+1 points, you are guaranteed to find a unique polynomial of degree n that passes through each point.
Guarantees and uniqueness are cool and all, but actually constructing the polynomials from those points can be tricky (but still "quickly" doable in a computational sense)! Luckily for our case, the points won't be totally arbitrary. Let's look back at our Toom-Cook example of multiplying 47*81. The polynomial product is (7+4x)*(1+8x), which we know should be quadratic. Then if we evaluate this product at three easy points, we should be able to reconstruct the result. Specifically, let's evaluate at x=-1, x=0, and x=1:
(7-4)*(1-8)=-21
7*1=7
(7+4)*(1+8)=99
The easiest way to reconstruct the polynomial by hand, in my opinion, is to use matrices to represent a system of equations. In this case, with ##a_{i}## being the values we got by evaluating at -1, 0, and 1:
##\left[\begin{array}{ccc|c} 1 & (-1)^{1} & (-1)^{2} & a_{0} \\ 1 & 0^{1} & 0^{2} & a_{1} \\ 1 & 1^{1} & 1^{2} & a_{2}\end{array}\right] = \left[\begin{array}{ccc|c} 1 & -1 & 1 & a_{0} \\ 1 & 0 & 0 & a_{1} \\ 1 & 1 & 1 & a_{2}\end{array}\right]##
And now we reduce this to reduced row-echelon form:
##\Rightarrow\left[\begin{array}{ccc|c} 0 & 1 & 0 & (a_{2}-a_{0})/2 \\ 1 & 0 & 0 & a_{1} \\ 1 & 0 & 1 & (a_{2}+a_{0})/2\end{array}\right]##
Now that we know the resulting coefficients, let's construct our polynomial:
##a_{1}+\frac{a_{2}-a_{0}}{2}x+\left(\frac{a_{2}+a_{0}}{2}-a_{1}\right)x^{2}=7+60x+32x^{2}##
Let's evaluate this at x=10 and we have 3200+600+7 = 3807, which is indeed 47*81.
The nice thing about this is, since we've already solved the system of equations, we can reuse it and don't need to recompute it every time. We just compute the product at -1, 0, and 1, then plug it into our formula et voila. In fact, let's make a BASIC program that multiplies two ten-digit numbers and returns a 20-digit result. Input is passed through two 2-element lists, L1 and L2, and those elements are used as 5-digit 'digits' to compute the product.
{1,-1
sum(L1Ans)sum(L2Ans->A
L1(1)L2(1->B
sum(L1)sum(L2->C
.5(C-A
{B,Ans,Ans+A-B->L1
0->C
L1e-5->L1      ;engineering E
For(K,1,3
C+L1(K->L1(K
e-5int(Ans->C
End
C->L1(4
e5fPart(L1->L1

The output is a 20-digit integer comprised of four 5-digit numbers. The first element is the lowest 5 digits, the fourth element is the upper 5 digits.
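A rough Python rendering of the same three-point scheme, for readers without a calculator handy (the function name and carry handling are my own, not from the original program):

def mul2(x, y):
    # x and y are two base-10^5 limbs each, low limb first
    a = (x[0] - x[1]) * (y[0] - y[1])   # product evaluated at -1
    b = x[0] * y[0]                     # product evaluated at 0
    c = (x[0] + x[1]) * (y[0] + y[1])   # product evaluated at 1
    p1 = (c - a) // 2                   # x^1 coefficient (c-a is always even)
    limbs = [b, p1, p1 + a - b]         # [x^0, x^1, x^2] coefficients
    carry = 0
    out = []
    for v in limbs:                     # carry propagation in base 10^5
        v += carry
        out.append(v % 100000)
        carry = v // 100000
    out.append(carry)
    return out                          # four limbs, low limb first

# e.g. mul2([99999, 12345], [54321, 67890]) gives the 20-digit product in limbs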
Squaring a 3-digit number results in a degree-4 polynomial. Consequently, we'll need to evaluate at 5 points. The points I chose were {0,1,i,-1,-i}, where 'i' is the imaginary unit. It may seem like evaluating at a complex point would be, well, complex, but we'll see. First, let's write our number as the polynomial ##a_{0}+a_{1}x+a_{2}x^{2}## and let ##f(x)## be its square. Then:
##b_{0}=f(0)=a_{0}^{2}##
##b_{1}=f(1)=(a_{0}+a_{1}+a_{2})^{2}##
##b_{2}=f(i)=(a_{0}+a_{1}i-a_{2})^{2}##
##b_{3}=f(-1)=(a_{0}-a_{1}+a_{2})^{2}##
##b_{4}=f(-i)=(a_{0}-ia_{1}-a_{2})^{2}##
Squaring complex numbers is easy though:
##(a+bi)^{2}=a^{2}-b^{2}+2abi=(a-b)(a+b)+2abi##
So simplifying:
##b_{2}=(a_{0}-a_{2}+a_{1})(a_{0}-a_{2}-a_{1})+2i(a_{0}-a_{2})a_{1}##
##b_{4}=(a_{0}-a_{2}+a_{1})(a_{0}-a_{2}-a_{1})-2i(a_{0}-a_{2})a_{1}##
Collecting real and imaginary parts, write ##c_{3}=(a_{0}-a_{2}+a_{1})(a_{0}-a_{2}-a_{1})## and ##c_{4}=2(a_{0}-a_{2})a_{1}##, so that ##b_{2}=c_{3}+ic_{4}## and ##b_{4}=c_{3}-ic_{4}##. Then define ##c_{1}=\frac{b_{1}+b_{3}}{2}##, ##c_{2}=c_{1}-b_{3}##, ##d_{1}=\frac{c_{1}+c_{3}}{2}##, ##d_{2}=d_{1}-c_{3}##, ##d_{3}=\frac{c_{2}+c_{4}}{2}##, and ##d_{4}=d_{3}-c_{4}##. Then our polynomial is:
##b_{0}+d_{3}x+d_{2}x^{2}+d_{4}x^{3}+(d_{1}-b_{0})x^{4}##
So for an example, let's find 348*348. Then we have:
##a_{0}=8,\ a_{1}=4,\ a_{2}=3##
##b_{0}=64##
##b_{1}=(8+4+3)^{2}=225##
##b_{3}=(8-4+3)^{2}=49##
##c_{3}=(8-3+4)(8-3-4)=9##
##c_{4}=2(8-3)\cdot 4=40##
##c_{1}=(225+49)/2=137##
##c_{2}=c_{1}-49=88##
##d_{1}=(137+9)/2=73##
##d_{2}=d_{1}-9=64##
##d_{3}=(88+40)/2=64##
##d_{4}=d_{3}-40=24##
So then, our result is:
##64+64x+64x^{2}+24x^{3}+9x^{4}##
Evaluating at x=10 yields 64+640+6400+24000+90000 = 121104, which is indeed 348*348. Now let's generate some Python code:
def sqr3(a, b, c):
    # d: roughly 75% of cases exceed the original bit-width
    d = a + b + c
    # e: roughly 1/6 of the time this will exceed the original bit-width.
    # However, whenever this exceeds the bit-width, then d also exceeds it.
    e = a - b + c
    d *= d
    e *= e
    d = (d + e) // 2
    e = d - e
    # it is never the case that both multiplicands exceed their original word
    # width, but about 1/3 of the time, one of them exceeds it (~2/3 of the
    # time, neither exceed).
    f = (a + b - c) * (a - b - c)
    # neither multiplicand exceeds the original bit-width (though the second
    # operand may become negative).
    b *= (a - c)
    b += b
    a *= a
    e = (e + b) // 2
    b = e - b
    d = (d - f) // 2
    f += d - a
    return [a, e, d, b, f]

Note that three of the sub-multiplications are just squaring the input. Also note that all divisions are by a power of two (written here as floor division so everything stays an integer), and all results and operations are integer operations, not complex (assuming all inputs are integers).
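As a quick sanity check of the routine above against the worked example:

coeffs = sqr3(8, 4, 3)                                # digits of 348, low digit first
print(coeffs)                                         # [64, 64, 64, 24, 9]
print(sum(c * 10**k for k, c in enumerate(coeffs)))   # 121104 == 348*348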
I don't know the crystal timers well, sorry. I do however remember a comment from about six years ago by thepenguin77, I think, involving this same premise. I couldn't find the post, though.