I've noticed that when I try to sign in to Omni via the home page ( https://www.omnimaga.org/index.php ), it just hangs with a "Loading..." status bar at the top and never actually logs in. However, when I navigate to any other page and try to log in at the top, it works fine. It has happened to me a few times in the past, but I so rarely need to log in that I attributed it to my internet connection.
uint16 varA = 0
Disp "print(uint8(99)+uint32(1))"
Disp uint8(99)+uint32(1)
Pause
Disp "Hello World!"
Lbl loop
varA+uint8(1) -> varA
Disp varA
Pause
GotoIf(varA!=uint16(3),loop)

Mostly I was testing variable storage, mixed-type arithmetic, and conditional goto.
EDIT: As an update, I'm just working on type conversions, which are used in most mixed-type operations. For example, if I want to add a uint8 to a float, I can convert the uint8 to a float and then add them together as a normal float+float operation. I've added in converting from a string to uint8, int8, uint16, or int16, but more importantly, I've been working on converting more data types to strings. In this screenshot, I am displaying an 8-bit Gaussian integer (I foresee use in graphics), displaying a pixel (for now I am just displaying its (x,y) coordinates), and finally there is a 16.16 fixed-point number displayed.
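To illustrate the convert-then-operate approach in ordinary Python (this is only a sketch of the idea, not the interpreter's actual code; the function names are made up):

def to_float(value, typename):
    # hypothetical type tags; each integer type just widens to a float
    if typename in ("uint8", "int8", "uint16", "int16", "uint32"):
        return float(value)
    if typename == "float":
        return value
    raise TypeError("no conversion from " + typename + " to float")

def add_mixed(a, atype, b, btype):
    # uint8 + float (or any mix) becomes an ordinary float + float operation
    return to_float(a, atype) + to_float(b, btype)

print(add_mixed(99, "uint8", 1.5, "float"))   # 100.5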
I'm glad I could help! And I'll admit, I am a little biased toward Grammer. Realistically, it is kind of a toss-up. For intensive games, Axe can really outperform Grammer, if only because you can directly compile your own assembly into it. But for things like RPGs, Grammer is probably better since it has support for tilemapping, menus, and a nice Input routine.
Grammer has the advantage that the routines are in the app, so I could optimize for speed instead of smaller code, and I could afford to add in more higher-level functions than Axe. But Axe is faasst.
It takes a little longer for BASIC to process a direct string versus, say, Str3. So the first order of business is putting any strings used in the loop into a String var!
You only actually draw anything new when the player moves or when [MODE] is pressed. What I like to do is wait for user input with a Repeat loop, this way we aren't needlessly reupdating the screen. In this case:
Repeat Ans
real(8
End
Ans->|N

Furthermore, I like to draw sprites with XOR logic. Erasing them is then a matter of redrawing them with XOR logic. As well, the identity(5) command allows you to update the LCD. First I'll draw with XOR logic and update the LCD at the same time, then re-XOR without updating the LCD. That code looks like:
:"X-TEST "3F7F71EBE1E2FEE1E1D25252818181FF->Str1 identity(8,"0000000000000000",0,0,0,0 Repeat |N=15 identity(5,Str1,X,Y,1,16,3,0,1 If theta identity(5,"0040FC40",X+8,Y+8,1,4,3,0,1 0->theta identity(5,Str1,X,Y,1,16,3,0,0 Repeat Ans real(8 End Ans->|N max(0,min(88,X-max(Ans={2,7,5})+max(Ans={3,6,8->X max(0,min(48,Y-max(|N={4,7,8})+max(|N={1,5,6->Y |N=54->theta If Ans identity(5,"0040FC40",X+8,Y+8,1,4,3,0,0 End If you are looking for speed, I highly suggest Axe or Grammer. Axe compiles source code to a stand-alone assembly program and is very fast. However, it generally has larger file sizes due to needing to include all of the routines (unless you use Axe Fusion, but that requires Axe to be on the user's calc). Grammer, like BASIC, is not a compiled language. It is comparable in speed to Axe, but programs are usually smaller. On the other hand, the user must have the Grammer app on their calc.
I included source code for this BASIC version, as well as the Axe and Grammer versions.
Personally, I really like both. I like Axe because it produces very high quality code in a stand-alone program. Grammer is better for debugging and offers more powerful resources. For example, in Axe, if there is a bug, you need to change the source, recompile the program, and then run it. With Grammer, like BASIC, you just press [ON], change the source, and it is ready to run again. Grammer also includes menus, Input, popups, and other useful routines.
I think Axe and Grammer are easy enough to learn-- you just have to learn how pointers work. You can get a program up and running in minutes. However, to use Grammer like a pro, it can be a steep learning curve.
I updated the routine for converting a float to a string. Overall, the new code is 14 bytes larger, but faster, a lot more accurate, and less buggy (It actually hasn't crashed on any of the inputs I've used!) The only weird thing is that it can return "-NaN". The value which causes this is currently invalid and will not be returned by any of the other routines, so you have to manually create the number. In the future, I might make "-NaN" represent a second kind of NaN.
Interestingly, of all of the routines, single2str is the largest at 576 bytes. The next runner-up is the routine for 2^x, which includes all of the core routines for x^y, e^x, and 10^x as well, at 467 bytes.
In the screenshot, I am running the program ATANAGM. It takes a string as input, then prints the result with up to 7 digits. It uses an AGM-like algorithm to compute the arctangent, so I'm honestly surprised at the accuracy. Normally AGM is run in double the desired precision as error accumulates, whereas my code is just using all single precision operations.
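The post doesn't include the routine itself, but one classical AGM-style iteration for arctangent is Borchardt's algorithm, whose common limit is sin(t)/t with t = arctan(x). A minimal Python sketch of that idea (not necessarily the exact method ATANAGM uses):

import math

def atan_borchardt(x, iterations=30):
    if x == 0:
        return 0.0
    r = math.sqrt(1.0 + x * x)
    a = 1.0 / r                    # cos(arctan(x))
    b = 1.0
    for _ in range(iterations):    # error shrinks by roughly a factor of 4 per step
        a = (a + b) / 2.0
        b = math.sqrt(a * b)
    # the common limit b ~ sin(t)/t, and sin(arctan(x)) = x/r, so t = x/(r*b)
    return x / (r * b)

print(atan_borchardt(1.0), math.atan(1.0))   # both ~0.7853981...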
New examples: ftos.8xp just displays the results of e^pi, e-pi, e/pi, and e*pi. mset.8xp draws the Mandelbrot Set.
Spoiler For New Float -> String Algorithm:
This is an idea that I have had for a loooong time that I finally got up the nerve to implement. In fact, this idea was one of the first candidates for my float library. It was daunting, but well worth it and easy to generalize to larger precisions.
I want to display ##x## as a base-10 string. My starting goal is to get it into a format like ##x=10^{n}y##, where ##n## is an integer. I'm not going to be too picky about ##n##, so long as ##y## is smallish in magnitude. For example, if ##y\in[.1,10]##, then it is a lot easier to convert ##y## to a string and append an e##n## to that string. It would be like a pseudo scientific notation. From there, I can work on the formatting and making it look pretty.
So the first thing I do is get an approximate value for ##n##. My floats are already in a format of ##x=2^{e}\cdot m##, so ##n\approx e\cdot \log_{10}2##. Since ##e## is an 8-bit signed integer in this case, and we are only looking for an integer result, we can approximate this further by taking ##\log_{10}2## to 8 bits of precision. Finally, ##n\approx \lfloor \frac{77e}{256}\rfloor##. In assembly, this translates to "Multiply by 77 and take the overflow."
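A quick Python check (my own, not part of the original post) that the 8-bit constant 77/256 is close enough to log10(2) over the whole signed 8-bit exponent range:

import math

def approx_n(e):
    return (77 * e) >> 8          # ">> 8" keeps the floor for negative e too

mismatches = [e for e in range(-128, 128)
              if approx_n(e) != math.floor(e * math.log10(2))]
print(mismatches)   # only a handful of exponents come out off by one, which is fine here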
Now that I have ##n##, I want to compute ##y##. This is done as ##y=\frac{x}{10^{n}}##. Luckily, this sort of exponentiation is easy. For example, ##10^{13}=10^{1+4+8}=10^{1}\cdot 10^{4}\cdot 10^{8}##. In this case, ##0\le n\le 38##, so if we precompute a table for ##10^{1},10^{2},10^{4},10^{8},10^{16},10^{32}##, then we will only ever need at most 5 multiplications to compute ##10^{n}##. With this, we compute ##y##, which should satisfy ##y\in [.5,20]##. From there, we proceed to convert y to a string and perform any fancy formatting our hearts desire.
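A rough Python sketch of this step (illustrative only; the helper names are assumptions, and the real code works on the library's own float format rather than Python doubles):

import math

POW10 = [10.0 ** (1 << k) for k in range(6)]   # 10^1, 10^2, 10^4, ..., 10^32

def pow10(n):
    # binary exponentiation from the table; valid for 0 <= n <= 63,
    # and the library only needs up to 38
    result = 1.0
    for k in range(6):
        if n & (1 << k):
            result *= POW10[k]
    return result

def pseudo_sci(x):
    m, e = math.frexp(x)                       # x = m * 2^e with m in [0.5, 1); only e is needed
    n = (77 * e) >> 8                          # the estimate from above
    y = x / pow10(n) if n >= 0 else x * pow10(-n)
    return y, n                                # format as "<y>e<n>" afterwards

print(pseudo_sci(6.022e23))                    # roughly (6.022, 23)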
Applying this algorithm to the 80-bit floats will make it a lot faster to convert. It won't be as fast as converting a BCD float like TI's, but it will be more on the order of 40,000cc instead of 400,000cc.
I have two neat routines, pushpop and diRestore. If you want to preserve HL, DE, BC, and AF, you can just put a call to pushpop at the start of your routine. If you want interrupts to be restored on exiting your routine, you can call diRestore at the top of your code.
They both mess with the stack to insert a return routine. For example, when your code calls diRestore, a later ret will first return to the code that restores the interrupt status, and only then return to the calling routine.
pushpop:            ;26 bytes, adds 229cc to the calling routine
    ex (sp),hl
    push de
    push bc
    push af
    push hl
    ld hl,pushpopret
    ex (sp),hl
    push hl
    push af
    ld hl,12
    add hl,sp
    ld a,(hl)
    inc hl
    ld h,(hl)
    ld l,a
    pop af
    ret
pushpopret:
    pop af
    pop bc
    pop de
    pop hl
    ret
diRestore:
    ex (sp),hl
    push hl
    push af
    ld hl,restoreei
    ld a,r
    jp pe,+_
    dec hl
    dec hl
_:
    di
    inc sp
    inc sp
    inc sp
    inc sp
    ex (sp),hl
    dec sp
    dec sp
    dec sp
    dec sp
    pop af
    ret
restoredi:
    di
    ret
restoreei:
    ei
    ret

Edit: calc84maniac brought up that those 'inc sp' instructions might cause some issues if interrupts fire in between them. The code is modified to disable interrupts first, then mess with the stack.
Honestly, I had/have no idea. Something got corrupted, but whether or not it is software I could not tell you. Judging from some IRC conversation, it might be a hardware issue :|
Haha, I have done something like that in the past. I just put a ton of my math and graphics routines into an App with a jump table to access them. Kind of like Floatlib. Really, I just need to make a Batlib 2 with my new coding techniques and methods and routines and ideas.
EDIT: Some updates can be found later in this thread, here. Hey all, I implemented the heapsort algorithm with some features inspired by Sean McLaughlin's implementation. Heapsort operates in O(n*log(n)) time, so it is fast even as array sizes increase. The cool perk to Sean's implementation that I wanted to include in mine is the ability to have a callback function that performs comparisons. This is fantastic if your array contains pointers to, say, an array of strings. The callback function can take the inputs (in my case, HL and DE), perform a string comparison, and return the result to the heapsort routine. Heapsort will then know whether or not to swap the elements.
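For illustration, here is a small Python sketch of heapsort with a comparison callback (not a translation of the assembly further down; the names are made up):

def heapsort(arr, compare):
    n = len(arr)

    def sift_down(parent, end):
        while True:
            child = 2 * parent + 1
            if child >= end:
                return
            # pick the larger of the two children
            if child + 1 < end and compare(arr[child], arr[child + 1]) < 0:
                child += 1
            if compare(arr[parent], arr[child]) >= 0:
                return
            arr[parent], arr[child] = arr[child], arr[parent]
            parent = child

    for i in range(n // 2 - 1, -1, -1):   # heapify
        sift_down(i, n)
    for end in range(n - 1, 0, -1):       # repeatedly move the max to the end
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(0, end)

# e.g. sort an array of "pointers" (here, indices into a string table):
names = ["PRGM2", "APPVAR", "PRGM1"]
idx = [0, 1, 2]
heapsort(idx, lambda a, b: (names[a] > names[b]) - (names[a] < names[b]))
# idx is now [1, 2, 0], i.e. alphabetical order of the referenced names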
As an example, I created a VAT sorting routine. It creates an array of pointers to the VAT entries of named variables (like Programs, Appvars, etc.), and then uses a fancy callback function that compares the names of the VAT entries. Returned is an alphabetically sorted array of pointers.
As well, I made a very fast routine to sort TI lists. The callback routine for this takes indices, constructs pointers to the actual list data, compares them, then returns the result. Since the OS routine uses an O(n^2) algorithm, my program can perform way faster on larger lists. I ran a few tests on random lists and came up with:
* at 100 elements, the OS routine and my routine were pretty close in speed.
* at 400 elements, my routine sorted it in ~1.3 seconds, versus ~9.5 for the OS routine.
* at 999 elements, my routine sorted it in ~3.0 seconds, versus ~55.5 for the OS.
#ifndef scrap
#define scrap 8000h
.echo "Warning, 'scrap' not defined, defining scrap=8000h"
#endif
#ifndef SMC
arraybase= scrap
arraylen = scrap+2
#endif
heapflags= 33
curchild = 0

heapsort:
;    HL points to the array data
;    BC is the size of the array. NOT GREATER THAN 32767
;    IX points to the routine that compares the values
#ifdef fast
    call heapify
    ld hl,(arraylen)
#else
    push bc
    call heapify
    pop hl
#endif
_:
    dec hl
    ld (arraylen),hl
#ifndef fast
    push hl
#endif
    ld de,(arraybase)
    add hl,hl
    inc de
    add hl,de
#ifdef fast
    ld a,(de)
    ldd
    inc hl
    ld (hl),a
    dec hl
    ld a,(de)
    ldd
    inc hl
    ld (hl),a
#else
    call swap
#endif
    ld bc,1
    call propogate
#ifdef fast
    ld hl,(arraylen)
#else
    pop hl
#endif
    ld a,h
    or l
    jr nz,-_
    ret

heapify:
;Inputs:
;    HL points to the array data
;    BC is the size of the array. NOT GREATER THAN 32767
;    IX points to the routine that compares the values
    ld (arraybase),hl
    ld (arraylen),bc
    srl b
    rr c
_:
    push bc
    call propogate
    pop bc
    dec bc
    ld a,b
    or c
    jr nz,-_
    ret

propogate:
;BC-1 is the parent index
;2BC is the child1 index
    res curchild,(iy+heapflags)
proppost:
    sla c
    rl b
    ld d,b
    ld e,c
#ifdef SMC
arraylen=$+1
    ld hl,0
#else
    ld hl,(arraylen)
#endif
    sbc hl,de
    add hl,de
    ret c          ;no children
;compare the two children
#ifdef SMC
arraybase=$+1
    ld hl,0
#else
    ld hl,(arraybase)
#endif
    add hl,de
    add hl,de
    inc hl
    ld d,(hl)
    dec hl
    ld e,(hl)
    dec hl
    push hl
    ld a,(hl)
    dec hl
    ld l,(hl)
    ld h,a
;HL holds the value of child0
;DE holds the value of child1
    jr z,+_
    call callix
    jr nc,+_
    ex de,hl
    pop de
    inc de
    inc de
    set curchild,(iy+heapflags)
    push de
_:
;{stack} points to the child
;HL is the value of the child
;BC points to the parent
;now compare the child and parent
    ex de,hl
    ld hl,(arraybase)
    add hl,bc
    push hl
    dec hl
    ld a,(hl)
    dec hl
    ld l,(hl)
    ld h,a
    call callix
    pop hl
    pop de
    ret nc
    dec hl
    call swap
;values swapped, now set parent=child
;BC is the index of child1
    bit curchild,(iy+heapflags)
    jp z,proppost
    inc bc
    jp propogate

swap:
;HL points to the top of one word
;DE points to the top of another
;Must preserve BC
#ifdef fast
    ld a,(de)
    ldd
    inc hl
    ld (hl),a
    dec hl
    ld a,(de)
    ldd
    inc hl
    ld (hl),a
    inc c
    inc bc
#else
    call +_
_:
    ld a,(de)
    ldd
    inc hl
    ld (hl),a
    dec hl
    inc bc
#endif
    ret
callix:
    jp (ix)

Even when optimizing for speed over size, my code is smaller, but I'm fairly sure there are more optimizations to be found!
Oh, absolutely. And for programs that have hard-coded splash screens, this can reduce the size of the overall program (since the decompression routine is so lightweight, the gains from compression are not outweighed by the size of the decompression code).
Decompression uses 4 bytes of additional RAM. I haven't analyzed average-case performance, but best- and worst-case performance were pretty easy. Note that using LDIR to copy 768 bytes, along with a RET, is 16138cc.
* Best case (speed): uncompressible data => 16477cc, which is 2.1% slower.
* Worst case: a run of 4 bytes, followed by 153 repetitions of [match of 4 bytes, run of 1 byte] => 93336cc, which takes about 5.78 times as long as an LDIR.
If I had to guess, average case might be somewhere closer to taking 4 times as long as a direct copy.
Edit: I analysed the example image and that took 46482cc for decompression (2.88 times as long as an LDIR). Attached is a copy of the compression program that returns how long decompression should take in clock cycles.
Hey all, I have been playing with compression. I made a Python program that compresses data, such as a TI Image, as well as a calculator program that can decompress such data to the graph screen.
While the Python program is general-purpose, the Z80 program assumes that the input is a compressed image, so you can mess things up! The neat thing is that in my example, the calc program (LZD) and the compressed data, along with their file headers, all came out to less than the size of the original image!
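As a toy illustration of the run/match structure mentioned in the worst-case analysis above (this is NOT the actual LZD format, just a made-up one to show the idea):

def decompress(tokens):
    # each token is a run of literal bytes followed by an optional
    # back-reference (distance, length) into the already-decoded output
    out = bytearray()
    for literals, distance, length in tokens:
        out += literals                      # copy the literal run
        for _ in range(length):              # byte-by-byte so overlapping
            out.append(out[-distance])       # matches work, like an LDIR copy
    return bytes(out)

# "ABCABCABCD" as a 3-byte run, a 6-byte overlapping match, then a 1-byte run:
tokens = [(b"ABC", 0, 0), (b"", 3, 6), (b"D", 0, 0)]
assert decompress(tokens) == b"ABCABCABCD"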
Hi all, I realized that I have a lot of unshared results that I've worked out over the years! (Then again, I have a lot of shared results.) Today's deals with squaring 3-digit numbers. If you aren't familiar with the Toom-Cook algorithm, read the spoiler! It's cool stuff. Otherwise, continue down.
Spoiler For Prerequisite:
The Toom-Cook algorithm basically generalizes multiplying integers as multiplying polynomials. For example, if we want to multiply 47*81, we can think of it as multiplying (7+4x)*(1+8x) and evaluating at x=10 (since our numbers were in base 10). This seems like a lot of unnecessary complexity, and it is for such small numbers, but let's look at a few key ideas.
If you are given 3 points on a plane, you can construct a unique quadratic function that passes through each point. The key here is that it is unique, meaning there are no other degree-2 polynomials that pass through each point-- there is only one, and that one must exist. On the other hand, given 4 points, there might be a parabola that passes through each point, but it is not a guarantee and is in fact unlikely. As well, given three points, there exist infinitely many cubic polynomials (or quartic or quintic, etc.) that pass through each point. In general, if you have n+1 points, you are guaranteed to find a unique polynomial of degree n that passes through each point.
Guarantees and uniqueness are cool and all, but actually constructing the polynomials from those points can be tricky (but still "quickly" doable in a computational sense)! Luckily for our case, the points won't be totally arbitrary. Let's look back at our Toom-Cook example of multiplying 47*81. The polynomial product is (7+4x)*(1+8x), which we know should be quadratic. Then if we evaluate this product at three easy points, we should be able to reconstruct the result. Specifically, let's evaluate at x=-1, x=0, and x=1:
(7-4)*(1-8)=-21
7*1=7
(7+4)*(1+8)=99
The easiest way to reconstruct the polynomial by hand, in my opinion, is to use matrices to represent a system of equations. In this case, with ##a_{i}## being the values we got by evaluating at -1, 0, and 1:
##\left[\begin{array}{ccc|c} 1 & (-1)^{1} & (-1)^{2} & a_{0} \\ 1 & 0^{1} & 0^{2} & a_{1} \\ 1 & 1^{1} & 1^{2} & a_{2}\end{array}\right] = \left[\begin{array}{ccc|c} 1 & -1 & 1 & a_{0} \\ 1 & 0 & 0 & a_{1} \\ 1 & 1 & 1 & a_{2}\end{array}\right]##
And now we reduce this to reduced row-echelon form:
##\Rightarrow\left[\begin{array}{ccc|c} 0 & 1 & 0 & (a_{2}-a_{0})/2 \\ 1 & 0 & 0 & a_{1} \\ 1 & 0 & 1 & (a_{2}+a_{0})/2\end{array}\right]##
Now that we know the resulting coefficients, let's construct our polynomial:
##a_{1}+\frac{a_{2}-a_{0}}{2}x+\left(\frac{a_{2}+a_{0}}{2}-a_{1}\right)x^{2}=7+60x+32x^{2}##
Let's evaluate this at x=10 and we have 3200+600+7 = 3807, which is indeed 47*81.
The nice thing about this is, since we've already solved the system of equations, we can reuse it and don't need to recompute it every time. We just compute the product at -1, 0, and 1, then plug it into our formula et voila. In fact, let's make a BASIC program that multiplies two ten-digit numbers and returns a 20-digit result. Input is passed through two 2-element lists, L1 and L2, and those elements are used as 5-digit 'digits' to compute the product.
{1,-1
sum(L1Ans)sum(L2Ans->A
L1(1)L2(1->B
sum(L1)sum(L2->C
.5(C-A
{B,Ans,Ans+A-B->L1
0->C
L1e-5->L1      ;engineering E
For(K,1,3
C+L1(K->L1(K
e-5int(Ans->C
End
C->L1(4
e5fPart(L1->L1

The output is a 20-digit integer comprised of four 5-digit numbers. The first element is the lowest 5 digits, the fourth element is the upper 5 digits.
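A rough Python rendering of the same three-point scheme, for readers without a calculator handy (the function name and carry handling are my own, not from the original program):

def mul2(x, y):
    # x and y are two base-10^5 limbs each, low limb first
    a = (x[0] - x[1]) * (y[0] - y[1])   # product evaluated at -1
    b = x[0] * y[0]                     # product evaluated at 0
    c = (x[0] + x[1]) * (y[0] + y[1])   # product evaluated at 1
    p1 = (c - a) // 2                   # x^1 coefficient (c-a is always even)
    limbs = [b, p1, p1 + a - b]         # [x^0, x^1, x^2] coefficients
    carry = 0
    out = []
    for v in limbs:                     # carry propagation in base 10^5
        v += carry
        out.append(v % 100000)
        carry = v // 100000
    out.append(carry)
    return out                          # four limbs, low limb first

# e.g. mul2([99999, 12345], [54321, 67890]) gives the 20-digit product in limbs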
Squaring a 3-digit number results in a degree-4 polynomial. Consequently, we'll need to evaluate at 5 points. The points I chose were {0,1,i,-1,-i}, where 'i' is the imaginary unit. It may seem like evaluating at a complex point would be, well, complex, but we'll see. First, let's write our number as the polynomial ##a_{0}+a_{1}x+a_{2}x^{2}## and let ##f(x)## be its square. Then:
##b_{0}=f(0)=a_{0}^{2}##
##b_{1}=f(1)=(a_{0}+a_{1}+a_{2})^{2}##
##b_{2}=f(i)=(a_{0}+a_{1}i-a_{2})^{2}##
##b_{3}=f(-1)=(a_{0}-a_{1}+a_{2})^{2}##
##b_{4}=f(-i)=(a_{0}-ia_{1}-a_{2})^{2}##
Squaring complex numbers is easy though:
##(a+bi)^{2}=a^{2}-b^{2}+2abi=(a-b)(a+b)+2abi##
So simplifying:
##b_{2}=(a_{0}-a_{2}+a_{1})(a_{0}-a_{2}-a_{1})+2i(a_{0}-a_{2})a_{1}##
##b_{4}=(a_{0}-a_{2}+a_{1})(a_{0}-a_{2}-a_{1})-2i(a_{0}-a_{2})a_{1}##
Collecting real and imaginary parts, write ##c_{3}=(a_{0}-a_{2}+a_{1})(a_{0}-a_{2}-a_{1})## and ##c_{4}=2(a_{0}-a_{2})a_{1}##, so that ##b_{2}=c_{3}+ic_{4}## and ##b_{4}=c_{3}-ic_{4}##. Then define ##c_{1}=\frac{b_{1}+b_{3}}{2}##, ##c_{2}=c_{1}-b_{3}##, ##d_{1}=\frac{c_{1}+c_{3}}{2}##, ##d_{2}=d_{1}-c_{3}##, ##d_{3}=\frac{c_{2}+c_{4}}{2}##, and ##d_{4}=d_{3}-c_{4}##. Then our polynomial is:
##b_{0}+d_{3}x+d_{2}x^{2}+d_{4}x^{3}+(d_{1}-b_{0})x^{4}##
So for an example, let's find 348*348. Then we have:
##a_{0}=8,\ a_{1}=4,\ a_{2}=3##
##b_{0}=64##
##b_{1}=(8+4+3)^{2}=225##
##b_{3}=(8-4+3)^{2}=49##
##c_{3}=(8-3+4)(8-3-4)=9##
##c_{4}=2(8-3)\cdot 4=40##
##c_{1}=(225+49)/2=137##
##c_{2}=c_{1}-49=88##
##d_{1}=(137+9)/2=73##
##d_{2}=d_{1}-9=64##
##d_{3}=(88+40)/2=64##
##d_{4}=d_{3}-40=24##
So then, our result is:
##64+64x+64x^{2}+24x^{3}+9x^{4}##
Evaluating at x=10 yields 64+640+6400+24000+90000 = 121104, which is indeed 348*348. Now let's generate some Python code:
def sqr3(a, b, c):
    # d: roughly 75% of cases exceed the original bit-width
    d = a + b + c
    # e: roughly 1/6 of the time this will exceed the original bit-width.
    # However, whenever this exceeds the bit-width, then d also exceeds it.
    e = a - b + c
    d *= d
    e *= e
    d = (d + e) // 2
    e = d - e
    # it is never the case that both multiplicands exceed their original word
    # width, but about 1/3 of the time, one of them exceeds it (~2/3 of the
    # time, neither exceed).
    f = (a + b - c) * (a - b - c)
    # neither multiplicand exceeds the original bit-width (though the second
    # operand may become negative).
    b *= (a - c)
    b += b
    a *= a
    e = (e + b) // 2
    b = e - b
    d = (d - f) // 2
    f += d - a
    return [a, e, d, b, f]

Note that three of the sub-multiplications are just squaring the input. Also note that all divisions are by a power of two (written here as floor division so everything stays an integer), and all results and operations are integer operations, not complex (assuming all inputs are integers).
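As a quick sanity check of the routine above against the worked example:

coeffs = sqr3(8, 4, 3)                                # digits of 348, low digit first
print(coeffs)                                         # [64, 64, 64, 24, 9]
print(sum(c * 10**k for k, c in enumerate(coeffs)))   # 121104 == 348*348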
I don't know the crystal timers well, sorry. I do however remember a comment from about six years ago by thepenguin77, I think, involving this same premise. I couldn't find the post, though.