Author Topic: Assembly Programmers - Help Axe Optimize!  (Read 154161 times)


Offline Quigibo

  • The Executioner
  • CoT Emeritus
  • LV11 Super Veteran (Next: 3000)
  • *
  • Posts: 2031
  • Rating: +1075/-24
  • I wish real life had a "Save" and "Load" button...
    • View Profile
Re: Assembly Programmers - Help Axe Optimize!
« Reply #120 on: January 04, 2011, 05:32:59 pm »
I think I just missed most of those, so thanks for catching them.  The p_EQNX and p_NENX were intentionally left out, though, because I need to rewrite my optimizer to handle negative shorts first.  As for your new optimizations, I'm not sure if I want to add those, because it would require me to write a lot more code for the parser, since I just have all the math operations and optimizations macro'd in right now.  Checking for a variable would be a little tricky in that section.  But I'll try it out later if I have time.
« Last Edit: January 04, 2011, 05:46:10 pm by Quigibo »
___Axe_Parser___
Today the calculator, tomorrow the world!

Offline Runer112

  • Project Author
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2289
  • Rating: +639/-31
    • View Profile
« Reply #121 on: January 04, 2011, 05:34:52 pm »
Alright, these aren't that urgent anyways. Just trying to squeeze every last byte and cycle out of Axe programs. ;)
« Last Edit: January 04, 2011, 09:23:57 pm by Runer112 »

Offline Runer112

« Reply #122 on: January 05, 2011, 06:42:20 pm »
Wow that took a long time. But I hope the results will be worth it.
Quigibo, get out your reading glasses. ;)

(By the way, I haven't tested these myself, but the code looks solid. If you believe that any of these would not work or have any questions, tell me.)



Smaller nibble retrieval routines. 1 byte saved for reading from RAM, 3 bytes saved for reading from ROM.

Thanks to calc84maniac for reminding me that $0000-$7FFF is read-only!

Code: (Original routine: 18 bytes, ~72 cycles) [Select]
p_Nib1:
.db __Nib1End-$-1
scf
rr h
rr l
ld a,(hl)
jr c,__Nib1Skip
rrca
rrca
rrca
rrca
__Nib1Skip:
and %00001111
ld l,a
ld h,0
ret
__Nib1End:
   
Code: (Optimized routine: 17 bytes, ~105 cycles) [Select]
p_Nib1:
.db __Nib1End-$-1
xor a
scf
rr h
rr l
ld b,(hl)
__Nib1Loop:
rrd
ccf
jr c,__Nib1Loop
ld (hl),b
ld l,a
ld h,0
ret
__Nib1End:

Code: (Original routine: 18 bytes, ~68 cycles) [Select]
p_Nib2:
.db __Nib2End-$-1
srl h
rr l
ld a,(hl)
jr c,__Nib2Skip
rrca
rrca
rrca
rrca
__Nib2Skip:
and %00001111
ld l,a
ld h,0
ret
__Nib2End:
   
Code: (Optimized routine: 15 bytes, ~77 cycles) [Select]
p_Nib2:
.db __Nib2End-$-1
xor a
srl h
rr l
rrd
jr c,__Nib2Skip
rld
__Nib2Skip:
ld l,a
ld h,0
ret
__Nib2End:
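
For anyone who hasn't used rrd and rld before, here is a rough Python model of the two optimized read routines. It is only a sketch and not part of Axe: the function names and the `writable` flag are made up for illustration, and it assumes that a write to ROM is simply dropped while the accumulator is still updated.

```python
def rrd(a, m):
    """Z80 RRD on accumulator a and memory byte m; returns (new_a, new_m)."""
    return (a & 0xF0) | (m & 0x0F), ((a & 0x0F) << 4) | (m >> 4)

def rld(a, m):
    """Z80 RLD on accumulator a and memory byte m; returns (new_a, new_m)."""
    return (a & 0xF0) | (m >> 4), ((m << 4) & 0xF0) | (a & 0x0F)

def nib1(byte, odd):
    """p_Nib1 body (RAM). odd is the carry left by rr l.
    Returns (nibble, byte left in memory)."""
    a, saved, carry = 0, byte, odd      # xor a / ld b,(hl)
    while True:
        a, byte = rrd(a, byte)          # rrd
        carry = not carry               # ccf
        if not carry:                   # jr c,__Nib1Loop
            break
    return a, saved                     # ld (hl),b puts the original byte back

def nib2(byte, odd, writable=False):
    """p_Nib2 body. writable=False models ROM, where the write half
    of rrd/rld is lost (an assumption of this sketch)."""
    a = 0                               # xor a
    a, m = rrd(a, byte)                 # rrd
    if writable:
        byte = m
    if not odd:                         # jr c,__Nib2Skip
        a, m = rld(a, byte)             # rld
    return a

print(hex(nib2(0xAB, odd=False)), hex(nib2(0xAB, odd=True)))   # 0xa 0xb
```

In this model the rrd / rld pair only recovers the high nibble when the first write is dropped, which is why the RAM routine uses two rrd passes and restores the byte afterwards.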



Smaller and faster nibble storage routine. 1 byte and ~17 cycles saved.

Code: (Original routine: 23 bytes, ~127 cycles) [Select]
p_NibSto:
.db __NibStoEnd-$-1
pop bc
pop de
push bc
scf
rr h
rr l
ld b,(hl)
ex de,hl ;hl = byte ;de = addr
ld a,%11110000
jr c,__NibStoSkip
add hl,hl
add hl,hl
add hl,hl
add hl,hl
cpl
__NibStoSkip:
and b
or l
ld (de),a
ret
__NibStoEnd:
   
Code: (Optimized routine: 22 bytes, ~110 cycles) [Select]
p_NibSto:
.db __NibStoEnd-$-1
pop bc
pop de
push bc
scf
rr h
rr l
jr nc,__NibStoHigh ;carry clear = even nibble = high nibble, same as the original
rrd
ld a,e
rld
ret
__NibStoHigh:
rld
ld a,e
rrd
ret
__NibStoEnd:
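
To see which three-instruction sequence replaces which nibble, here is a small Python model. It is a sketch only: `store_high`, `store_low` and the garbage accumulator value are hypothetical names and values chosen for illustration, not anything from the Axe source.

```python
def rrd(a, m):
    """Z80 RRD on accumulator a and memory byte m; returns (new_a, new_m)."""
    return (a & 0xF0) | (m & 0x0F), ((a & 0x0F) << 4) | (m >> 4)

def rld(a, m):
    """Z80 RLD on accumulator a and memory byte m; returns (new_a, new_m)."""
    return (a & 0xF0) | (m >> 4), ((m << 4) & 0xF0) | (a & 0x0F)

def store_high(mem, value, a=0x5D):
    """rld / ld a,e / rrd: replaces the high nibble of mem."""
    a, mem = rld(a, mem)    # memory is rotated, whatever was in a goes along for the ride
    a = value               # ld a,e
    a, mem = rrd(a, mem)    # rotate back, pushing the low nibble of the value into the top
    return mem

def store_low(mem, value, a=0x5D):
    """rrd / ld a,e / rld: replaces the low nibble of mem."""
    a, mem = rrd(a, mem)
    a = value               # ld a,e
    a, mem = rld(a, mem)
    return mem

print(hex(store_high(0xAB, 0x0C)), hex(store_low(0xAB, 0x0C)))   # 0xcb 0xac
```

Only the low four bits of the value ever reach memory in this model, so a value above 15 cannot spill into the neighbouring nibble.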



Faster buffer inversion routine. 9951 cycles saved.

Code: (Original routine: 16 bytes, 38425 cycles) [Select]
p_InvBuff:
.db __InvBuffEnd-1-$
ld hl,plotSScreen
ld bc,768
__InvBuffLoop:
ld a,(hl)
cpl
ld (hl),a
inc hl
dec bc
ld a,b
or c
jr nz,__InvBuffLoop
ret
__InvBuffEnd:
   
Code: (Optimized routine: 16 bytes, 28474 cycles) [Select]
p_InvBuff:
.db __InvBuffEnd-1-$
ld hl,plotSScreen
ld bc,3
__InvBuffLoop:
ld a,(hl)
cpl
ld (hl),a
inc hl
djnz __InvBuffLoop
dec c
jr nz,__InvBuffLoop
ret
__InvBuffEnd:
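
The cycle totals above can be reproduced by adding up standard Z80 T-state counts. The Python below is just a back-of-the-envelope tally, not a measurement; the helper names are made up, and it assumes no wait states. Note that ld bc,3 leaves b = 0, so the first djnz loop runs 256 times.

```python
# Standard Z80 T-state costs for the instructions in both routines.
T = {"ld hl,nn": 10, "ld bc,nn": 10, "ld a,(hl)": 7, "cpl": 4, "ld (hl),a": 7,
     "inc hl": 6, "dec bc": 6, "ld a,b": 4, "or c": 4, "dec c": 4, "ret": 10}
JR_TAKEN, JR_NOT_TAKEN = 12, 7
DJNZ_TAKEN, DJNZ_NOT_TAKEN = 13, 8

def original_cycles(count=768):
    body = sum(T[i] for i in ("ld a,(hl)", "cpl", "ld (hl),a",
                              "inc hl", "dec bc", "ld a,b", "or c"))
    # Every pass ends in a taken jr except the last one.
    loop = count * (body + JR_TAKEN) - (JR_TAKEN - JR_NOT_TAKEN)
    return T["ld hl,nn"] + T["ld bc,nn"] + loop + T["ret"]

def optimized_cycles(outer=3, inner=256):
    body = sum(T[i] for i in ("ld a,(hl)", "cpl", "ld (hl),a", "inc hl"))
    # Inner djnz loop: 256 passes, the last djnz falls through.
    inner_loop = inner * (body + DJNZ_TAKEN) - (DJNZ_TAKEN - DJNZ_NOT_TAKEN)
    # Outer loop: dec c / jr nz after each block of 256 bytes.
    outer_loop = outer * (inner_loop + T["dec c"] + JR_TAKEN) - (JR_TAKEN - JR_NOT_TAKEN)
    return T["ld hl,nn"] + T["ld bc,nn"] + outer_loop + T["ret"]

print(original_cycles(), optimized_cycles(), original_cycles() - optimized_cycles())
# 38425 28474 9951
```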



You'll laugh at this... but I managed to save 4 cycles in the unarchive and archive routines. And only when the routine returns early because the targeted variable is already where you want it (already in RAM for unarchive, already archived for archive). But hey, why not take all the savings you can get?

I think this works. It relies on the page number returned in b always being 0 for a RAM page and always being in the range of $01-$7F for a flash page.

Code: (Original routine: 18 bytes, a lot of cycles) [Select]
p_Unarchive:
.db __UnarchiveEnd-1-$
MOV9TOOP1()
B_CALL(_ChkFindSym)
ld hl,0
ret c
inc b
dec b
ret z
B_CALL(_Arc_Unarc)
ld hl,1
ret
__UnarchiveEnd:
   
Code: (Optimized routine: 18 bytes, a lot of-4 cycles) [Select]
p_Unarchive:
.db __UnarchiveEnd-1-$
MOV9TOOP1()
B_CALL(_ChkFindSym)
ld hl,0
ret c
dec b
ret m
inc b
B_CALL(_Arc_Unarc)
ld hl,1
ret
__UnarchiveEnd:

Code: (Original routine: 18 bytes, a lot of cycles) [Select]
p_Archive:
.db __ArchiveEnd-1-$
MOV9TOOP1()
B_CALL(_ChkFindSym)
ld hl,0
ret c
inc b
dec b
ret nz
B_CALL(_Arc_Unarc)
ld hl,1
ret
__ArchiveEnd:
   
Code: (Optimized routine: 18 bytes, a lot of-4 cycles) [Select]
p_Archive:
.db __ArchiveEnd-1-$
MOV9TOOP1()
B_CALL(_ChkFindSym)
ld hl,0
ret c
dec b
ret p
inc b
B_CALL(_Arc_Unarc)
ld hl,1
ret
__ArchiveEnd:



Smaller archived variable locating. 4 bytes saved.

Code: (Original routine: 55 bytes, a lot of cycles) [Select]
p_GetArc:
.db __GetArcEnd-1-$
push de
MOV9TOOP1()
B_CALL(_ChkFindSym)
ld hl,0
jr c,__GetArcFail
ld a,(OP1)
cp ListObj
jr z,__GetArcName
cp ProgObj
jr z,__GetArcName
cp AppvarObj
jr z,__GetArcName
cp GroupObj
jr z,__GetArcName
__GetArcStatic:
ld hl,14
jr __GetArcDone
__GetArcName:
ld hl,9
add hl,de
B_CALL(_LoadDEIndPaged)
ld d,0
inc hl
inc hl
__GetArcDone:
add hl,de
__GetArcFail:
ex de,hl
pop hl
ld (hl),e
inc hl
ld (hl),d
inc hl
ld (hl),b
ex de,hl
ret
__GetArcEnd:
   
Code: (Optimized routine: 51 bytes, a lot of cycles) [Select]
p_GetArc:
.db __GetArcEnd-1-$
push de
MOV9TOOP1()
B_CALL(_ChkFindSym)
ld hl,0
jr c,__GetArcFail
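and %00011111
push bc ;save the page in b without touching de (the data pointer is needed below)
ld hl,__GetArcVarTypes
ld bc,__GetArcEnd-__GetArcVarTypes
cpir
pop bc ;restore the page; pop leaves the z flag from cpir intact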
ld hl,14
jr nz,__GetArcDone
ld l,9
add hl,de
B_CALL(_LoadDEIndPaged)
ld d,0
inc e
inc e
__GetArcDone:
add hl,de
__GetArcFail:
ex de,hl
pop hl
ld (hl),e
inc hl
ld (hl),d
inc hl
ld (hl),b
ex de,hl
ret
__GetArcVarTypes:
.db ListObj,ProgObj,AppvarObj,GroupObj
__GetArcEnd:



Smaller 8-bit get bit routine. 1 byte saved.

Code: (Original routine: 13 bytes, ~110 cycles) [Select]
p_GetBit:
.db 13
ld a,e
and %00000111
inc a
ld b,a
ld a,l
__GetBitLoop:
add a,a
djnz __GetBitLoop
ld h,b
ld l,b
rl l
   
Code: (Optimized routine: 12 bytes, ~152 cycles) [Select]
p_GetBit:
.db 12
ld a,e
and %00000111
inc a
ld b,a
xor a
__GetBitLoop:
ld h,a
add hl,hl
djnz __GetBitLoop
ld l,h
ld h,a



As long as the low byte of vx_SptBuff is at most $F8 (at most $F7 for p_FlipV, which starts at vx_SptBuff+8 and works backwards): faster sprite flipping routines. 16 cycles saved each.

Code: (Original routine: 13 bytes, 338 cycles) [Select]
p_FlipV:
.db __FlipVEnd-1-$
ex de,hl
ld hl,vx_SptBuff+8
ld b,8
__FlipVLoop:
dec hl
ld a,(de)
ld (hl),a
inc de
djnz __FlipVLoop
ret
__FlipVEnd:
   
Code: (Optimized routine: 13 bytes, 322 cycles) [Select]
p_FlipV:
.db __FlipVEnd-1-$
ex de,hl
ld hl,vx_SptBuff+8
ld b,8
__FlipVLoop:
dec l
ld a,(de)
ld (hl),a
inc de
djnz __FlipVLoop
ret
__FlipVEnd:

Code: (Original routine: 21 bytes, 1907 cycles) [Select]
p_FlipH:
.db __FlipHEnd-1-$
ld de,vx_SptBuff
push de
ld b,8
__FlipHLoop1:
ld c,(hl)
ld a,1
__FlipHLoop2:
rr c
rla
jr nc,__FlipHLoop2
ld (de),a
inc hl
inc de
djnz __FlipHLoop1
pop hl
ret
__FlipHEnd:
   
Code: (Optimized routine: 21 bytes, 1891 cycles) [Select]
p_FlipH:
.db __FlipHEnd-1-$
ld de,vx_SptBuff
push de
ld b,8
__FlipHLoop1:
ld c,(hl)
ld a,1
__FlipHLoop2:
rr c
rla
jr nc,__FlipHLoop2
ld (de),a
inc hl
inc e
djnz __FlipHLoop1
pop hl
ret
__FlipHEnd:
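
The low-byte condition can be checked by comparing the addresses an 8-bit inc e / dec l visits with the ones the full 16-bit inc de / dec hl would visit. This is only a sketch: the $8200 page is a made-up address, not the real location of vx_SptBuff, and the helper names are hypothetical.

```python
def fliph_ok(base):
    """p_FlipH writes to base..base+7, stepping with inc e instead of inc de."""
    full = [(base + k) & 0xFFFF for k in range(8)]
    short = [(base & 0xFF00) | ((base + k) & 0xFF) for k in range(8)]
    return full == short

def flipv_ok(base):
    """p_FlipV starts at base+8 and steps down with dec l instead of dec hl."""
    start = (base + 8) & 0xFFFF
    full = [(start - k) & 0xFFFF for k in range(1, 9)]
    short = [(start & 0xFF00) | ((start - k) & 0xFF) for k in range(1, 9)]
    return full == short

def max_safe_low_byte(check, page=0x8200):
    """Largest low byte such that every buffer address up to it passes."""
    low = 0
    while low < 0xFF and check(page | (low + 1)):
        low += 1
    return low

print(hex(max_safe_low_byte(fliph_ok)), hex(max_safe_low_byte(flipv_ok)))   # 0xf8 0xf7
```

So under these assumptions the forward-stepping routine tolerates a low byte of $F8, while the backward-stepping one needs $F7 or lower because vx_SptBuff+8 itself must not cross a 256-byte boundary.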



Smaller and faster sprite rotating routines. 2 bytes smaller and 166 cycles faster. 128 of those cycles come from relying on the low byte of vx_SptBuff being at most $F8, which allows inc l (4 cycles) in place of inc hl (6 cycles) on each of the 64 inner-loop passes.

Code: (Original routine: 22 bytes, 2874 cycles) [Select]
p_RotC:
.db __RotCEnd-1-$
ex de,hl
ld hl,vx_SptBuff
ld c,8
__RotCLoop1:
push hl
ld b,8
ld a,(de)
__RotCLoop2:
rla
rr (hl)
inc hl
djnz __RotCLoop2
pop hl
inc de
dec c
jr nz,__RotCLoop1
ret
__RotCEnd:
   
Code: (Optimized routine: 20 bytes, 2708 cycles) [Select]
p_RotC:
.db __RotCEnd-1-$
ex de,hl
ld c,8+1
__RotCLoop1:
ld hl,vx_SptBuff
dec c
ret z
ld b,8
ld a,(de)
__RotCLoop2:
rla
rr (hl)
inc l
djnz __RotCLoop2
inc de
jr __RotCLoop1
__RotCEnd:

Code: (Original routine: 22 bytes, 2874 cycles) [Select]
p_RotCC:
.db __RotCCEnd-1-$
ex de,hl
ld hl,vx_SptBuff
ld c,8
__RotCCLoop1:
push hl
ld b,8
ld a,(de)
__RotCCLoop2:
rra
rl (hl)
inc hl
djnz __RotCCLoop2
pop hl
inc de
dec c
jr nz,__RotCCLoop1
ret
__RotCCEnd:
   
Code: (Optimized routine: 20 bytes, 2708 cycles) [Select]
p_RotCC:
.db __RotCCEnd-1-$
ex de,hl
ld c,8+1
__RotCCLoop1:
ld hl,vx_SptBuff
dec c
ret z
ld b,8
ld a,(de)
__RotCCLoop2:
rra
rl (hl)
inc l
djnz __RotCCLoop2
inc de
jr __RotCCLoop1
__RotCCEnd:
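
If the rla / rr (hl) trick isn't obvious, this Python model of the clockwise routine may help. It is a sketch with made-up names: each source row is shifted out from the left one pixel at a time, and each pixel is shifted into the top of a different buffer byte. The buffer starts with stale data to show that eight shifts per byte push all of it out.

```python
def rot_c(src, stale=0xFF):
    """Model of p_RotC. src is 8 row bytes, bit 7 = leftmost pixel."""
    buf = [stale] * 8                      # vx_SptBuff with leftover data in it
    carry = 0
    for a in src:                          # ld a,(de): one source row per outer pass
        for j in range(8):                 # inner loop walks the 8 buffer bytes
            a, carry = ((a << 1) | carry) & 0xFF, a >> 7               # rla
            buf[j], carry = (carry << 7) | (buf[j] >> 1), buf[j] & 1   # rr (hl)
    return buf

def rot_c_reference(src):
    """Plain clockwise rotation: pixel (row r, column c) moves to (row c, column 7 - r)."""
    out = [0] * 8
    for r in range(8):
        for c in range(8):
            if src[r] & (0x80 >> c):
                out[c] |= 0x80 >> (7 - r)
    return out

# A single pixel in the top-left corner ends up in the top-right corner.
print(rot_c([0x80, 0, 0, 0, 0, 0, 0, 0]))   # [1, 0, 0, 0, 0, 0, 0, 0]
```

The counterclockwise routine is the mirror image: rra takes pixels from the right and rl (hl) feeds them in from the bottom bit.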



That's all I have for now. I think I got just about everything I could possibly find, but I might have some more later. And if you want all the routines in one file, I uploaded them all here.
« Last Edit: January 05, 2011, 07:21:32 pm by Runer112 »

Offline DJ Omnimaga

  • Clacualters are teh gr33t
  • CoT Emeritus
  • LV15 Omnimagician (Next: --)
  • *
  • Posts: 55943
  • Rating: +3154/-232
  • CodeWalrus founder & retired Omnimaga founder
    • View Profile
    • Dream of Omnimaga Music
« Reply #123 on: January 05, 2011, 06:49:06 pm »
Woah a lot of new optimizations! O.O Nice job Runer112!

Offline Eeems

  • Mr. Dictator
  • Administrator
  • LV13 Extreme Addict (Next: 9001)
  • *************
  • Posts: 6266
  • Rating: +318/-36
  • little oof
    • View Profile
    • Eeems
« Reply #124 on: January 05, 2011, 06:54:54 pm »
Great job guys! :D
I should probably get back into working with AXE sometime again. Although it's hard after moving over to assembly again.
/e

Offline DJ Omnimaga

« Reply #125 on: January 05, 2011, 06:57:55 pm »
Yeah it can be hard if you don't have as much freedom to do certain things you need. When I switched to Axe it was a bit hard to use BASIC again.

Offline Eeems

« Reply #126 on: January 05, 2011, 07:05:04 pm »
Yeah I know :(
For me it's working with higher level stuff; my head just seems to work better with lower level languages. Well, almost, since I can still work with JavaScript and PHP pretty well :)
/e

Offline calc84maniac

  • eZ80 Guru
  • Coder Of Tomorrow
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2912
  • Rating: +471/-17
    • View Profile
    • TI-Boy CE
« Reply #127 on: January 05, 2011, 07:14:09 pm »
Your nibble read routine that reads from archive will fail, because it's ROM. In that case, it might work to do something like this:
Code: [Select]
p_Nib2:
.db __Nib2End-$-1
xor a
srl h
rr l
rrd
jr c,__Nib2Skip
rld
__Nib2Skip:
ld l,a
ld h,0
ret
__Nib2End:
"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman

Offline Runer112

« Reply #128 on: January 05, 2011, 07:16:59 pm »
I just sort of blindly copied the $8000-$FFFF routine, not taking into account the fact that $0000-$7FFF was ROM. Good catch, I'll edit my post now. And it turns out that's actually smaller anyway. ;)
« Last Edit: January 05, 2011, 07:20:05 pm by Runer112 »

Offline Quigibo

« Reply #129 on: January 05, 2011, 10:59:54 pm »
 O.O How do you do this? You're a madman!

So I've never really used the rrd and rld instructions or known what they did.  I thought they were some of those obscure instructions like daa, which they are, but I guess there are situations where you can use them, like with daa in the hex routine.  Awesome job there!  I can't believe I missed the push pop thing in the sprite rotation ones, that was embarrassing...  I really do like the getcalc one, but it uses an inline self-reference.  I think because there's only one, I can easily replace it, but I'll have to make sure.  The inversion one is excellent as well.  I could have sworn I tried that same method before but couldn't get it the same size.

However, these are the concerns I have:  First, the sprite rotation commands, why did you move the ret to the middle of the routine?  It looks like that's just going to add more cycles since a conditional jr takes the same amount of cycles as a regular jr anyway.  Next, is it really a safe assumption that all ROM pages are between $01 and $7F for all current models and potentially future models?  And lastly, are you sure trying to modify ROM (unsuccessfully) has no potential side effects on things like flags and registers?
« Last Edit: January 06, 2011, 04:50:25 pm by Quigibo »

Offline calc84maniac

« Reply #130 on: January 05, 2011, 11:07:27 pm »
The CPU has no way of knowing whether the write fails or not, so there are no side effects.

Also, I think the p_GetArc routine fails if the archived VAT entry overlaps into another page. You really need to check for that.

Offline Runer112

« Reply #131 on: January 05, 2011, 11:11:30 pm »
Quote from: Quigibo
However, these are the concerns I have:  First, the sprite rotation commands, why did you move the ret to the middle of the routine?  It looks like that's just going to add more cycles since a conditional jr takes the same amount of cycles as a regular jr anyway.

Yeah, I'm not really sure why I did that. Feel free to initialize c to 8 and decrement and check c at the end using a conditional jump instead.

Quote from: Quigibo
Next, is it really a safe assumption that all ROM pages are between $01 and $7F for all current models and potentially future models?

$01 through $7F are all ROM pages, and $80-$87 are all RAM pages (at least for the calculators that have all those RAM pages), so it would make sense that $80 and up is RAM. But feel free to leave this optimization out anyway, it only saves 4 cycles part of the time.
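
To make the assumption concrete, here is a small Python comparison of the zero test in the original routines against the sign test in the optimized ones. It is only a sketch with made-up function names, and it says nothing about which page values the OS can actually return in b.

```python
def original_is_archived(b):
    """inc b / dec b / ret z: anything other than 0 counts as a flash page."""
    return b != 0

def optimized_is_archived(b):
    """dec b / ret m: treated as flash only if b - 1 has bit 7 clear."""
    return ((b - 1) & 0xFF) < 0x80

disagree = [b for b in range(256)
            if original_is_archived(b) != optimized_is_archived(b)]
print(hex(disagree[0]), hex(disagree[-1]), len(disagree))   # 0x81 0xff 127
```

In this model the two tests agree for every value from $00 through $80 and differ only for $81-$FF, so the optimization is safe exactly as long as b never holds one of those values for an archived variable.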

Quote from: Quigibo
And lastly, are you sure trying to modify ROM (unsuccessfully) has no potential side effects on things like flags and registers?

After a quick test, yes, rrd and rld update a correctly even when hl points to a byte in ROM.



EDIT: By the way Quigibo, the reason I was looking at every source routine for Axe is because I'm documenting the size and (at least approximate) speed of every Axe command. If I finish it, would you want to bundle it with future Axe releases? If not I'd probably post it somewhere on the forums anyway, so people could still see it.
« Last Edit: January 05, 2011, 11:19:23 pm by Runer112 »

Offline DJ Omnimaga

« Reply #132 on: January 06, 2011, 02:19:30 am »

Quote from: Runer112
EDIT: By the way Quigibo, the reason I was looking at every source routine for Axe is because I'm documenting the size and (at least approximate) speed of every Axe command. If I finish it, would you want to bundle it with future Axe releases? If not I'd probably post it somewhere on the forums anyway, so people could still see it.

This would be really great. When I programmed in Axe, knowing the size of commands was always something I wanted to be aware of, and I'm sure a lot of people would like it, since some people might be really tight on memory and want to find every way to optimize for size.

Offline TIfanx1999

  • ಠ_ಠ ( ͡° ͜ʖ ͡°)
  • CoT Emeritus
  • LV13 Extreme Addict (Next: 9001)
  • *
  • Posts: 6173
  • Rating: +191/-9
    • View Profile
« Reply #133 on: January 06, 2011, 08:41:27 am »
Quote from: Runer112
Faster buffer inversion routine. 9951 cycles saved.
It's over 9000!!!!
What?!? O.O
9000?
 <_< Yea, I know... I had to...
But seriously dude, all those optimizations are awesome!  ;D

Offline Happybobjr

  • James Oldiges
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2325
  • Rating: +128/-20
  • Howdy :)
    • View Profile
« Reply #134 on: January 06, 2011, 10:09:17 am »
Quote from: Runer112
By the way Quigibo, the reason I was looking at every source routine for Axe is because I'm documenting the size and (at least approximate) speed of every Axe command. If I finish it, would you want to bundle it with future Axe releases? If not I'd probably post it somewhere on the forums anyway, so people could still see it.
* happybobjr loves runner
School: East Central High School
 
Axe: 1.0.0
TI-84 +SE  ||| OS: 2.53 MP (patched) ||| Version: "M"
TI-Nspire    |||  Lent out, and never returned
____________________________________________________________