Topic: Assembly Programmers - Help Axe Optimize!

Offline Quigibo

Re: Assembly Programmers - Help Axe Optimize!
« Reply #195 on: May 26, 2011, 02:22:21 am »
Anyone up for some math?  :P

I want to implement the reciprocal function for fixed-point math.  For 8.8 numbers, A^-1 is essentially just E10000//A; however, that division requires a number larger than can fit in a register pair.  Ideally, the routine could hijack a jump point into the current division routine instead of rewriting another one.  But it's possible, due to the symmetry involved, that there is a significantly more optimized method using a slightly different approach; I just can't think of how that would work.  Has anyone seen or written a routine like this before?
___Axe_Parser___
Today the calculator, tomorrow the world!
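
To make the size problem concrete, here is a rough Python model of the 8.8 reciprocal (not Axe code). The names `reciprocal_88` and `to_signed16` are made up for illustration. It assumes signed 8.8 values stored in 16 bits and truncation toward zero, and it makes no attempt to handle overflow.

```python
ONE = 0x100            # 1.0 in 8.8 fixed point
NUMERATOR = 0x10000    # 1.0 scaled by 2^16: needs 17 bits, one more than a register pair


def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def reciprocal_88(a):
    """Return 1/a for a signed 8.8 value, as a 16-bit pattern (hypothetical helper)."""
    s = to_signed16(a)
    if s == 0:
        raise ZeroDivisionError("reciprocal of 0.0")
    q = NUMERATOR // abs(s)    # divide magnitudes, truncating toward zero
    if s < 0:
        q = -q                 # reapply the sign
    return q & 0xFFFF


print(hex(reciprocal_88(0x0200)))   # 1/2.0
print(hex(reciprocal_88(0xFF00)))   # 1/-1.0
```

Inputs whose magnitude is below 3/256 produce a quotient that no longer fits in a signed 16-bit result, so a real routine would have to decide what to do with them.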

Offline Runer112

« Reply #196 on: May 26, 2011, 04:35:58 pm »
I don't know of any speed-optimized function specific to taking the inverse. But that definitely doesn't mean one doesn't exist. However, you could easily implement it if you added 8.8 fixed point division:


p_Inverse:
   .db 7
   ex   de,hl
   ld   hl,$100
   call   $0000      ;sub_88Div
   .db rp_Ans,2


p_88Div:
   .db __88DivEnd-1-$
   ld   a,h
   xor   d
   push   af
   bit   7,h
   jr   z,$+8
   xor   a
   sub   l
   ld   l,a
   sbc   a,a
   sub   h
   ld   h,a
   bit   7,d
   jr   z,$+8
   xor   a
   sub   e
   ld   e,a
   sbc   a,a
   sub   d
   ld   d,a
   ld   b,24
   call   $0000      ;sub_Div+2
   pop   af
   add   a,a
   ret   nc
   xor   a
   sub   l
   ld   l,a
   sbc   a,a
   sub   h
   ld   h,a
   ret
__88DivEnd:
   .db   rp_Ans,12



EDIT: Just kidding, that hijacking of the 16/16 division routine to make an 8.8 division routine doesn't work. But it's definitely possible to hijack the 16/16 division routine at least for an 8.8 inverse.
« Last Edit: May 26, 2011, 06:57:40 pm by Runer112 »

Offline Quigibo

« Reply #197 on: May 26, 2011, 08:27:08 pm »
I'm not planning to add 8.8 division.  I think just multiplying by the inverse should work with enough accuracy.

Offline Runer112

« Reply #198 on: May 26, 2011, 08:28:41 pm »
But the logical way to get the inverse is to divide, is it not?

Offline Quigibo

« Reply #199 on: May 26, 2011, 08:31:35 pm »
Right, but an inverse can use a standard 16/16 division instead of a 24/16.

Offline Runer112

« Reply #200 on: May 26, 2011, 08:37:17 pm »
Yeah, I actually had a routine written for that which hijacked the 16/16 division routine, but I deleted it in favor of the 8.8 division routine. However, I realized that the 8.8 division routine doesn't work, so I'll try to recreate what I had before:

Code:
p_Inverse:
.db __InverseEnd-1-$
xor a
bit 7,h
push af
jr z,$+8
sub l
ld l,a
sbc a,a
sub h
ld h,a
xor a
ex de,hl
ld bc,16<<8
ld hl,1
call $0000 ;sub_Div+10
pop af
ret z
sub l
ld l,a
sbc a,a
sub h
ld h,a
ret
__InverseEnd:
.db rp_Ans,12
« Last Edit: May 26, 2011, 08:39:05 pm by Runer112 »
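
For anyone following along, here is a small Python model of why jumping into the middle of a 16/16 routine can produce 0x10000 // A. Axe's actual sub_Div is not shown in this thread, so the sketch assumes a conventional 16-iteration shift-and-subtract loop. Under that assumption, the `ld hl,1` and `ld bc,16<<8` in the routine above would correspond to a remainder preloaded with 1 and a zeroed quotient. `div_loop` and `inverse_via_div_loop` are hypothetical names.

```python
def div_loop(remainder, quotient, divisor, iterations=16):
    """Shift-and-subtract division over the combined remainder:quotient value."""
    for _ in range(iterations):
        # Shift the top bit of the quotient register into the remainder.
        remainder = (remainder << 1) | (quotient >> 15)
        quotient = (quotient << 1) & 0xFFFF
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1          # record a 1 bit in the result
    return quotient, remainder


def inverse_via_div_loop(a):
    """0x10000 // a, obtained by starting the loop with remainder = 1."""
    return div_loop(1, 0, a)[0]


# An ordinary 16/16 division starts with remainder = 0 instead.
print(div_loop(0, 1000, 7))
print(hex(inverse_via_div_loop(0x0200)))
```

Starting with the remainder at 1 is the same as dividing the 17-bit value 0x10000, which is why no 24/16 routine is needed. The result only fits in 16 bits when the divisor is at least 2.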

Offline Quigibo

« Reply #201 on: May 26, 2011, 08:40:56 pm »
I actually have a copy of the routine you posted earlier and it was a bit more optimized, so no worries :P

Offline Runer112

« Reply #202 on: May 26, 2011, 08:42:33 pm »
Yeah it was more optimized, but I don't think it worked. It would've screwed up normal 16/16 division because of how I reordered the initialization in p_Div to destroy hl before loading hl into ac.
« Last Edit: May 26, 2011, 08:43:33 pm by Runer112 »

Offline thepenguin77

« Reply #203 on: June 10, 2011, 03:14:11 pm »
This is a really simple one. When an interrupt is called, interrupts are automatically disabled. So you don't need to start the interrupt routine with DI.

(Thinking that interrupts were enabled by default caused runer quite a headache over IRC ;))
« Last Edit: June 10, 2011, 03:14:54 pm by thepenguin77 »
You can build a statue out of either 1'x1' blocks or 12'x12' blocks. The 1'x1' blocks will take a lot longer, but the final product is worth it.
       -Runer112

Offline Runer112

« Reply #204 on: June 10, 2011, 11:20:26 pm »
More stuff regarding interrupts. SMC'ing the active port 6 page into the interrupt handler is, as far as I know, only necessary for applications. You could get rid of this if the code is being compiled to a program to save 9 bytes.



And on the topic of stuff that involves port 6, I think it would be nice if the archive byte reading routine avoided using a B_CALL for a massive speed boost, especially for code compiled as programs:

p_ReadArc: 18 bytes (2x) larger, but ~1400 cycles (!!!10x!!!) faster

Code: (36 bytes, ~142 cycles)
p_ReadArc:
.db __ReadArcEnd-1-$
ld c,a
in a,(6)
ld b,a
ld a,h
set 6,h
res 7,h
rlca
rlca
dec a
and %00000011
add a,c
out (6),a
ld c,(hl)
inc hl
bit 7,h
jr z,__ReadArcNoBoundary
set 6,h
res 7,h
inc a
out (6),a
__ReadArcNoBoundary:
ld l,(hl)
ld h,c
ld a,b
out (6),a
ret
__ReadArcEnd:

p_ReadArcApp: 36 bytes (3x) larger, but ~1050 cycles (4x) faster

Code: (54 bytes, ~396 cycles)
p_ReadArcApp:
.db __ReadArcAppEnd-1-$
push hl
ld hl,$0000
ld de,ramCode
ld bc,__ReadArcAppRamCodeEnd-__ReadArcAppRamCode
ldir
pop hl
ld e,a
ld c,6
in b,(c)
ld a,h
set 6,h
res 7,h
rlca
rlca
dec a
and %00000011
add a,e
call ramCode
ld e,d
inc hl
bit 7,h
jr z,__ReadArcAppNoBoundary
set 6,h
res 7,h
inc a
__ReadArcAppNoBoundary:
call ramCode
ex de,hl
ret
__ReadArcAppEnd:
.db rp_Ans,__ReadArcAppEnd-p_ReadArcApp-3

__ReadArcAppRamCode:
out (6),a
ld d,(hl)
out (c),b
ret
__ReadArcAppRamCodeEnd:
« Last Edit: June 11, 2011, 12:51:24 am by Runer112 »
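
The page arithmetic in p_ReadArc above can be hard to follow from the rotates alone, so here is a Python model of just that part. It mirrors the instructions as posted and assumes, as on the TI-83+ family, that port 6 selects the page visible at $4000-$7FFF. The function names are invented, and the sketch does not model the actual byte reads.

```python
WINDOW_START = 0x4000   # assumed: the page selected by port 6 appears at $4000-$7FFF


def translate(base_page, pointer):
    """Map a 16-bit pointer to (page, address inside the $4000 window)."""
    high = pointer >> 8
    # rlca / rlca / dec a / and %00000011: top two bits of h, minus one, mod 4
    page_offset = ((high >> 6) - 1) & 0b11
    # set 6,h / res 7,h: force the address into $4000-$7FFF
    address = (pointer & 0x3FFF) | WINDOW_START
    return (base_page + page_offset) & 0xFF, address


def advance(page, address):
    """Step to the next byte, moving to the next page at the end of the window."""
    address += 1
    if address & 0x8000:            # bit 7,h set: ran past $7FFF
        address = WINDOW_START      # set 6,h / res 7,h
        page = (page + 1) & 0xFF    # inc a
    return page, address


print(translate(0x10, 0x8123))
print(advance(0x10, 0x7FFF))
```

The second call shows the page-boundary case that the `jr z,__ReadArcNoBoundary` branch guards against.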

Offline Quigibo

« Reply #205 on: June 11, 2011, 01:42:59 am »
Quote
This is a really simple one. When an interrupt is called, interrupts are automatically disabled. So you don't need to start the interrupt routine with DI.

They are disabled automatically already... there is a di at the start of the interrupt routine.  Is there some bug with that?

Also, about those archive reading commands... archive reading isn't as useful as it should be due to those sector boundary issues.  For instance, you can't reliably iterate a tilemap in archive because there is a small chance it could overlap between a sector boundary and iterating over it would add a "glitch byte" to the map since each sector adds an extra byte in front.  Although I guess you could modify those routines to take that into account, that might work since you can't read more than 64 consecutive kilobytes anyway.
« Last Edit: June 11, 2011, 01:43:38 am by Quigibo »

Offline calc84maniac

« Reply #206 on: June 11, 2011, 01:51:59 am »
Quote
This is a really simple one. When an interrupt is called, interrupts are automatically disabled. So you don't need to start the interrupt routine with DI.

They are disabled automatically already... there is a di at the start of the interrupt routine.  Is there some bug with that?

Also, about those archive reading commands... archive reading isn't as useful as it should be due to those sector boundary issues.  For instance, you can't reliably iterate a tilemap in archive because there is a small chance it could overlap between a sector boundary and iterating over it would add a "glitch byte" to the map since each sector adds an extra byte in front.  Although I guess you could modify those routines to take that into account, that might work since you can't read more than 64 consecutive kilobytes anyway.

There's no chance of overlapping a sector boundary, but yeah you can overlap a page boundary. TI-OS doesn't allow variables to cross sector boundaries.

Edit: About the DI thing, he means that it's a waste of a byte and 4 cycles to DI when it has already been done by the hardware.
« Last Edit: June 11, 2011, 01:52:50 am by calc84maniac »
"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman

Offline calc84maniac

« Reply #207 on: June 17, 2011, 03:49:26 am »
I made a one-byte optimization to p_SDiv:
Old:
Code:
p_SDiv:
.db __SDivEnd-1-$
ld a,h
xor d
push af
bit 7,h
jr z,$+8
xor a
sub l
ld l,a
sbc a,a
sub h
ld h,a
bit 7,d
jr z,$+8
xor a
sub e
ld e,a
sbc a,a
sub d
ld d,a
call $3F00+sub_Div
pop af
add a,a
ret nc
xor a
sub l
ld l,a
sbc a,a
sub h
ld h,a
ret
__SDivEnd:

New:
Code:
p_SDiv:
.db __SDivEnd-1-$
ld a,h
xor d
push af
bit 7,h
jr z,$+8
xor a
sub l
ld l,a
sbc a,a
sub h
ld h,a
bit 7,d
jr z,$+8
xor a
sub e
ld e,a
sbc a,a
sub d
ld d,a
call $3F00+sub_Div
pop af
ret p
xor a
sub l
ld l,a
sbc a,a
sub h
ld h,a
ret
__SDivEnd:

I'm also working on a fixed-point division routine (that hijacks the normal division routine), but I think I need to make sure it works before I post it :P

Edit:
Well, I've convinced myself now that it works. You'll need to add in the stuff to correctly format the routine, since I don't fully understand how that works (especially calling into the middle of other routines).
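Code:
p_88Div:
ld a,h
xor d ;bit 7 of a = sign of the result
push af ;save the sign flag for later
bit 7,h
jr z,$+8 ;skip the negation if hl >= 0
xor a ;hl = -hl
sub l
ld l,a
sbc a,a
sub h
ld h,a
bit 7,d
jr z,$+8 ;skip the negation if de >= 0
xor a ;de = -de
sub e
ld e,a
sbc a,a
sub d
ld d,a
ld bc,$1000 ;b = 16 iterations, c = 0
ld a,l ;a = low byte of the dividend
ld l,h ;hl = high byte of the dividend
ld h,c
call __DivLoop ;jump into the normal division loop
pop af ;restore the sign flag
ret p ;done if the result is positive
xor a ;hl = -hl
sub l
ld l,a
sbc a,a
sub h
ld h,a
ret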

Overflow checking isn't handled, but I suppose that's normal. It might be nice to saturate the result, though.
« Last Edit: June 17, 2011, 04:18:06 am by calc84maniac »
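
As a reference for what saturation could look like, here is a Python model of signed 8.8 division that clamps instead of wrapping. This is only a sketch of the arithmetic, not a translation of the routine above. `div_88_saturating` is a hypothetical name, and clamping to $7FFF / $8000 is one possible choice, not something the thread settles.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def div_88_saturating(a, b):
    """Signed 8.8 division a/b on 16-bit patterns, clamped to the 16-bit range."""
    sa, sb = to_signed16(a), to_signed16(b)
    if sb == 0:
        raise ZeroDivisionError("8.8 division by 0.0")
    negative = (sa < 0) != (sb < 0)       # same idea as ld a,h / xor d
    magnitude = (abs(sa) << 8) // abs(sb)  # 24-bit dividend over 16-bit divisor
    if negative:
        result = -min(magnitude, 0x8000)   # clamp to the most negative value
    else:
        result = min(magnitude, 0x7FFF)    # clamp to the most positive value
    return result & 0xFFFF


print(hex(div_88_saturating(0x0300, 0x0200)))   # 3.0 / 2.0
print(hex(div_88_saturating(0x7F00, 0x0001)))   # overflows, so it clamps
```

Without the clamp, the second call would keep only the low 16 bits of a much larger quotient, which is the unchecked overflow mentioned above.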

Offline Quigibo

« Reply #208 on: June 17, 2011, 04:55:30 am »
Cool, thanks!  I was also able to apply that same sign flag optimization to the 8.8 multiplication routine.  Any idea what might be a good token for fixed-point division?  That's the main thing holding me back from adding it.  /* is the first thing that comes to mind, but I think it's confusing.  /// could also work, but that's a lot to type...

Offline Deep Toaster

« Reply #209 on: June 17, 2011, 12:26:31 pm »
Maybe /^?