Messages - Runer112

1546
The Axe Parser Project / Re: Bug Reports
« on: January 06, 2011, 12:46:52 am »
It looks like any instance of getKey (the B_CALL) has a bit of a problem... the compiler doesn't seem to advance the parsing location upon reaching it, parsing it over and over again until the remainder of the free RAM is filled with copies of the getKey routine and you get a memory error. I would assume this has something to do with the addition of the variable-argument direct port input getKey.

EDIT: Huh, that's odd... I'm getting the same problem with Axe 0.4.6. Surely the problem hasn't existed for this long and gone unnoticed?

EDIT 2: I think this problem must be somewhere on my side. But this error happens on both wabbitemu and my real calculator. I'm puzzled...

EDIT 3: Hmm I think I may have found the problem... it looks like the error only crops up when getKey is the very last token in the source. Can anybody else confirm this?

1547
The Axe Parser Project / Re: Assembly Programmers - Help Axe Optimize!
« on: January 05, 2011, 11:11:30 pm »
However, these are the concerns I have:  First, the sprite rotation commands, why did you move the ret to the middle of the routine?  It looks like that's just going to add more cycles since a conditional jr takes the same amount of cycles as a regular jr anyway.

Yeah, I'm not really sure why I did that. Feel free to initialize c to 8 and decrease and check c at the end using a conditional jump instead.

Next, is it really a safe assumption that all ROM pages are between $01 and $7F for all current models and potentially future models?

$01 through $7F are all ROM pages, and $80-$87 are all RAM pages (at least for the calculators that have all those RAM pages), so it would make sense that $80 and up is RAM. But feel free to leave this optimization out anyways, it only saves 4 cycles part of the time.

And lastly, are you sure trying to modify ROM (unsuccessfully) has no potential side effects to things like flags and registers?

After a quick test, yes, rrd and rld affect a correctly even when hl points to a byte in ROM.



EDIT: By the way Quigibo, the reason I was looking at every source routine for Axe is because I'm documenting the size and (at least approximate) speed of every Axe command. If I finish it, would you want to bundle it with future Axe releases? If not I'd probably post it somewhere on the forums anyway, so people could still see it.

1548
The Axe Parser Project / Re: Assembly Programmers - Help Axe Optimize!
« on: January 05, 2011, 07:16:59 pm »
I just sort of blindly copied the $8000-$FFFF routine, not taking into account the fact that $0000-$7FFF was ROM. Good catch, I'll edit my post now. And it turns out that's actually smaller anyways. ;)

1549
The Axe Parser Project / Re: Assembly Programmers - Help Axe Optimize!
« on: January 05, 2011, 06:42:20 pm »
Wow that took a long time. But I hope the results will be worth it.
Quigibo, get out your reading glasses. ;)

(By the way, I haven't tested these myself, but the code looks solid. If you believe that any of these would not work or have any questions, tell me.)



Smaller nibble retrieval routines. 1 byte saved for reading from RAM, 3 bytes saved for reading from ROM.

Thanks to calc84maniac for reminding me that $0000-$7FFF is read-only!

Code: (Original routine: 18 bytes, ~72 cycles)
p_Nib1:
.db __Nib1End-$-1
scf
rr h
rr l
ld a,(hl)
jr c,__Nib1Skip
rrca
rrca
rrca
rrca
__Nib1Skip:
and %00001111
ld l,a
ld h,0
ret
__Nib1End:
   
Code: (Optimized routine: 17 bytes, ~105 cycles)
p_Nib1:
.db __Nib1End-$-1
xor a
scf
rr h
rr l
ld b,(hl)
__Nib1Loop:
rrd
ccf
jr c,__Nib1Loop
ld (hl),b
ld l,a
ld h,0
ret
__Nib1End:

Code: (Original routine: 18 bytes, ~68 cycles)
p_Nib2:
.db __Nib2End-$-1
srl h
rr l
ld a,(hl)
jr c,__Nib2Skip
rrca
rrca
rrca
rrca
__Nib2Skip:
and %00001111
ld l,a
ld h,0
ret
__Nib2End:
   
Code: (Optimized routine: 15 bytes, ~77 cycles)
p_Nib2:
.db __Nib2End-$-1
xor a
srl h
rr l
rrd
jr c,__Nib2Skip
rld
__Nib2Skip:
ld l,a
ld h,0
ret
__Nib2End:
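
As a sanity check on the rrd/rld trick, here is a rough Python model of the nibble reads. This is a sketch, not part of the original post: the helper names are made up, `carry` stands for the carry flag left by `rr l`, and the ROM version simply ignores the memory write that rrd/rld would attempt.

```python
# Hypothetical helpers modelling the Z80 nibble-rotate instructions.
def rrd(a, m):
    """RRD: rotate the 12-bit value [A low nibble, (HL)] right by one nibble."""
    new_m = ((a & 0x0F) << 4) | (m >> 4)
    new_a = (a & 0xF0) | (m & 0x0F)
    return new_a, new_m

def rld(a, m):
    """RLD: rotate the 12-bit value [A low nibble, (HL)] left by one nibble."""
    new_m = ((m << 4) & 0xF0) | (a & 0x0F)
    new_a = (a & 0xF0) | (m >> 4)
    return new_a, new_m

def nib_original(byte, carry):
    """Original routines: carry set -> low nibble, carry clear -> high nibble."""
    a = byte
    if not carry:
        a = ((a >> 4) | (a << 4)) & 0xFF   # four rrca = swap the nibbles
    return a & 0x0F

def nib1_optimized(byte, carry):
    """RAM version: rrd / ccf / jr c loop (memory is writable)."""
    a, m = 0, byte                          # xor a
    while True:
        a, m = rrd(a, m)
        carry = not carry                   # ccf
        if not carry:                       # jr c loops while carry is set
            return a

def nib2_optimized(byte, carry):
    """ROM version: the write half of rrd/rld has no effect."""
    a, _ = rrd(0, byte)                     # a = low nibble, byte unchanged
    if not carry:
        a, _ = rld(a, byte)                 # a = high nibble
    return a
```

In this model both optimized versions return the same nibble as the original for every byte value and either carry state. It does not model the `ld (hl),b` that restores the byte in the RAM version.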



Smaller and faster nibble storage routine. 1 byte and ~17 cycles saved.

Code: (Original routine: 23 bytes, ~127 cycles)
p_NibSto:
.db __NibStoEnd-$-1
pop bc
pop de
push bc
scf
rr h
rr l
ld b,(hl)
ex de,hl ;hl = byte ;de = addr
ld a,%11110000
jr c,__NibStoSkip
add hl,hl
add hl,hl
add hl,hl
add hl,hl
cpl
__NibStoSkip:
and b
or l
ld (de),a
ret
__NibStoEnd:
   
Code: (Optimized routine: 22 bytes, ~110 cycles)
p_NibSto:
.db __NibStoEnd-$-1
pop bc
pop de
push bc
scf
rr h
rr l
jr nc,__NibStoHigh ;no carry = even nibble index = high nibble, as in the original
rrd
ld a,e
rld
ret
__NibStoHigh:
rld
ld a,e
rrd
ret
__NibStoEnd:
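
To see which three-instruction sequence writes which nibble, here is a small Python model. It is only a sketch with made-up helper names, and it assumes the stored value fits in 4 bits. It compares the two rotate sequences against the mask-and-or logic of the original routine.

```python
# Hypothetical helpers modelling the Z80 nibble-rotate instructions.
def rrd(a, m):
    """RRD: [A low, (HL) high, (HL) low] -> [(HL) low, A low, (HL) high]."""
    return (a & 0xF0) | (m & 0x0F), ((a & 0x0F) << 4) | (m >> 4)

def rld(a, m):
    """RLD: [A low, (HL) high, (HL) low] -> [(HL) high, (HL) low, A low]."""
    return (a & 0xF0) | (m >> 4), ((m << 4) & 0xF0) | (a & 0x0F)

def store_original(old, value, carry):
    """Original routine: carry set keeps the high nibble and writes the low one."""
    if carry:
        return (old & 0xF0) | (value & 0x0F)
    return (old & 0x0F) | ((value << 4) & 0xF0)

def seq_rrd_rld(old, value, a=0):
    """rrd / ld a,e / rld"""
    a, m = rrd(a, old)
    a = value
    a, m = rld(a, m)
    return m

def seq_rld_rrd(old, value, a=0):
    """rld / ld a,e / rrd"""
    a, m = rld(a, old)
    a = value
    a, m = rrd(a, m)
    return m
```

In this model `rrd / ld a,e / rld` replaces the low nibble and `rld / ld a,e / rrd` replaces the high nibble, whatever a held on entry. To match the original routine, the first sequence therefore belongs on the carry-set path and the second on the carry-clear path.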



Faster buffer inversion routine. 9951 cycles saved.

Code: (Original routine: 16 bytes, 38425 cycles)
p_InvBuff:
.db __InvBuffEnd-1-$
ld hl,plotSScreen
ld bc,768
__InvBuffLoop:
ld a,(hl)
cpl
ld (hl),a
inc hl
dec bc
ld a,b
or c
jr nz,__InvBuffLoop
ret
__InvBuffEnd:
   
Code: (Optimized routine: 16 bytes, 28474 cycles)
p_InvBuff:
.db __InvBuffEnd-1-$
ld hl,plotSScreen
ld bc,3
__InvBuffLoop:
ld a,(hl)
cpl
ld (hl),a
inc hl
djnz __InvBuffLoop
dec c
jr nz,__InvBuffLoop
ret
__InvBuffEnd:
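
For anyone who wants to check the cycle counts, the arithmetic can be reproduced with a few lines of Python. This is just a sketch: the T-state values are the usual documented Z80 timings, and the function names are mine.

```python
# T-states per instruction (standard documented Z80 timings).
T = {
    "ld hl,nn": 10, "ld bc,nn": 10, "ld a,(hl)": 7, "cpl": 4, "ld (hl),a": 7,
    "inc hl": 6, "dec bc": 6, "ld a,b": 4, "or c": 4, "dec c": 4, "ret": 10,
    "jr taken": 12, "jr not taken": 7, "djnz taken": 13, "djnz not taken": 8,
}

SETUP = T["ld hl,nn"] + T["ld bc,nn"]

def original_cycles(size=768):
    """16-bit counter loop: dec bc / ld a,b / or c / jr nz on every byte."""
    body = sum(T[i] for i in ("ld a,(hl)", "cpl", "ld (hl),a", "inc hl",
                              "dec bc", "ld a,b", "or c"))
    jumps = (size - 1) * T["jr taken"] + T["jr not taken"]
    return SETUP + size * body + jumps + T["ret"]

def optimized_cycles(outer=3, inner=256):
    """djnz inner loop (b = 0 means 256 passes), dec c / jr nz outer loop."""
    size = outer * inner
    body = sum(T[i] for i in ("ld a,(hl)", "cpl", "ld (hl),a", "inc hl"))
    djnz = (size - outer) * T["djnz taken"] + outer * T["djnz not taken"]
    tail = outer * T["dec c"] + (outer - 1) * T["jr taken"] + T["jr not taken"]
    return SETUP + size * body + djnz + tail + T["ret"]
```

With the default arguments these come out to 38425 and 28474 cycles, a difference of 9951.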



You'll laugh at this... but I managed to save 4 cycles in the unarchive and archive routines. And only when the routine returns early because the variable is already unarchived (for unarchive) or already archived (for archive). But hey, why not take all the savings you can get.

I think this works. It relies on the page number returned in b always being 0 if a RAM page and always being in the range of $01-$7F if a flash page.

Code: (Original routine: 18 bytes, a lot of cycles)
p_Unarchive:
.db __UnarchiveEnd-1-$
MOV9TOOP1()
B_CALL(_ChkFindSym)
ld hl,0
ret c
inc b
dec b
ret z
B_CALL(_Arc_Unarc)
ld hl,1
ret
__UnarchiveEnd:
   
Code: (Optimized routine: 18 bytes, a lot of cycles minus 4)
p_Unarchive:
.db __UnarchiveEnd-1-$
MOV9TOOP1()
B_CALL(_ChkFindSym)
ld hl,0
ret c
dec b
ret m
inc b
B_CALL(_Arc_Unarc)
ld hl,1
ret
__UnarchiveEnd:

Code: (Original routine: 18 bytes, a lot of cycles)
p_Archive:
.db __ArchiveEnd-1-$
MOV9TOOP1()
B_CALL(_ChkFindSym)
ld hl,0
ret c
inc b
dec b
ret nz
B_CALL(_Arc_Unarc)
ld hl,1
ret
__ArchiveEnd:
   
Code: (Optimized routine: 18 bytes, a lot of cycles minus 4)
p_Archive:
.db __ArchiveEnd-1-$
MOV9TOOP1()
B_CALL(_ChkFindSym)
ld hl,0
ret c
dec b
ret p
inc b
B_CALL(_Arc_Unarc)
ld hl,1
ret
__ArchiveEnd:
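
The page-number assumption can be illustrated with a tiny Python model of the sign flag after `dec b`. This is a sketch with invented function names; each function answers "does the routine return before calling _Arc_Unarc?" for a given page number in b.

```python
def sign_after_dec(b):
    """Sign flag after an 8-bit dec b: set when the result is $80-$FF."""
    return ((b - 1) & 0xFF) >= 0x80

def unarchive_early_original(b):
    return b == 0                     # inc b / dec b / ret z

def unarchive_early_optimized(b):
    return sign_after_dec(b)          # dec b / ret m

def archive_early_original(b):
    return b != 0                     # inc b / dec b / ret nz

def archive_early_optimized(b):
    return not sign_after_dec(b)      # dec b / ret p

def first_mismatch():
    """Smallest page number for which old and new routines disagree."""
    for b in range(256):
        if unarchive_early_original(b) != unarchive_early_optimized(b):
            return b
    return None
```

In this model the old and new checks agree for every page number from $00 to $80 and first disagree at $81, so the trick holds as long as the page returned in b stays in the $00-$7F range described above.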



Smaller archived variable locating. 4 bytes saved.

Code: (Original routine: 55 bytes, a lot of cycles)
p_GetArc:
.db __GetArcEnd-1-$
push de
MOV9TOOP1()
B_CALL(_ChkFindSym)
ld hl,0
jr c,__GetArcFail
ld a,(OP1)
cp ListObj
jr z,__GetArcName
cp ProgObj
jr z,__GetArcName
cp AppvarObj
jr z,__GetArcName
cp GroupObj
jr z,__GetArcName
__GetArcStatic:
ld hl,14
jr __GetArcDone
__GetArcName:
ld hl,9
add hl,de
B_CALL(_LoadDEIndPaged)
ld d,0
inc hl
inc hl
__GetArcDone:
add hl,de
__GetArcFail:
ex de,hl
pop hl
ld (hl),e
inc hl
ld (hl),d
inc hl
ld (hl),b
ex de,hl
ret
__GetArcEnd:
   
Code: (Optimized routine: 51 bytes, a lot of cycles)
p_GetArc:
.db __GetArcEnd-1-$
push de
MOV9TOOP1()
B_CALL(_ChkFindSym)
ld hl,0
jr c,__GetArcFail
and %00011111
ld d,b
ld hl,__GetArcVarTypes
ld bc,__GetArcEnd-__GetArcVarTypes
cpir
ld b,d
ld hl,14
jr nz,__GetArcDone
ld l,9
add hl,de
B_CALL(_LoadDEIndPaged)
ld d,0
inc e
inc e
__GetArcDone:
add hl,de
__GetArcFail:
ex de,hl
pop hl
ld (hl),e
inc hl
ld (hl),d
inc hl
ld (hl),b
ex de,hl
ret
__GetArcVarTypes:
.db ListObj,ProgObj,AppvarObj,GroupObj
__GetArcEnd:



Smaller 8-bit get bit routine. 1 byte saved.

Code: (Original routine: 13 bytes, ~110 cycles)
p_GetBit:
.db 13
ld a,e
and %00000111
inc a
ld b,a
ld a,l
__GetBitLoop:
add a,a
djnz __GetBitLoop
ld h,b
ld l,b
rl l
   
Code: (Optimized routine: 12 bytes, ~152 cycles)
p_GetBit:
.db 12
ld a,e
and %00000111
inc a
ld b,a
xor a
__GetBitLoop:
ld h,a
add hl,hl
djnz __GetBitLoop
ld l,h
ld h,a
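
A quick Python model of both get-bit routines, to show they compute the same thing. This is a sketch only: `l` is the byte in l, `e` is the bit index in e, and the function names are mine. Bit 0 here is the most significant bit, which is what a single left shift extracts.

```python
def getbit_original(l, e):
    """add a,a in a djnz loop, then rl l to turn the carry into 0 or 1."""
    b = (e & 0b00000111) + 1
    a, carry = l, 0
    for _ in range(b):                 # add a,a / djnz
        carry = (a >> 7) & 1
        a = (a << 1) & 0xFF
    return carry                       # ld h,b / ld l,b / rl l with b = 0

def getbit_optimized(l, e):
    """Clear h each pass and let add hl,hl shift the top bit of l into h."""
    b = (e & 0b00000111) + 1
    h = 0xAA                           # arbitrary: h is overwritten each pass
    for _ in range(b):
        h = 0                          # ld h,a (a = 0 after xor a)
        hl = (((h << 8) | l) << 1) & 0xFFFF   # add hl,hl
        h, l = hl >> 8, hl & 0xFF
    return h                           # ld l,h / ld h,a
```

Both versions return the bit at position `7 - (e & 7)` counting from the least significant end, for every byte and index.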



As long as the low byte of vx_SptBuff is at most $F8 (at most $F7 for p_FlipV, since it starts from vx_SptBuff+8 and counts down with dec l): faster sprite flipping routines. 16 cycles saved each.

Code: (Original routine: 13 bytes, 338 cycles)
p_FlipV:
.db __FlipVEnd-1-$
ex de,hl
ld hl,vx_SptBuff+8
ld b,8
__FlipVLoop:
dec hl
ld a,(de)
ld (hl),a
inc de
djnz __FlipVLoop
ret
__FlipVEnd:
   
Code: (Optimized routine: 13 bytes, 322 cycles)
p_FlipV:
.db __FlipVEnd-1-$
ex de,hl
ld hl,vx_SptBuff+8
ld b,8
__FlipVLoop:
dec l
ld a,(de)
ld (hl),a
inc de
djnz __FlipVLoop
ret
__FlipVEnd:

Code: (Original routine: 21 bytes, 1907 cycles)
p_FlipH:
.db __FlipHEnd-1-$
ld de,vx_SptBuff
push de
ld b,8
__FlipHLoop1:
ld c,(hl)
ld a,1
__FlipHLoop2:
rr c
rla
jr nc,__FlipHLoop2
ld (de),a
inc hl
inc de
djnz __FlipHLoop1
pop hl
ret
__FlipHEnd:
   
Code: (Optimized routine: 21 bytes, 1891 cycles)
p_FlipH:
.db __FlipHEnd-1-$
ld de,vx_SptBuff
push de
ld b,8
__FlipHLoop1:
ld c,(hl)
ld a,1
__FlipHLoop2:
rr c
rla
jr nc,__FlipHLoop2
ld (de),a
inc hl
inc e
djnz __FlipHLoop1
pop hl
ret
__FlipHEnd:
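
The condition on the low byte of vx_SptBuff can be checked by brute force. The Python sketch below is an illustration only (the $8000 base page and all names are made up). It lists the buffer addresses each routine writes to, using either the 16-bit or the 8-bit increment/decrement, and collects the low bytes for which both give the same addresses.

```python
def flipv_writes(base, eight_bit):
    """Addresses written by p_FlipV: start at base+8, pre-decrement each pass."""
    hl = (base + 8) & 0xFFFF
    out = []
    for _ in range(8):
        if eight_bit:
            hl = (hl & 0xFF00) | ((hl - 1) & 0xFF)    # dec l
        else:
            hl = (hl - 1) & 0xFFFF                    # dec hl
        out.append(hl)
    return out

def fliph_writes(base, eight_bit):
    """Addresses written by p_FlipH: start at base, post-increment each pass."""
    de = base
    out = []
    for _ in range(8):
        out.append(de)
        if eight_bit:
            de = (de & 0xFF00) | ((de + 1) & 0xFF)    # inc e
        else:
            de = (de + 1) & 0xFFFF                    # inc de
    return out

def safe_low_bytes(writes, page=0x8000):
    """Low bytes of the buffer address for which the 8-bit version is safe."""
    return [low for low in range(256)
            if writes(page + low, True) == writes(page + low, False)]
```

In this model the inc e version is safe for low bytes $00-$F8, while the dec l version is safe only for $00-$F7, because with a low byte of $F8 the starting pointer vx_SptBuff+8 already sits on the next page.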



Smaller and faster sprite rotating routines. 2 bytes smaller and 166 cycles faster. Of those, 128 cycles (2 for each of the 64 inc l instructions executed) come from relying on the low byte of vx_SptBuff being at most $F8.

Code: (Original routine: 22 bytes, 2874 cycles)
p_RotC:
.db __RotCEnd-1-$
ex de,hl
ld hl,vx_SptBuff
ld c,8
__RotCLoop1:
push hl
ld b,8
ld a,(de)
__RotCLoop2:
rla
rr (hl)
inc hl
djnz __RotCLoop2
pop hl
inc de
dec c
jr nz,__RotCLoop1
ret
__RotCEnd:
   
Code: (Optimized routine: 20 bytes, 2708 cycles)
p_RotC:
.db __RotCEnd-1-$
ex de,hl
ld c,8+1
__RotCLoop1:
ld hl,vx_SptBuff
dec c
ret z
ld b,8
ld a,(de)
__RotCLoop2:
rla
rr (hl)
inc l
djnz __RotCLoop2
inc de
jr __RotCLoop1
__RotCEnd:

Code: (Original routine: 22 bytes, 2874 cycles)
p_RotCC:
.db __RotCCEnd-1-$
ex de,hl
ld hl,vx_SptBuff
ld c,8
__RotCCLoop1:
push hl
ld b,8
ld a,(de)
__RotCCLoop2:
rra
rl (hl)
inc hl
djnz __RotCCLoop2
pop hl
inc de
dec c
jr nz,__RotCCLoop1
ret
__RotCCEnd:
   
Code: (Optimized routine: 20 bytes, 2708 cycles)
p_RotCC:
.db __RotCCEnd-1-$
ex de,hl
ld c,8+1
__RotCCLoop1:
ld hl,vx_SptBuff
dec c
ret z
ld b,8
ld a,(de)
__RotCCLoop2:
rra
rl (hl)
inc l
djnz __RotCCLoop2
inc de
jr __RotCCLoop1
__RotCCEnd:
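
Here is a Python sketch of what the rla / rr (hl) loop in p_RotC does to an 8x8 sprite, checked against a plain clockwise rotation. The function names are invented for illustration. The carry flag is modelled explicitly, and the initial buffer contents are deliberately allowed to be garbage, since every buffer byte is shifted 8 times.

```python
def rot_c(sprite, buf=None, carry=0):
    """Model of p_RotC: sprite is 8 row bytes, bit 7 is the leftmost pixel."""
    buf = list(buf) if buf is not None else [0] * 8    # vx_SptBuff
    for a in sprite:                   # outer loop: one source row per pass
        for i in range(8):             # inner loop over the 8 buffer bytes
            # rla: bit 7 of a goes to carry, old carry enters bit 0
            a, carry = ((a << 1) | carry) & 0xFF, (a >> 7) & 1
            # rr (hl): carry enters bit 7, bit 0 goes to carry
            buf[i], carry = (carry << 7) | (buf[i] >> 1), buf[i] & 1
    return buf

def rotate_clockwise_reference(sprite):
    """Plain definition: new[r][c] = old[7 - c][r]."""
    out = []
    for r in range(8):
        row = 0
        for c in range(8):
            bit = (sprite[7 - c] >> (7 - r)) & 1
            row |= bit << (7 - c)
        out.append(row)
    return out
```

The stray carry bits that rla pulls into the low end of a never reach bit 7 within 8 shifts, which is why the routine does not need to clear the carry or the buffer first.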



That's all I have for now. I think I got just about everything I could possibly find, but I might have some more later. And if you want all the routines in one file, I uploaded them all here.

1550
The Axe Parser Project / Re: Assembly Programmers - Help Axe Optimize!
« on: January 04, 2011, 05:34:52 pm »
Alright, these aren't that urgent anyways. Just trying to squeeze every last byte and cycle out of Axe programs. ;)

1551
Miscellaneous / Re: What is your avatar?
« on: January 04, 2011, 05:29:54 pm »
God.

1552
The Axe Parser Project / Re: Assembly Programmers - Help Axe Optimize!
« on: January 04, 2011, 12:11:05 pm »
It doesn't look like the optimized p_EQN2 and p_EQN1 routines that I suggested made it into Axe 0.4.7. Did you maybe forget to add them? Or were my routines incorrect?

Also, I think you might have overlooked my footnote in the first equality optimization post mentioning that p_Div32768 can be optimized to be the same as p_SLT0 and p_GetBit0. :P



EDIT: Also, it doesn't look like p_EQNX and p_NENX are being used by the parser. Is this just an accident, or did you intentionally leave them out for now due to the problem of the constant in the source code not equaling the constant that should be inserted?

And along the lines of comparing negative shorts, if you get p_NENX working, it should be the same size and faster for -3 than p_NEN3, so the latter should be removed. And for p_NEN2, can't the first instruction just be inc l instead of inc hl?


EDIT 2: I think I've mentioned this in the past as well, but it looks like you must've missed seeing it/adding it. p_GetBit15 could be optimized to be the same as p_Mod2.


EDIT 3: (Man this is going to be one long post) It was smart of you to make the greater than, greater or equal, less than, and less or equal comparisons with second arguments that are expressions call the opposite routine after popping the first argument into de, thus avoiding any double ex de,hl's. Could something like this also be used for the greater than and less or equal comparisons with the second argument being a variable? A byte could be saved by making these instances use ex de,hl / ld hl,($0000) to load the variable instead of ld de,($0000) and then call the opposite routine.

EDIT 4: Going along the lines of edit 3, p_SIntGt and p_SIntLe could be optimized for variable arguments in the same way. Although no bytes would be saved calling p_SIntLt instead of p_SIntGt, cycles would be.

1553
Axe / Re: The Optimization Compilation
« on: January 03, 2011, 11:07:14 pm »
Science lesson! YAY ;D (By the way I completely understand if you don't want to read this whole thing, it's a bit lengthy. I'll put the important parts in bold.)

All Axe operations rely on using the register pair hl as the "Ans" value, using it to hold the running value of calculations. For those who aren't fully familiar with z80 assembly, hl is a combination of h and l, two 8-bit (1-byte) "registers." Registers are like variables you might store in memory, but they're stored directly inside of the processor, so they can be used quickly. The basic registers (a, b, c, d, e, h, and l) are all 8-bit values, so most commands were built to work with these 8-bit registers, hence the z80 being an 8-bit system. However, Zilog knew that 8 bits was a little restrictive, and especially for systems that would have more than 256 bytes of memory, being able to easily use and manipulate 16-bit values (like pointers) would be very helpful. Did you notice that the other 5 basic registers go in alphabetical order before randomly jumping to h and l? Well that's because h and l are two very special 8-bit registers, designed to easily be combined into the Higher and Lower halves of a 16-bit value. With this special designation come very useful 16-bit operations built-in.

*2, for instance, simply breaks down to one assembly instruction: add hl,hl. This simply adds the value of hl to itself, which in other words multiplies it by 2. Because Zilog knew a basic function like adding would be a core operation, they made sure to make it small and swift: 1 byte to call and 11 cycles to execute.

*256 is a multiplication by 2^8, so one could achieve this by adding hl to itself 8 times. But there's an easier way to think of this. Just like how multiplication by 10 in a decimal system shifts every digit left one place, multiplication by 2 in a binary system shifts every digit left one place as well. And because hl is a 16-bit value with the high 8 bits in h and the low 8 bits in l, shifting these bits left 8 places would just result in the value in l being shifted all the way out and into h, and 8 trailing zeros being shifted into l. So instead of using add hl,hl eight times, *256 uses the following instructions: ld h,l  /  ld l,0. The first costs 1 byte and 4 cycles and the second costs 2 bytes and 7 cycles, for a grand total of 3 bytes and 11 cycles. Just as fast as *2!

*128, however, isn't so easy. Again, the obvious approach is to add hl to itself 7 times. This would cost 7 bytes and 77 cycles. You may also think to use the previous technique to multiply hl by 256, and then divide it by 2. However, we have a problem (pun not intended). The multiplication by 256 throws away the entire old high byte, including bit 8 of the original value, which *128 is supposed to move into bit 15. Dividing by 2 afterwards can't bring that bit back! So that's a bit of a problem. Anyways, Axe is optimized for size, and we can do better: using a loop. And although the z80 is a pretty old system, Zilog was nice enough to give us a built-in loop structure: ld b,7  /  add hl,hl  /  djnz $-1. This loads 7 into the b register, adds hl to hl, and then repeats adding hl to hl 6 more (b-1) times. Although this is a good amount slower than either of the previous options, coming in at 170 cycles, it only takes 2 bytes to initialize the loop counter, 1 byte for the add instruction, and 2 bytes for the loop execution instruction, for a total of 5 bytes.
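
The three cases can be checked with a short Python sketch. It is illustrative only: the function names are mine, hl is treated as an unsigned 16-bit value, and the cycle figures are the standard Z80 timings quoted in the text.

```python
MASK = 0xFFFF  # hl is a 16-bit register pair

def times2(hl):
    """add hl,hl"""
    return (hl + hl) & MASK

def times256(hl):
    """ld h,l / ld l,0: the old low byte becomes the high byte."""
    return (hl & 0xFF) << 8

def times128_loop(hl):
    """ld b,7 / add hl,hl / djnz $-1"""
    for _ in range(7):
        hl = times2(hl)
    return hl

def times128_via_256(hl):
    """The tempting shortcut: *256 then /2. Bit 8 of the input is lost."""
    return times256(hl) >> 1

def djnz_loop_cycles(count, body=11):
    """ld b,n (7) + n bodies + (n-1) taken djnz (13) + 1 not-taken djnz (8)."""
    return 7 + count * body + (count - 1) * 13 + 8
```

For example, `times128_loop(0x0100)` gives `0x8000`, while `times128_via_256(0x0100)` gives `0`, which is exactly the lost-bit problem. `djnz_loop_cycles(7)` gives the 170 cycles mentioned above, against 77 for seven unrolled add hl,hl instructions.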



Sorry to bore you... But congratulations, you now all know at least a little bit about z80 assembly, the structure of compiled Axe code! And the more you know about Axe's internals, the more you can optimize it, whether it be for speed or size! ;)



EDIT: Wow, it took me a whole hour to write this? Major ninja'd.

1554
Axe / Re: Moving a Sprite
« on: January 03, 2011, 01:11:46 pm »
+1 is smaller and faster than 1 :P

1555
Axe / Re: Moving a Sprite
« on: January 03, 2011, 01:08:26 pm »
I opted out of that optimization because I assumed that if somebody wanted to use this code in their program, they would probably have more than just that one sprite in the buffer and wouldn't want to clear it every time they updated the screen.

1556
Axe / Re: Moving a Sprite
« on: January 01, 2011, 06:47:51 pm »
As small as I could get it, 345 bytes of executable code. ;)

Code:
:.SMILE
:[004466000000817E]→Pic1
:DiagnosticOff
:0→X→Y
:Repeat getKey(15)
:ClrDraw
:getKey(3)-getKey(2)+X
:!If +1
:+1
:End
:-1
:Pt-On(min(,88)→X,getKey(1)-getKey(4)+Y+(=⁻1)min(,56)→Y,Pic1)
:DispGraph
:End

1557
The Axe Parser Project / Re: Features Wishlist
« on: December 31, 2010, 05:34:04 pm »
Speaking of optimizing control structures, any chance for Do...While loops?

I'd like Switch cases more than Do...While loops, but I don't really know how the second works.


Do...While loops are a lot like While...End loops, except that the check to advance to another iteration of the loop is at the end instead of the beginning. This turns out to be more optimized. Whereas a While...End loop has two jumps, one to exit the loop if the condition is false and one to jump back to the beginning, a Do...While combines the two into one jump that jumps back to the beginning if the condition is true.

I don't know how good that explanation was, but perhaps this pseudocode will help clarify it. Numbers in brackets indicate the number of bytes the real code takes:

While...End loop
Code:
;While
[x] Loop condition
[2] Check if loop condition is true or false
[3] Exit the loop if false

;Loop contents

;End
[3] Jump back to the beginning of the loop

Do...While loop
Code:
;Do
;[0] Start of the loop, takes no actual code

;Loop contents

;While
[x] Loop condition
[2] Check if loop condition is true or false
[3] Jump back to the beginning of the loop if true
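
To put numbers on the comparison, here is a small Python sketch. The names are made up, x is the size of the condition code, and the byte counts are the ones listed in the pseudocode. It adds up the overhead of each loop type and counts how many jump instructions get executed for a simple countdown.

```python
def while_end_size(x):
    """Condition + check + conditional exit jump + unconditional jump back."""
    return x + 2 + 3 + 3

def do_while_size(x):
    """Condition + check + one conditional jump back."""
    return x + 2 + 3

def run_while(n):
    """Count body executions and jump instructions for a While...End loop."""
    body = jumps = 0
    a = n
    while True:
        jumps += 1              # exit jump is evaluated at every check
        if a == 0:
            break
        body += 1
        a -= 1
        jumps += 1              # jump back to the condition
    return body, jumps

def run_do_while(n):
    """Same countdown as a Do...While loop (the body always runs once)."""
    body = jumps = 0
    a = n
    while True:
        body += 1
        a -= 1
        jumps += 1              # single conditional jump at the end
        if a <= 0:
            break
    return body, jumps
```

For a 5-pass loop the While...End version executes 11 jump instructions and the Do...While version 5, and the Do...While loop is 3 bytes smaller regardless of x. The trade-off is that a Do...While body always runs at least once, even for n = 0.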

1558
The Axe Parser Project / Re: Features Wishlist
« on: December 31, 2010, 05:20:38 pm »
Speaking of optimizing control structures, any chance for Do...While loops?

1559
Axe / Re: The Optimization Compilation
« on: December 30, 2010, 12:01:33 am »
Both ;) It's 8 bytes smaller and should save about 52 t-states per iteration.

1560
Axe / Re: The Optimization Compilation
« on: December 29, 2010, 11:46:14 pm »
In case anyone was wondering, here is the smallest loop structure that I know of in Axe. It will execute the loop n times, with A starting at n-1 and decreasing down to 0:

Code:
n
While
  -1→A
  ;Code
  A
End
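
A rough Python trace of that loop, under the assumption (implied by the trick itself) that a While with an empty condition simply tests whatever value is currently in hl. The function name and the 16-bit wraparound are my own modelling choices.

```python
def axe_countdown(n):
    """Trace of: n / While / -1→A / ;Code / A / End. Returns each A seen."""
    hl = n & 0xFFFF              # "n" leaves n in hl
    seen = []
    while hl != 0:               # While with no condition tests hl
        a = (hl - 1) & 0xFFFF    # -1→A
        seen.append(a)           # ;Code runs here with this value of A
        hl = a                   # "A" reloads hl just before End
    return seen
```

For instance, `axe_countdown(4)` yields `[3, 2, 1, 0]`: n passes, with A running from n-1 down to 0.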

And if Quigibo adds conditional jumps (he has not yet, so don't bother with this code as of Axe 0.4.7), this sort of Do...While loop would be even smaller:

Code:
n
Lbl L
  →A
  ;Code
If A-1
  Goto L
End


EDIT: As a side note in case you read this Quigibo, any chance for Do...While loops? They're more optimized than normal While loops.
