Messages - Runer112
1546
« on: January 06, 2011, 12:46:52 am »
It looks like any instance of getKey (the B_CALL) has a bit of a problem... the compiler doesn't seem to advance the parsing location upon reaching it, parsing it over and over again until the remainder of the free RAM is filled with copies of the getKey routine and you get a memory error. I would assume this has something to do with the addition of the variable-argument direct port input getKey.
EDIT: Huh, that's odd... I'm getting the same problem with Axe 0.4.6. Surely the problem hasn't existed for this long and gone unnoticed?
EDIT 2: I think this problem must be somewhere on my side. But this error happens on both wabbitemu and my real calculator. I'm puzzled...
EDIT 3: Hmm I think I may have found the problem... it looks like the error only crops up when getKey is the very last token in the source. Can anybody else confirm this?
1547
« on: January 05, 2011, 11:11:30 pm »
Quote: However, these are the concerns I have: First, the sprite rotation commands, why did you move the ret to the middle of the routine? It looks like that's just going to add more cycles since a conditional jr takes the same amount of cycles as a regular jr anyway.
Yeah, I'm not really sure why I did that. Feel free to initialize c to 8 instead, then decrement and check c at the end with a conditional jump.
Quote: Next, is it really a safe assumption that all ROM pages are between $7F and $FF for all current models and potentially future models?
$01 through $7F are all ROM pages, and $80-$87 are all RAM pages (at least for the calculators that have all those RAM pages), so it would make sense that $80 and up is RAM. But feel free to leave this optimization out anyways, it only saves 4 cycles part of the time.
Quote: And lastly, are you sure trying to modify ROM (unsuccessfully) has no potential side effects to things like flags and registers?
After a quick test, yes, rrd and rld affect the a register correctly even when hl points to a byte in ROM.
EDIT: By the way Quigibo, the reason I was looking at every source routine for Axe is because I'm documenting the size and (at least approximate) speed of every Axe command. If I finish it, would you want to bundle it with future Axe releases? If not I'd probably post it somewhere on the forums anyway, so people could still see it.
1548
« on: January 05, 2011, 07:16:59 pm »
I just sort of blindly copied the $8000-$FFFF routine, not taking into account the fact that $0000-$7FFF is ROM. Good catch, I'll edit my post now. And it turns out that's actually smaller anyways.
1549
« on: January 05, 2011, 06:42:20 pm »
Wow that took a long time. But I hope the results will be worth it. Quigibo, get out your reading glasses. (By the way, I haven't tested these myself, but the code looks solid. If you believe that any of these would not work or have any questions, tell me.)

Smaller nibble retrieval routines. 1 byte saved for reading from RAM, 3 bytes saved for reading from ROM. Thanks to calc84maniac for reminding me that $0000-$7FFF is read-only!

Original:
p_Nib1:
    .db __Nib1End-$-1
    scf
    rr h
    rr l
    ld a,(hl)
    jr c,__Nib1Skip
    rrca
    rrca
    rrca
    rrca
__Nib1Skip:
    and %00001111
    ld l,a
    ld h,0
    ret
__Nib1End:

Optimized:
p_Nib1:
    .db __Nib1End-$-1
    xor a
    scf
    rr h
    rr l
    ld b,(hl)
__Nib1Loop:
    rrd
    ccf
    jr c,__Nib1Loop
    ld (hl),b
    ld l,a
    ld h,0
    ret
__Nib1End:

Original:
p_Nib2:
    .db __Nib2End-$-1
    srl h
    rr l
    ld a,(hl)
    jr c,__Nib2Skip
    rrca
    rrca
    rrca
    rrca
__Nib2Skip:
    and %00001111
    ld l,a
    ld h,0
    ret
__Nib2End:

Optimized:
p_Nib2:
    .db __Nib2End-$-1
    xor a
    srl h
    rr l
    rrd
    jr c,__Nib2Skip
    rld
__Nib2Skip:
    ld l,a
    ld h,0
    ret
__Nib2End:
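For anyone who doesn't read z80, here is roughly what the nibble retrieval routines compute, sketched as a Python model rather than calculator code. The function name `get_nibble` and the `memory` buffer are made up for illustration, and the model ignores the `scf`/`srl` difference that picks the $8000-$FFFF or $0000-$7FFF half of the address space. It only shows the addressing idea: halve the nibble address to get the byte, and use the bit that was shifted out to pick the nibble.

```python
def get_nibble(memory: bytes, nibble_index: int) -> int:
    """Read one 4-bit value using half-byte addressing."""
    byte = memory[nibble_index >> 1]  # rr h / rr l: halve the address
    if nibble_index & 1:              # the bit shifted out ends up in carry
        return byte & 0x0F            # odd index: low nibble
    return byte >> 4                  # even index: high nibble


data = bytes([0xAB, 0xCD])
print([hex(get_nibble(data, i)) for i in range(4)])
```

So even nibble addresses map to the high half of a byte and odd ones to the low half, which is the same order the digits appear in when the data is written out in hex.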
Smaller and faster nibble storage routine. 1 byte and ~17 cycles saved.

Original:
p_NibSto:
    .db __NibStoEnd-$-1
    pop bc
    pop de
    push bc
    scf
    rr h
    rr l
    ld b,(hl)
    ex de,hl            ;hl = byte
                        ;de = addr
    ld a,%11110000
    jr c,__NibStoSkip
    add hl,hl
    add hl,hl
    add hl,hl
    add hl,hl
    cpl
__NibStoSkip:
    and b
    or l
    ld (de),a
    ret
__NibStoEnd:

Optimized:
p_NibSto:
    .db __NibStoEnd-$-1
    pop bc
    pop de
    push bc
    scf
    rr h
    rr l
    jr nc,__NibStoHigh  ;even address = high nibble, same as the original
    rrd                 ;rrd / ld a,e / rld replaces the low nibble
    ld a,e
    rld
    ret
__NibStoHigh:
    rld                 ;rld / ld a,e / rrd replaces the high nibble
    ld a,e
    rrd
    ret
__NibStoEnd:
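The rrd/rld pairs are the least obvious part of the optimized storage routine, so here is a small Python model of the two instructions and of the two three-instruction sequences built from them. This is only a sketch: the function names are invented for illustration, `a` stands for the accumulator and `m` for the byte at (hl). The nibble movements follow the documented behavior of the Z80 `rrd` and `rld` instructions.

```python
def rrd(a: int, m: int):
    """(hl) low nibble -> a low nibble, a low nibble -> (hl) high, (hl) high -> (hl) low."""
    new_a = (a & 0xF0) | (m & 0x0F)
    new_m = ((a & 0x0F) << 4) | (m >> 4)
    return new_a, new_m


def rld(a: int, m: int):
    """(hl) high nibble -> a low nibble, a low nibble -> (hl) low, (hl) low -> (hl) high."""
    new_a = (a & 0xF0) | (m >> 4)
    new_m = ((m & 0x0F) << 4) | (a & 0x0F)
    return new_a, new_m


def rrd_load_rld(m: int, value: int, a: int = 0) -> int:
    """Model of: rrd / ld a,e / rld. Returns the new byte at (hl)."""
    a, m = rrd(a, m)
    a = value
    a, m = rld(a, m)
    return m


def rld_load_rrd(m: int, value: int, a: int = 0) -> int:
    """Model of: rld / ld a,e / rrd. Returns the new byte at (hl)."""
    a, m = rld(a, m)
    a = value
    a, m = rrd(a, m)
    return m


print(hex(rrd_load_rld(0xAB, 0x0C)))  # low nibble replaced
print(hex(rld_load_rrd(0xAB, 0x0C)))  # high nibble replaced
```

In this model, rrd / ld a,e / rld leaves the high nibble alone and writes the value into the low nibble, while rld / ld a,e / rrd does the opposite, regardless of what was in a beforehand.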
Faster buffer inversion routine. 9951 cycles saved.

Original:
p_InvBuff:
    .db __InvBuffEnd-1-$
    ld hl,plotSScreen
    ld bc,768
__InvBuffLoop:
    ld a,(hl)
    cpl
    ld (hl),a
    inc hl
    dec bc
    ld a,b
    or c
    jr nz,__InvBuffLoop
    ret
__InvBuffEnd:

Optimized:
p_InvBuff:
    .db __InvBuffEnd-1-$
    ld hl,plotSScreen
    ld bc,3
__InvBuffLoop:
    ld a,(hl)
    cpl
    ld (hl),a
    inc hl
    djnz __InvBuffLoop
    dec c
    jr nz,__InvBuffLoop
    ret
__InvBuffEnd:
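The 9951 figure can be reproduced by adding up the loop costs. Below is a quick Python tally using the standard Z80 timings (jr nz is 12 cycles taken and 7 not taken, djnz is 13 taken and 8 not taken). The function names are made up for this sketch, and the setup instructions and the final ret are left out because they cost the same in both versions.

```python
def original_cycles(count: int = 768) -> int:
    # ld a,(hl)=7, cpl=4, ld (hl),a=7, inc hl=6, dec bc=6, ld a,b=4, or c=4
    body = 7 + 4 + 7 + 6 + 6 + 4 + 4
    # jr nz is taken on every pass except the last one
    return count * body + (count - 1) * 12 + 7


def optimized_cycles(outer: int = 3, inner: int = 256) -> int:
    # ld a,(hl)=7, cpl=4, ld (hl),a=7, inc hl=6
    body = 7 + 4 + 7 + 6
    # djnz is taken 255 times and falls through once per outer pass
    one_pass = inner * body + (inner - 1) * 13 + 8
    # dec c=4 each pass, jr nz taken on all but the last pass
    return outer * (one_pass + 4) + (outer - 1) * 12 + 7


print(original_cycles() - optimized_cycles())
```

Starting b at 0 makes the first djnz run wrap around for a full 256 iterations, so three outer passes cover all 768 bytes.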
You'll laugh at this... but I managed to save 4 cycles in the unarchive and archive routines. And only if the targeted variable doesn't exist. But hey, why not take all the savings you can get. I think this works. It relies on the page number returned in b always being 0 if a RAM page and always being in the range of $01-$7F if a flash page.

Original:
p_Unarchive:
    .db __UnarchiveEnd-1-$
    MOV9TOOP1()
    B_CALL(_ChkFindSym)
    ld hl,0
    ret c
    inc b
    dec b
    ret z
    B_CALL(_Arc_Unarc)
    ld hl,1
    ret
__UnarchiveEnd:

Optimized:
p_Unarchive:
    .db __UnarchiveEnd-1-$
    MOV9TOOP1()
    B_CALL(_ChkFindSym)
    ld hl,0
    ret c
    dec b
    ret m
    inc b
    B_CALL(_Arc_Unarc)
    ld hl,1
    ret
__UnarchiveEnd:

Original:
p_Archive:
    .db __ArchiveEnd-1-$
    MOV9TOOP1()
    B_CALL(_ChkFindSym)
    ld hl,0
    ret c
    inc b
    dec b
    ret nz
    B_CALL(_Arc_Unarc)
    ld hl,1
    ret
__ArchiveEnd:

Optimized:
p_Archive:
    .db __ArchiveEnd-1-$
    MOV9TOOP1()
    B_CALL(_ChkFindSym)
    ld hl,0
    ret c
    dec b
    ret p
    inc b
    B_CALL(_Arc_Unarc)
    ld hl,1
    ret
__ArchiveEnd:
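The page-number assumption is what makes `dec b` / `ret m` equivalent to `inc b` / `dec b` / `ret z`. Here is a small Python check of that, written as a sketch with invented function names. It only models the zero and sign flags of an 8-bit decrement, nothing about the OS calls.

```python
def zero_test(page: int) -> bool:
    """inc b / dec b / ret z: true when the page number is zero (RAM)."""
    return page == 0


def sign_test(page: int) -> bool:
    """dec b / ret m: true when the decremented value has bit 7 set."""
    return ((page - 1) & 0xFF) >= 0x80


# Pages $00-$7F: the two tests always agree.
print(all(zero_test(p) == sign_test(p) for p in range(0x00, 0x80)))
# A page of $81 or above would break the shortcut.
print(zero_test(0x81), sign_test(0x81))
```

Only page 0 wraps around to $FF when decremented, so within $00-$7F the sign flag after `dec b` says the same thing the zero flag did before. The archive routine uses the opposite conditions (`ret nz` and `ret p`), so the same reasoning applies there.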
Smaller archived variable locating. 4 bytes saved.

Original:
p_GetArc:
    .db __GetArcEnd-1-$
    push de
    MOV9TOOP1()
    B_CALL(_ChkFindSym)
    ld hl,0
    jr c,__GetArcFail
    ld a,(OP1)
    cp ListObj
    jr z,__GetArcName
    cp ProgObj
    jr z,__GetArcName
    cp AppvarObj
    jr z,__GetArcName
    cp GroupObj
    jr z,__GetArcName
__GetArcStatic:
    ld hl,14
    jr __GetArcDone
__GetArcName:
    ld hl,9
    add hl,de
    B_CALL(_LoadDEIndPaged)
    ld d,0
    inc hl
    inc hl
__GetArcDone:
    add hl,de
__GetArcFail:
    ex de,hl
    pop hl
    ld (hl),e
    inc hl
    ld (hl),d
    inc hl
    ld (hl),b
    ex de,hl
    ret
__GetArcEnd:

Optimized:
p_GetArc:
    .db __GetArcEnd-1-$
    push de
    MOV9TOOP1()
    B_CALL(_ChkFindSym)
    ld hl,0
    jr c,__GetArcFail
    and %00011111
    push bc             ;save the page in b across cpir (ld d,b would clobber the data pointer in de)
    ld hl,__GetArcVarTypes
    ld bc,__GetArcEnd-__GetArcVarTypes
    cpir
    pop bc              ;same size as ld d,b / ld b,d, and flags are untouched
    ld hl,14
    jr nz,__GetArcDone
    ld l,9
    add hl,de
    B_CALL(_LoadDEIndPaged)
    ld d,0
    inc e
    inc e
__GetArcDone:
    add hl,de
__GetArcFail:
    ex de,hl
    pop hl
    ld (hl),e
    inc hl
    ld (hl),d
    inc hl
    ld (hl),b
    ex de,hl
    ret
__GetArcVarTypes:
    .db ListObj,ProgObj,AppvarObj,GroupObj
__GetArcEnd:
Smaller 8-bit get bit routine. 1 byte saved.

Original:
p_GetBit:
    .db 13
    ld a,e
    and %00000111
    inc a
    ld b,a
    ld a,l
__GetBitLoop:
    add a,a
    djnz __GetBitLoop
    ld h,b
    ld l,b
    rl l

Optimized:
p_GetBit:
    .db 12
    ld a,e
    and %00000111
    inc a
    ld b,a
    xor a
__GetBitLoop:
    ld h,a
    add hl,hl
    djnz __GetBitLoop
    ld l,h
    ld h,a
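Both get bit versions shift the byte left until the wanted bit falls out the top. They differ only in where the bit is caught: the carry flag in the original, the h register in the optimized one. A Python sketch of the two loops follows. The function names are invented, and the reading of the registers (bit index in e, byte in l) is taken from the listings above.

```python
def get_bit_original(value: int, index: int) -> int:
    count = (index & 7) + 1          # ld a,e / and 7 / inc a / ld b,a
    a = value & 0xFF                 # ld a,l
    carry = 0
    for _ in range(count):           # add a,a / djnz
        carry = a >> 7
        a = (a << 1) & 0xFF
    return carry                     # ld h,b / ld l,b / rl l with b = 0


def get_bit_optimized(value: int, index: int) -> int:
    count = (index & 7) + 1
    l = value & 0xFF
    h = 0
    for _ in range(count):
        h = 0                        # ld h,a with a = 0
        hl = ((h << 8) | l) << 1     # add hl,hl
        h, l = (hl >> 8) & 0xFF, hl & 0xFF
    return h                         # ld l,h / ld h,a


print([get_bit_optimized(0b10110000, i) for i in range(8)])
```

In this model index 0 is the leftmost (most significant) bit, and clearing h before every shift means only the bit from the last shift survives.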
As long as the low byte of vx_SptBuff is at most $F8 (at most $F7 for p_FlipV, which starts from vx_SptBuff+8 and steps down with dec l): faster sprite flipping routines. 16 cycles saved each.

Original:
p_FlipV:
    .db __FlipVEnd-1-$
    ex de,hl
    ld hl,vx_SptBuff+8
    ld b,8
__FlipVLoop:
    dec hl
    ld a,(de)
    ld (hl),a
    inc de
    djnz __FlipVLoop
    ret
__FlipVEnd:

Optimized:
p_FlipV:
    .db __FlipVEnd-1-$
    ex de,hl
    ld hl,vx_SptBuff+8
    ld b,8
__FlipVLoop:
    dec l
    ld a,(de)
    ld (hl),a
    inc de
    djnz __FlipVLoop
    ret
__FlipVEnd:

Original:
p_FlipH:
    .db __FlipHEnd-1-$
    ld de,vx_SptBuff
    push de
    ld b,8
__FlipHLoop1:
    ld c,(hl)
    ld a,1
__FlipHLoop2:
    rr c
    rla
    jr nc,__FlipHLoop2
    ld (de),a
    inc hl
    inc de
    djnz __FlipHLoop1
    pop hl
    ret
__FlipHEnd:

Optimized:
p_FlipH:
    .db __FlipHEnd-1-$
    ld de,vx_SptBuff
    push de
    ld b,8
__FlipHLoop1:
    ld c,(hl)
    ld a,1
__FlipHLoop2:
    rr c
    rla
    jr nc,__FlipHLoop2
    ld (de),a
    inc hl
    inc e
    djnz __FlipHLoop1
    pop hl
    ret
__FlipHEnd:
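The low-byte condition is just a statement about when stepping only the low register (`inc e`, `dec l`) visits the same addresses as stepping the full 16-bit pair. Here is a Python sketch that finds the safe range for a forward walk over 8 bytes and for a backward walk that starts from the buffer address plus 8. The function names and the $86 high byte are arbitrary choices for the illustration, not the real location of vx_SptBuff.

```python
def forward_addresses(base: int) -> list:
    """8 accesses, stepping with an 8-bit increment of the low byte."""
    h, l = base >> 8, base & 0xFF
    visited = []
    for _ in range(8):
        visited.append((h << 8) | l)
        l = (l + 1) & 0xFF
    return visited


def backward_addresses(base: int) -> list:
    """Start at base+8, then an 8-bit decrement of the low byte before each access."""
    start = (base + 8) & 0xFFFF
    h, l = start >> 8, start & 0xFF
    visited = []
    for _ in range(8):
        l = (l - 1) & 0xFF
        visited.append((h << 8) | l)
    return visited


def max_safe_low_byte(walk, expected) -> int:
    """Largest low byte for which the 8-bit walk matches the true addresses."""
    safe = [low for low in range(256)
            if walk(0x8600 | low) == expected(0x8600 | low)]
    return max(safe)


forward_ok = max_safe_low_byte(forward_addresses,
                               lambda b: [b + i for i in range(8)])
backward_ok = max_safe_low_byte(backward_addresses,
                                lambda b: [b + 7 - i for i in range(8)])
print(hex(forward_ok), hex(backward_ok))
```

The forward walk only needs the 8 bytes of the buffer to sit inside one 256-byte page. The backward walk also needs the starting address one past the end of the buffer to be in that page, which costs one more byte of headroom.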
Smaller and faster sprite rotating routines. 2 bytes smaller and 166 cycles faster. These also save 16 cycles from relying on the low byte of vx_SptBuff being at most $F8.

Original:
p_RotC:
    .db __RotCEnd-1-$
    ex de,hl
    ld hl,vx_SptBuff
    ld c,8
__RotCLoop1:
    push hl
    ld b,8
    ld a,(de)
__RotCLoop2:
    rla
    rr (hl)
    inc hl
    djnz __RotCLoop2
    pop hl
    inc de
    dec c
    jr nz,__RotCLoop1
    ret
__RotCEnd:

Optimized:
p_RotC:
    .db __RotCEnd-1-$
    ex de,hl
    ld c,8+1
__RotCLoop1:
    ld hl,vx_SptBuff
    dec c
    ret z
    ld b,8
    ld a,(de)
__RotCLoop2:
    rla
    rr (hl)
    inc l
    djnz __RotCLoop2
    inc de
    jr __RotCLoop1
__RotCEnd:

Original:
p_RotCC:
    .db __RotCCEnd-1-$
    ex de,hl
    ld hl,vx_SptBuff
    ld c,8
__RotCCLoop1:
    push hl
    ld b,8
    ld a,(de)
__RotCCLoop2:
    rra
    rl (hl)
    inc hl
    djnz __RotCCLoop2
    pop hl
    inc de
    dec c
    jr nz,__RotCCLoop1
    ret
__RotCCEnd:

Optimized:
p_RotCC:
    .db __RotCCEnd-1-$
    ex de,hl
    ld c,8+1
__RotCCLoop1:
    ld hl,vx_SptBuff
    dec c
    ret z
    ld b,8
    ld a,(de)
__RotCCLoop2:
    rra
    rl (hl)
    inc l
    djnz __RotCCLoop2
    inc de
    jr __RotCCLoop1
__RotCCEnd:
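To see why the rla / rr (hl) pair rotates the sprite, here is a Python model of the clockwise routine's inner loops, checked against a plain pixel-by-pixel rotation. It is a sketch with made-up function names. A sprite is a list of 8 row bytes with bit 7 as the leftmost pixel, and the model ignores the stray carry that rla feeds into bit 0 of a, since that bit never reaches the top within 8 shifts.

```python
def rotate_by_shifting(sprite: list) -> list:
    """Each source row hands one bit to each of the 8 destination rows."""
    out = [0] * 8
    for row in sprite:
        a = row
        for j in range(8):
            carry = a >> 7                         # rla: bit 7 of a goes to carry
            a = (a << 1) & 0xFF
            out[j] = (carry << 7) | (out[j] >> 1)  # rr (hl): carry enters at bit 7
    return out


def rotate_reference(sprite: list) -> list:
    """Clockwise rotation: pixel (row r, column c) moves to (row c, column 7 - r)."""
    out = [0] * 8
    for r in range(8):
        for c in range(8):
            if sprite[r] & (0x80 >> c):
                out[c] |= 0x80 >> (7 - r)
    return out


smiley = [0x00, 0x44, 0x66, 0x00, 0x00, 0x00, 0x81, 0x7E]
print(rotate_by_shifting(smiley) == rotate_reference(smiley))
```

A bit pushed into destination row j during source row r gets shifted right once for every later source row, so it ends up 7 - r columns from the left. The counterclockwise routine swaps the shift directions (rra and rl) to mirror this.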
That's all I have for now. I think I got just about everything I could possibly find, but I might have some more later. And if you want all the routines in one file, I uploaded them all here.
1550
« on: January 04, 2011, 05:34:52 pm »
Alright, these aren't that urgent anyways. Just trying to squeeze every last byte and cycle out of Axe programs.
1551
« on: January 04, 2011, 05:29:54 pm »
God.
1552
« on: January 04, 2011, 12:11:05 pm »
It doesn't look like the optimized p_EQN2 and p_EQN1 routines that I suggested made it into Axe 0.4.7. Did you maybe forget to add them? Or were my routines incorrect? Also, I think you might have overlooked my footnote in the first equality optimization post mentioning that p_Div32768 can be optimized to be the same as p_SLT0 and p_GetBit0.
EDIT: Also, it doesn't look like p_EQNX and p_NENX are being used by the parser. Is this just an accident, or did you intentionally leave them out for now due to the problem of the constant in the source code not equaling the constant that should be inserted? And along the lines of comparing negative shorts, if you get p_NENX working, it should be the same size and faster for -3 than p_NEN3, so the latter should be removed. And for p_NEN2, can't the first instruction just be inc l instead of inc hl?
EDIT 2: I think I've mentioned this in the past as well, but it looks like you must've missed seeing it/adding it. p_GetBit15 could be optimized to be the same as p_Mod2.
EDIT 3: (Man this is going to be one long post) It was smart of you to make the greater than, greater or equal, less than, and less or equal comparisons with second arguments that are expressions call the opposite routine after popping the first argument into de, thus avoiding any double ex de,hl's. Could something like this also be used for the greater than and less or equal comparisons with the second argument being a variable? A byte could be saved by making these instances use ex de,hl / ld hl,($0000) to load the variable instead of ld de,($0000) and then call the opposite routine.
EDIT 4: Going along the lines of edit 3, p_SIntGt and p_SIntLe could be optimized for variable arguments in the same way. Although no bytes would be saved calling p_SIntLt instead of p_SIntGt, cycles would be.
1553
« on: January 03, 2011, 11:07:14 pm »
Science lesson! YAY (By the way I completely understand if you don't want to read this whole thing, it's a bit lengthy. I'll put the important parts in bold.)
All Axe operations rely on using the register pair hl as the "Ans" value, using it to hold the running value of calculations. For those who aren't fully familiar with z80 assembly, hl is a combination of h and l, two 8-bit (1-byte) "registers." Registers are like variables you might store in memory, but they're stored directly inside the processor, so they can be used quickly. The basic registers (a, b, c, d, e, h, and l) are all 8-bit values, so most commands were built to work with these 8-bit registers, hence the z80 being an 8-bit system.
However, Zilog knew that 8 bits was a little restrictive, and especially for systems that would have more than 256 bytes of memory, being able to easily use and manipulate 16-bit values (like pointers) would be very helpful. Did you notice that the other 5 basic registers go in alphabetical order before randomly jumping to h and l? Well that's because h and l are two very special 8-bit registers, designed to easily be combined into the Higher and Lower halves of a 16-bit value. With this special designation come very useful 16-bit operations built-in.
*2, for instance, simply breaks down to one assembly instruction: add hl,hl. This simply adds the value of hl to itself, which in other words multiplies it by 2. Because Zilog knew a basic function like adding would be a core operation, they made sure to make it small and swift: 1 byte to call and 11 cycles to execute.
*256 is a multiplication by 2^8, so one could achieve this by adding hl to itself 8 times. But there's an easier way to think of this. Just like how multiplication by 10 in a decimal system shifts every digit left one place, multiplication by 2 in a binary system shifts every digit left one place as well. And because hl is a 16-bit value with the high 8 bits in h and the low 8 bits in l, shifting these bits left 8 places would just result in the value in l being shifted all the way out and into h, and 8 trailing zeros being shifted into l. So instead of using add hl,hl eight times, *256 uses the following instructions: ld h,l / ld l,0. The first costs 1 byte and 4 cycles and the second costs 2 bytes and 7 cycles, for a grand total of 3 bytes and 11 cycles. Just as fast as *2!
*128, however, isn't so easy. Again, the obvious approach is to add hl to itself 7 times. This would cost 7 bytes and 77 cycles. You may also think to use the previous technique to multiply hl by 256, and then divide it by 2. However, we have a problem (pun not intended). If we multiplied the value by 256 and then divided it by 2, we would have lost the highest bit from the multiplication by 256 before dividing it by 2 again! So that's a bit of a problem. Anyways, Axe is optimized for size, and we can do better: using a loop. And although the z80 is a pretty old system, they were nice enough to give us a built-in loop structure: ld b,7 / add hl,hl / djnz $-1. This loads 7 into the b register, adds hl to hl, and then repeats adding hl to hl 6 more (b-1) times. Although this is a good amount slower than either of the previous options, coming in at 170 cycles, it only takes 2 bytes to initialize the loop counter, 1 byte for the add instruction, and 2 bytes for the loop execution instruction, for a total of 5 bytes.
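The sizes and cycle counts above can be double-checked with a bit of arithmetic. Here is a Python sketch that models the three multiplications on a 16-bit value and adds up the costs of the unrolled and looped versions. The function names are invented for the example, and the timings are the standard Z80 ones (djnz is 13 cycles when it jumps and 8 when it falls through).

```python
def times_2(hl: int) -> int:
    return (hl + hl) & 0xFFFF            # add hl,hl


def times_256(hl: int) -> int:
    return (hl & 0xFF) << 8              # ld h,l / ld l,0


def times_128(hl: int) -> int:
    for _ in range(7):                   # seven add hl,hl in a row
        hl = (hl + hl) & 0xFFFF
    return hl


def unrolled_cost(shifts: int):
    """(bytes, cycles) for add hl,hl repeated inline."""
    return shifts * 1, shifts * 11


def loop_cost(shifts: int):
    """(bytes, cycles) for ld b,n / add hl,hl / djnz."""
    size = 2 + 1 + 2
    cycles = 7 + shifts * 11 + (shifts - 1) * 13 + 8
    return size, cycles


print(unrolled_cost(7), loop_cost(7))
# The lost-bit problem: $0100 * 128 is $8000, but *256 then /2 gives 0.
print(hex(times_128(0x0100)), hex(times_256(0x0100) >> 1))
```

The loop trades 93 extra cycles for 2 fewer bytes, which is the kind of trade a size-first compiler takes.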
Sorry to bore you... But congratulations, you now all know at least a little bit about z80 assembly, the structure of compiled Axe code! And the more you know about Axe's internals, the more you can optimize it, whether it be for speed or size! EDIT: Wow, it took me a whole hour to write this? Major ninja'd.
1554
« on: January 03, 2011, 01:11:46 pm »
+1 is smaller and faster than 1
1555
« on: January 03, 2011, 01:08:26 pm »
I opted out of that optimization because I assumed that if somebody wanted to use this code in their program, they would probably have more than just that one sprite in the buffer and wouldn't want to clear it every time they updated the screen.
1556
« on: January 01, 2011, 06:47:51 pm »
As small as I could get it, 345 bytes of executable code.
:.SMILE
:[004466000000817E]->Pic1
:DiagnosticOff
:0->X->Y
:Repeat getKey(15)
:ClrDraw
:getKey(3)-getKey(2)+X
:!If +1
:+1
:End
:-1
:Pt-On(min(,88)→X,getKey(1)-getKey(4)+Y+(=⁻1)min(,56)→Y,Pic1)
:DispGraph
:End
1557
« on: December 31, 2010, 05:34:04 pm »
Speaking of optimizing control structures, any chance for Do...While loops?
I'd like Switch cases more than Do...While loops, but I don't really know how the second works.
Do...While loops are a lot like While...End loops, except that the check to advance to another iteration of the loop is at the end instead of the beginning. This turns out to be more optimized. Whereas a While...End loop has two jumps, one to exit the loop if the condition is false and one to jump back to the beginning, a Do...While combines the two into one jump that jumps back to the beginning if the condition is true. I don't know how good that explanation was, but perhaps this pseudocode will help clarify it. Numbers in brackets indicate the number of bytes the real code takes:

While...End loop:
;While
    [x] Loop condition
    [2] Check if loop condition is true or false
    [3] Exit the loop if false
;Loop contents
;End
    [3] Jump back to the beginning of the loop

Do...While loop:
;Do
    [0] Start of the loop, takes no actual code
;Loop contents
;While
    [x] Loop condition
    [2] Check if loop condition is true or false
    [3] Jump back to the beginning of the loop if true
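Here is the same comparison as a small Python sketch, in case numbers help. The byte counts come straight from the bracketed figures above. The function names and the simple countdown loop used to count checks and jumps are made up for the illustration.

```python
def while_end_overhead(condition_bytes: int) -> int:
    # condition + check + exit jump + jump back
    return condition_bytes + 2 + 3 + 3


def do_while_overhead(condition_bytes: int) -> int:
    # condition + check + one conditional jump back
    return condition_bytes + 2 + 3


def run_while_end(n: int):
    """Count (body runs, condition checks, jumps taken) for a countdown from n."""
    body = checks = jumps = 0
    while True:
        checks += 1
        if n <= 0:
            jumps += 1        # exit jump taken
            break
        body += 1
        n -= 1
        jumps += 1            # unconditional jump back to the top
    return body, checks, jumps


def run_do_while(n: int):
    body = checks = jumps = 0
    while True:
        body += 1
        n -= 1
        checks += 1
        if n <= 0:
            break             # conditional jump not taken, fall out of the loop
        jumps += 1            # conditional jump back to the top
    return body, checks, jumps


print(while_end_overhead(0) - do_while_overhead(0))
print(run_while_end(5), run_do_while(5))
```

One thing the model makes visible: a Do...While body always runs at least once, even when the count starts at 0, because the check only happens at the bottom.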
1558
« on: December 31, 2010, 05:20:38 pm »
Speaking of optimizing control structures, any chance for Do...While loops?
1559
« on: December 30, 2010, 12:01:33 am »
Both. It's 8 bytes smaller and should save about 52 t-states per iteration.
1560
« on: December 29, 2010, 11:46:14 pm »
In case anyone was wondering, here is the smallest loop structure that I know of in Axe. It will execute the loop n times, with A starting at n-1 and decreasing down to 0:
n
While -1→A
;Code
A
End
And if Quigibo adds conditional jumps, this sort of Do...While loop would be even smaller (which he has not yet, so don't bother with this code as of Axe 0.4.7):
n
Lbl L
→A
;Code
!If A-1
Goto L
End
EDIT: As a side note in case you read this Quigibo, any chance for Do...While loops? They're more optimized than normal While loops.