Quote from: Runer112 on January 05, 2011, 11:11:30 pm
By the way Quigibo, the reason I was looking at every source routine for Axe is because I'm documenting the size and (at least approximate) speed of every Axe command. If I finish it, would you want to bundle it with future Axe releases? If not I'd probably post it somewhere on the forums anyway, so people could still see it.

* happybobjr loves runner
Quote from: Runer112 on January 05, 2011, 06:42:20 pm
Faster buffer inversion routine. 9951 cycles saved.

It's over 9000!!!! What?!? 9000? Yea, I know... I had to...
But seriously dude, all those optimizations are awesome!
p_RotC:
    .db __RotCEnd-1-$
    ex de,hl
    ld c,8
__RotCLoop1:
    ld hl,vx_SptBuff+8
    ld b,8
    ld a,(de)
__RotCLoop2:
    dec l
    rra
    rr (hl)
    djnz __RotCLoop2
    inc de
    dec c
    jr nz,__RotCLoop1
    ret
__RotCEnd:

p_RotCC:
    .db __RotCCEnd-1-$
    ex de,hl
    ld c,8
__RotCCLoop1:
    ld hl,vx_SptBuff+8
    ld b,8
    ld a,(de)
__RotCCLoop2:
    dec l
    rla
    rl (hl)
    djnz __RotCCLoop2
    inc de
    dec c
    jr nz,__RotCCLoop1
    ret
__RotCCEnd:
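For anyone who wants to check what the two rotate loops above do without stepping through them on a calculator, here is a small Python model. It is only a sketch, not Axe source: `rot_c`, `rot_cc` and the list standing in for vx_SptBuff are names made up for illustration, and it assumes each sprite row is one byte with bit 7 as the leftmost pixel.

```python
def rot_c(sprite):
    """Model of p_RotC: rotate an 8x8 1-bit sprite 90 degrees clockwise.

    sprite is a list of 8 row bytes, bit 7 = leftmost pixel.
    """
    buff = [0] * 8                       # stand-in for vx_SptBuff
    for a in sprite:                     # outer loop: ld a,(de) / inc de
        for k in range(7, -1, -1):       # inner loop: dec l walks rows 7..0
            carry = a & 1                # rra: low bit drops into carry
            a >>= 1
            buff[k] = (carry << 7) | (buff[k] >> 1)   # rr (hl)
    return buff


def rot_cc(sprite):
    """Model of p_RotCC: rotate 90 degrees counterclockwise."""
    buff = [0] * 8
    for a in sprite:
        for k in range(7, -1, -1):
            carry = (a >> 7) & 1         # rla: high bit drops into carry
            a = (a << 1) & 0xFF
            buff[k] = ((buff[k] << 1) & 0xFF) | carry  # rl (hl)
    return buff


# A single pixel in the top-left corner ends up top-right when rotated
# clockwise and bottom-left when rotated counterclockwise.
dot = [0x80, 0, 0, 0, 0, 0, 0, 0]
print(rot_c(dot))
print(rot_cc(dot))
```

The model starts from a cleared buffer, but that should not matter: after eight passes every bit of each buffer row has been replaced by sprite data.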

    ld hl,(var)
    dec hl
    ld a,h
    or l
    jp nz,DS_End
    ;Code inside statement goes here
    ld hl,max
DS_End:
    ld (var),hl
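As a rough illustration of the control flow in the decrement-and-reset snippet above, here is a Python sketch. The name `ds_step` and passing the statement body as a callback are assumptions made for the example; the 16-bit wraparound mirrors `dec hl`.

```python
def ds_step(var, maximum, body):
    """One pass through the snippet: returns the new value of var."""
    var = (var - 1) & 0xFFFF    # dec hl (wraps at 16 bits)
    if var == 0:                # ld a,h / or l / jp nz,DS_End
        body()                  # code inside the statement
        var = maximum           # ld hl,max
    return var                  # DS_End: ld (var),hl


# With var starting at 3 and max = 3, the body runs on every third pass.
count = 0

def body():
    global count
    count += 1

v = 3
for _ in range(9):
    v = ds_step(v, 3, body)
print(count, v)
```

One thing the model makes easy to see: if var starts at 0, the decrement wraps to $FFFF and the body will not run for another 65535 passes.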
;const->{expr}
;Evaluate expr here
    ld (hl),const

;const->{expr}r
;Evaluate expr here
    ld (hl),const & $FF
    inc hl
    ld (hl),const >> 8

;const->{expr}rr
;Evaluate expr here
    ld (hl),const >> 8
    inc hl
    ld (hl),const & $FF

;0->{expr}r or 0->{expr}rr
;Evaluate expr here
    xor a
    ld (hl),a
    inc hl
    ld (hl),a
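To make the byte order of the three store forms concrete, and to tally what the zero special case saves, here is a short Python sketch. `store_const` and the `COST` table are illustrative names, not part of Axe; the byte and cycle figures are the standard Z80 sizes and T-states for those four instructions.

```python
def store_const(mem, addr, const, mode=""):
    """Model of const->{expr}, const->{expr}r and const->{expr}rr."""
    lo, hi = const & 0xFF, (const >> 8) & 0xFF
    if mode == "":        # one byte (assumes const fits in a byte)
        mem[addr] = lo
    elif mode == "r":     # two bytes, low byte first (little-endian)
        mem[addr], mem[addr + 1] = lo, hi
    elif mode == "rr":    # two bytes, high byte first (big-endian)
        mem[addr], mem[addr + 1] = hi, lo


# (bytes, cycles) per instruction
COST = {"ld (hl),n": (2, 10), "ld (hl),a": (1, 7),
        "inc hl": (1, 6), "xor a": (1, 4)}

def total(seq):
    """Sum size and speed over a list of instruction names."""
    return (sum(COST[i][0] for i in seq), sum(COST[i][1] for i in seq))


mem = bytearray(4)
store_const(mem, 0, 0x1234, "r")
store_const(mem, 2, 0x1234, "rr")
print(mem.hex())
print(total(["ld (hl),n", "inc hl", "ld (hl),n"]))           # general case
print(total(["xor a", "ld (hl),a", "inc hl", "ld (hl),a"]))  # zero case
```

Since both bytes of zero are the same, one routine can serve the r and rr forms, and by this tally it comes out one byte smaller and two cycles faster than the general two-byte store.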

    ld hl,(axv_X1t)
    ld de,($982E)
    or a
    sbc hl,de
    ret z
    add hl,de
    ld de,-6
    add hl,de
    ld e,(hl)
    inc e
    xor a
    ld d,a
    sbc hl,de
    ld (axv_X1t),hl
    ret

    ld hl,(axv_X1t)
    ld de,($982E)
    or a
    sbc hl,de
    ret z
    add hl,de
    ld de,-6
    add hl,de
    ld a,(hl)
    cpl
    ld e,a
    add hl,de
    ld (axv_X1t),hl
    ret

    ld hl,(axv_X1t)
    ld de,-6
    add hl,de
    ld a,(hl)
    cpl
    ld e,a
    add hl,de
    ld de,($982E)
    or a
    sbc hl,de
    ret z
    add hl,de
    ld (axv_X1t),hl
    ret

    ld ix,(axv_X1t)
    ld l,(ix-6)
    ld h,0

    ld ix,(axv_X1t)
    ld l,(ix-5)
    ld h,0

    ld ix,(axv_X1t)
    ld b,(ix-6)
Ax6_Loop:
    ld a,(ix-7)
    ld (hl),a
    inc hl
    dec ix
    djnz Ax6_Loop
    ld (hl),b
    ret

    ex de,hl
    ld hl,(axv_X1t)
    ld bc,-6
    add hl,bc
    ld b,(hl)
    ex de,hl
Ax6_Loop:
    dec de
    ld a,(de)
    ld (hl),a
    inc hl
    djnz Ax6_Loop
    ld (hl),b
    ret
I see you've been reading up on my Commands documentation, eh squidgetx? Yeah, that's an interesting thing I discovered when speed testing the display commands. On calculators like mine with the old, "good" screen drivers, the screen driver delay seems to be pretty low and constant from calculator to calculator. DispGraph could run just as fast or faster than DispGraphr on these calculators. However, due to inconsistencies with the screen drivers in newer units, the routine may run too fast for the driver on some calculators, causing display problems, so Quigibo had to add a portion of code to pause the routine until the driver says it is ready. However, this pause itself adds some overhead time, making the routine slower.

Quigibo, the DispGraphr routine doesn't have any throttling system in place, yet no problems have been reported with it on newer calculators. Could you just remove the throttling system from the DispGraph routine and add one or two time-wasting instructions to make each loop iteration take as many cycles as each DispGraphr loop iteration?

EDIT: Hmm, I don't know if Quigibo reads this thread and would see that, so I'm probably going to post that in a major thread he reads or send him a message about that.
Print()
The second paragraph is my suggested optimization. The 3-level grayscale routine doesn't have a throttling system, yet there have been no reports of display problems from anybody. Wouldn't this suggest that all the screen drivers can handle routines that have as much delay as this? The data copying loop in the 3-level grayscale routine takes 72 cycles per byte output, so could delays simply be added to the normal screen display routine to make its loop at least 72 cycles?
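To put rough numbers on the 72-cycle idea, here is a back-of-the-envelope Python sketch. The 72 cycles per byte comes from the post above and the 768-byte buffer follows from the 96x64 1-bit screen. The 60-cycle figure for an unthrottled DispGraph loop is purely a hypothetical placeholder, since the thread does not state the real number, and `padding_needed` is a made-up helper name.

```python
BUFFER_BYTES = 96 * 64 // 8     # 96x64 pixels, 1 bit each
TARGET_CYCLES_PER_BYTE = 72     # data copying loop in the 3-level grayscale routine

def padding_needed(loop_cycles, target=TARGET_CYCLES_PER_BYTE):
    """Cycles of delay to add so one loop iteration reaches the target."""
    return max(0, target - loop_cycles)

def copy_loop_cycles(cycles_per_byte=TARGET_CYCLES_PER_BYTE):
    """Cycles spent in the per-byte loop for one full buffer copy."""
    return BUFFER_BYTES * cycles_per_byte

print(BUFFER_BYTES)             # bytes per screen update
print(copy_loop_cycles())       # cycles in the byte loop at 72 cycles/byte
print(padding_needed(60))       # hypothetical 60-cycle loop: cycles to pad
```

This only counts the per-byte loop, not any per-row or setup overhead, so it is a lower bound on the cost of a whole display call.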
p_Exchange:
    .db 13
    pop de
    ex (sp),hl
    pop bc
    ld a,(de)
    ldi
    dec hl
    ld (hl),a
    inc hl
    ld a,b
    or c
    jr nz,$-8
p_Exchange:
    .db 12
    pop de
    ex (sp),hl
    pop bc
__ExchangeLoop:
    ld a,(de)
    ldi
    dec hl
    ld (hl),a
    inc hl
    jp pe,__ExchangeLoop ;or is it po?
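On the pe/po question: going by the documented Z80 behavior, `ldi` leaves P/V set while BC is still nonzero after its decrement, and `dec hl`, `ld (hl),a` and `inc hl` do not touch the flags, so `jp pe` looks like the right condition for "keep looping". The Python sketch below models the loop under that reading. `exchange` and its arguments are illustrative names, and it assumes the length is at least 1 and the two blocks do not overlap.

```python
def exchange(mem, hl, de, bc):
    """Model of the swap loop: exchange bc bytes between mem[hl:] and mem[de:]."""
    while True:
        a = mem[de]                 # ld a,(de)
        mem[de] = mem[hl]           # ldi: (de) <- (hl) ...
        hl += 1                     # ... inc hl
        de += 1                     # ... inc de
        bc = (bc - 1) & 0xFFFF      # ... dec bc
        pv = bc != 0                # ldi: P/V set while BC is nonzero
        mem[hl - 1] = a             # dec hl / ld (hl),a / inc hl (flags untouched)
        if not pv:                  # jp pe,__ExchangeLoop falls through when P/V is clear
            return mem


mem = bytearray(b"ABCDwxyz")
exchange(mem, 0, 4, 4)
print(mem)
```

As with the `ld a,b` / `or c` version, a length of 0 would wrap BC to $FFFF and run the loop 65536 times, so the model does not try to handle that case.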
p_GetBit2:
    .db 7 ;7 bytes, 49 cycles
    xor a
    add hl,hl
    add hl,hl
    add hl,hl
    ld h,a
    rla
    ld l,a
p_GetBit3:
    .db 8 ;8 bytes, 30/29 cycles
    bit 4,h
    ld hl,0
    jr z,$+3
    inc l
p_GetBit4:
    .db 8 ;8 bytes, 30/29 cycles
    bit 3,h
    ld hl,0
    jr z,$+3
    inc l
p_GetBit5:
    .db 8 ;8 bytes, 30/29 cycles
    bit 2,h
    ld hl,0
    jr z,$+3
    inc l
p_GetBit10:
    .db 7 ;7 bytes, 49 cycles
    xor a
    add hl,hl
    add hl,hl
    ld h,a
    add hl,hl
    ld l,h
    ld h,a
p_GetBit11:
    .db 8 ;8 bytes, 30/29 cycles
    bit 4,l
    ld hl,0
    jr z,$+3
    inc l
p_GetBit12:
    .db 8 ;8 bytes, 30/29 cycles
    bit 3,l
    ld hl,0
    jr z,$+3
    inc l
p_GetBit13:
    .db 8 ;8 bytes, 30/29 cycles
    bit 2,l
    ld hl,0
    jr z,$+3
    inc l
p_GetBit2:
    .db 7 ;7 bytes, 37 cycles
    ld a,h
    set 5,h
    cp h
    sbc hl,hl
    inc hl
p_GetBit3:
    .db 7 ;7 bytes, 37 cycles
    ld a,h
    set 4,h
    cp h
    sbc hl,hl
    inc hl
p_GetBit4:
    .db 7 ;7 bytes, 37 cycles
    ld a,h
    set 3,h
    cp h
    sbc hl,hl
    inc hl
p_GetBit5:
    .db 7 ;7 bytes, 37 cycles
    ld a,h
    set 2,h
    cp h
    sbc hl,hl
    inc hl
p_GetBit10:
    .db 7 ;7 bytes, 37 cycles
    ld a,l
    set 5,l
    cp l
    sbc hl,hl
    inc hl
p_GetBit11:
    .db 7 ;7 bytes, 37 cycles
    ld a,l
    set 4,l
    cp l
    sbc hl,hl
    inc hl
p_GetBit12:
    .db 7 ;7 bytes, 37 cycles
    ld a,l
    set 3,l
    cp l
    sbc hl,hl
    inc hl
p_GetBit13:
    .db 7 ;7 bytes, 37 cycles
    ld a,l
    set 2,l
    cp l
    sbc hl,hl
    inc hl
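The set/cp/sbc pattern in the 7-byte versions above is not obvious at a glance, so here is a Python sketch that models it on one register byte and compares it with a plain bit test. `get_bit_plain` and `get_bit_trick` are made-up names for the example, and `reg` stands for whichever of H or L the routine looks at.

```python
def get_bit_plain(reg, n):
    """Reference result: 1 if bit n of the byte is set, else 0."""
    return (reg >> n) & 1


def get_bit_trick(reg, n):
    """Model of: ld a,reg / set n,reg / cp reg / sbc hl,hl / inc hl."""
    a = reg                       # ld a,reg
    reg |= 1 << n                 # set n,reg
    carry = 1 if a < reg else 0   # cp reg: carry only if the bit was clear
    hl = (0 - carry) & 0xFFFF     # sbc hl,hl: 0 or $FFFF
    return (hl + 1) & 0xFFFF      # inc hl: 1 or 0


# Exhaustive check over every byte value for the bit numbers used above.
ok = all(get_bit_trick(v, n) == get_bit_plain(v, n)
         for v in range(256) for n in (2, 3, 4, 5))
print(ok)
```

Going by the size and speed comments in the two listings, the trade-off seems to be: the new routines are 12 cycles faster at the same size for GetBit2 and GetBit10, and one byte smaller but 7 to 8 cycles slower for the others.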