Author Topic: Assembly Programmers - Help Axe Optimize! (Read 169111 times)

Happybobjr · « **Reply #240 on:** August 30, 2011, 06:23:00 am »

What does that code do, though?

Runer112 · « **Reply #241 on:** September 18, 2011, 03:15:23 am »

At this rate, I'll have optimized just about every Axe routine eventually!

p_ToHex: 31 cycles faster.

Code: (Old code: 25 bytes, 670 cycles) [Select]

p_ToHex:
	.db __ToHexEnd-$-1
	ld	b,4
	ld	de,vx_SptBuff
	push	de
__ToHexLoop:
	ld	a,$1F
__ToHexShift:
	add	hl,hl
	rla
	jr	nc,__ToHexShift
	daa
	add	a,$A0
	adc	a,$40
	ld	(de),a
	inc	de
	djnz	__ToHexLoop
	xor	a
	ld	(de),a
	pop	hl
	ret
__ToHexEnd:

Code: (New code: 25 bytes, 639 cycles) [Select]

p_ToHex:
	.db __ToHexEnd-$-1
	ld	bc,4<<8+$1F
	ld	de,vx_SptBuff
__ToHexLoop:
	ld	a,c
__ToHexShift:
	add	hl,hl
	rla
	jr	nc,__ToHexShift
	daa
	add	a,$A0
	adc	a,$40
	ld	(de),a
	inc	e
	djnz	__ToHexLoop
	ex	de,hl
	ld	(hl),b
	ld	l,vx_SptBuff&$FF
	ret
__ToHexEnd:
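
For anyone wondering what the `daa` / `add a,$A0` / `adc a,$40` sequence actually does, here is a rough Python model of one pass of the loop. It is only a sketch: `daa_after_add`, `nibble_to_hex` and `to_hex` are names made up for this illustration, and the model assumes the shift loop leaves A = $F0 | nibble with the carry flag set and the half-carry flag clear (which is what four `rla` steps starting from $1F should give).

```python
def daa_after_add(a, carry, half=0):
    """Z80 DAA as it behaves after an addition (N flag clear)."""
    adjust = 0
    if half or (a & 0x0F) > 9:
        adjust |= 0x06
    if carry or a > 0x99:
        adjust |= 0x60
        carry = 1
    return (a + adjust) & 0xFF, carry


def nibble_to_hex(nibble):
    """Model of daa / add a,$A0 / adc a,$40 for one 4-bit value."""
    # Assumed state after the shift loop: A = $F0 | nibble, carry set.
    a, carry = daa_after_add(0xF0 | nibble, 1)
    total = a + 0xA0                      # add a,$A0
    a, carry = total & 0xFF, total >> 8   # carry is set only for A..F
    a = (a + 0x40 + carry) & 0xFF         # adc a,$40
    return chr(a)


def to_hex(value):
    """Convert a 16-bit value to four hex characters, high nibble first."""
    return "".join(nibble_to_hex((value >> s) & 0xF) for s in (12, 8, 4, 0))


print(to_hex(0x1A2F))
```

For digits 0 to 9 the carry out of `add a,$A0` is clear, so the result lands in $30 to $39. For A to F the carry is set, and the `adc` bumps the result into $41 to $46.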

p_ShiftLeft: 1 byte smaller, 67 cycles faster. You could save an additional 384 cycles by giving up the minor size savings and loading 12<<8+4 into de at the start of the routine and then replacing the immediate data operands in the loop with d and e.

Code: (Old code: 17 bytes, 27542 cycles) [Select]

p_ShiftLeft:
	.db __ShiftLeftEnd-1-$
	ld	hl,plotSScreen+767
	ld	c,64
__ShiftLeftLoop:
	ld	b,12
	or	a
__ShiftLeftShift:
	rl	(hl)
	dec	hl
	djnz	__ShiftLeftShift
	dec	c
	jr	nz,__ShiftLeftLoop
	ret
__ShiftLeftEnd:

Code: (New code: 16 bytes, 27475 cycles) [Select]

p_ShiftLeft:
	.db __ShiftLeftEnd-1-$
	ld	hl,plotSScreen+767
	xor	a
__ShiftLeftLoop:
	ld	b,12
__ShiftLeftShift:
	rl	(hl)
	dec	hl
	djnz	__ShiftLeftShift
	add	a,4
	jr	nz,__ShiftLeftLoop
	ret
__ShiftLeftEnd:
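
A quick Python model of the new loop, in case the `xor a` / `add a,4` counter looks suspicious. It is a sketch with made-up names (`shift_left`, `ROW_BYTES`, `ROWS`), not part of Axe. The point is that A wraps to zero after 256/4 = 64 rows, and `add a,4` leaves carry clear on every row except the last, so the separate `or a` is no longer needed.

```python
ROW_BYTES = 12   # 96 pixels per row
ROWS = 64


def shift_left(screen):
    """Model of the new p_ShiftLeft loop over a 768-byte buffer."""
    buf = bytearray(screen)
    pos = ROW_BYTES * ROWS - 1          # ld hl,plotSScreen+767
    a, carry = 0, 0                     # xor a (also clears carry)
    rows_done = 0
    while True:
        for _ in range(ROW_BYTES):      # ld b,12 ... djnz
            shifted = (buf[pos] << 1) | carry   # rl (hl)
            buf[pos] = shifted & 0xFF
            carry = shifted >> 8
            pos -= 1                    # dec hl
        rows_done += 1
        total = a + 4                   # add a,4
        a, carry = total & 0xFF, total >> 8
        if a == 0:                      # jr nz falls through
            return bytes(buf), rows_done


screen = bytearray(ROW_BYTES * ROWS)
screen[1] = 0x80     # pixel in row 0 that should move into byte 0
screen[12] = 0x80    # leftmost pixel of row 1, should fall off the edge
shifted, rows = shift_left(screen)
print(rows, shifted[0], shifted[11])
```

The 384 cycle figure for the register variant works out the same way: `ld b,d` and `add a,e` are each 3 cycles cheaper than their immediate forms, and 64 rows times 6 cycles is 384.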

p_ShiftRight: 1 byte smaller, 67 cycles faster. Same deal as p_ShiftLeft.

Code: (Old code: 17 bytes, 27542 cycles) [Select]

p_ShiftRight:
	.db __ShiftRightEnd-1-$
	ld	hl,plotSScreen
	ld	c,64
__ShiftRightLoop:
	ld	b,12
	or	a
__ShiftRightShift:
	rr	(hl)
	inc	hl
	djnz	__ShiftRightShift
	dec	c
	jr	nz,__ShiftRightLoop
	ret
__ShiftRightEnd:

Code: (New code: 16 bytes, 27475 cycles) [Select]

p_ShiftRight:
	.db __ShiftRightEnd-1-$
	ld	hl,plotSScreen
	xor	a
__ShiftRightLoop:
	ld	b,12
__ShiftRightShift:
	rr	(hl)
	inc	hl
	djnz	__ShiftRightShift
	add	a,4
	jr	nz,__ShiftRightLoop
	ret
__ShiftRightEnd:

p_FreqOut: 1 byte smaller. Takes advantage of an absolute jump. This is a strange routine to optimize, because the optimized version runs about 15% faster, which would make notes slightly higher pitched and shorter. Although this command is rarely used, that change in behavior might still make the optimization not worth it. Whether or not you include the optimization, it might be a good idea to change this routine to use p_Safety.

Code: (Old code: 23 bytes) [Select]

p_FreqOut:
	.db __FreqOutEnd-1-$
	xor	a
__FreqOutLoop1:
	push	bc
	ld	e,a
__FreqOutLoop2:
	ld	a,h
	or	l
	jr	z,__FreqOutDone
	dec	hl
	dec	bc
	ld	a,b
	or	c
	jr	nz,__FreqOutLoop2
	ld	a,e
	xor	%00000011
	scf
__FreqOutDone:
	pop	bc
	out	($00),a
	ret	nc
	jr	__FreqOutLoop1
__FreqOutEnd:

Code: (New code: 22 bytes) [Select]

p_FreqOut:
	.db __FreqOutEnd-1-$
	xor	a
__FreqOutLoop1:
	push	bc
	ld	e,a
__FreqOutLoop2:
	ld	a,h
	or	l
	jr	z,__FreqOutDone
	cpd
	jp	pe,__FreqOutLoop2
	ld	a,e
	xor	%00000011
	scf
__FreqOutDone:
	pop	bc
	out	($00),a
	ret	nc
	jr	__FreqOutLoop1
__FreqOutEnd:
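
The "about 15%" figure can be checked by adding up the inner loop. This is a small Python tally using standard Z80 instruction timings; it ignores any hardware-specific delays, and the names (`CYCLES`, `OLD_LOOP`, `NEW_LOOP`, `loop_cycles`) are just for this sketch.

```python
# Standard Z80 T-state counts for the instructions in __FreqOutLoop2.
CYCLES = {
    "ld a,h": 4,
    "or l": 4,
    "jr z (not taken)": 7,
    "dec hl": 6,
    "dec bc": 6,
    "ld a,b": 4,
    "or c": 4,
    "jr nz (taken)": 12,
    "cpd": 16,
    "jp pe": 10,
}

OLD_LOOP = ["ld a,h", "or l", "jr z (not taken)",
            "dec hl", "dec bc", "ld a,b", "or c", "jr nz (taken)"]
NEW_LOOP = ["ld a,h", "or l", "jr z (not taken)", "cpd", "jp pe"]


def loop_cycles(instructions):
    """Total cycles for one pass through the listed instructions."""
    return sum(CYCLES[name] for name in instructions)


old_cycles = loop_cycles(OLD_LOOP)
new_cycles = loop_cycles(NEW_LOOP)
print(old_cycles, new_cycles, round(old_cycles / new_cycles, 3))
```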

p_IntSetup: 4 bytes smaller. I thought this was some pretty impressive work.

And regarding interrupts, I still think the port 6 saving and restoring shenanigans aren't necessary for programs. The only reason port 6 would need to be restored to the value it held when interrupts were enabled is if the user is using a shell application in conjunction with their Axe program. In that case, either the designer of the shell application interface system could provide modified interrupt routines in an Axiom, or the user is probably intelligent enough to be able to provide their own interrupt routines. (Actually, it wouldn't even need to be their own; they could just copy the one for applications from the Commands.inc file.)

Code: (Old code: 42 bytes, a lot of cycles) [Select]

p_IntSetup:
	.db __IntEnd-p_IntSetup-1
	di
	ld	de,$8B01
	ld	a,d
	ld	i,a
	ld	a,l
	ld	hl,$8B00
	ld	b,e
	ld	c,l
	ld	(hl),$8A
	ldir

	and	%00000110
	out	(4),a
	ld	a,%00001000
	out	(3),a
	ld	a,(hl)
	out	(3),a

	ld	d,a
	ld	e,a
	ld	c,__IntDataEnd-__IntData
	ld	hl,$0000
	ldir

	in	a,(6)
	ld	($8A8A+__IntDataSMC-__IntData+1),a
__IntEnd:
	.db rp_Ans,9

Code: (New code: 38 bytes, more cycles but who cares?) [Select]

p_IntSetup:
	.db __IntEnd-p_IntSetup-1
	di
	ld	a,l
	ld	hl,$8C06
	ld	de,$8C05
	ld	bc,$8C05-$8A8A

	and	l
	out	(4),a
	ld	a,h
	out	(3),a
	dec	a
	ld	i,a
	dec	a
	out	(3),a

	ld	(hl),a
	lddr

	ld	hl,$0000
	ld	c,__IntDataEnd-__IntData
	ldir

	in	a,(6)
	ld	($8A8A+__IntDataSMC-__IntData+1),a
__IntEnd:
	.db rp_Ans,11
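
To see what the `ld (hl),a` / `lddr` pair leaves behind, here is a Python sketch of just the fill step. The helper names (`lddr`, `build_vector_table`) are invented for this illustration. It assumes the table is meant for interrupt mode 2, where the CPU fetches a 16-bit address from I*256 plus a bus byte, so filling $8B00 to $8C00 with $8A makes every possible vector point at $8A8A.

```python
def lddr(mem, hl, de, bc):
    """Z80 LDDR: copy mem[hl] to mem[de], stepping downward, BC times."""
    while True:
        mem[de] = mem[hl]
        hl = (hl - 1) & 0xFFFF
        de = (de - 1) & 0xFFFF
        bc = (bc - 1) & 0xFFFF
        if bc == 0:
            return hl, de, bc


def build_vector_table():
    """Model of the register setup and fill in the new p_IntSetup."""
    mem = bytearray(0x10000)
    hl, de, bc = 0x8C06, 0x8C05, 0x8C05 - 0x8A8A
    i_reg = (hl >> 8) - 1        # ld a,h / dec a / ld i,a  -> $8B
    a = i_reg - 1                # dec a                    -> $8A
    mem[hl] = a                  # ld (hl),a
    hl, de, bc = lddr(mem, hl, de, bc)
    return mem, i_reg, de, bc


mem, i_reg, de, bc = build_vector_table()
print(hex(i_reg), hex(de), bc)
```

The fill stops with DE = $8A8A and B = 0, which is why the following `ld c,...` / `ldir` can copy the interrupt routine to $8A8A without reloading DE or B.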

p_DtoF: 2 bytes smaller. Takes advantage of a bcall to do the same thing. It appears that B_CALL(_SetXXXXOP2) always returns OP2+1, which could be used to save an additional 2 bytes, but this bcall could theoretically be changed in future OS versions and break this optimization.

Code: (Old code: 13 bytes, a lot of cycles) [Select]

p_DtoF:
	.db 13
	ex	(sp),hl
	B_CALL(_SetXXXXOP2)
	ld	hl,OP2
	pop	de
	ld	bc,9
	ldir

Code: (New code: 11 bytes, a lot plus a few cycles) [Select]

p_DtoF:
	.db 11
	ex	(sp),hl
	B_CALL(_SetXXXXOP2)
	ld	hl,OP2
	pop	de
	B_CALL(_Mov9B)

calc84maniac · « **Reply #242 on:** September 20, 2011, 12:26:00 am »

p_Length: 1 byte smaller, 2 cycles faster. Takes advantage of the fact that you will not need to search more than 16384 bytes starting at $4000-$7FFF or 32768 bytes starting at $8000-$FFFF, and also you shouldn't be searching at $0000-$3FFF.

Code: (Old code: 11 bytes) [Select]

p_Length:
	.db __LengthEnd-$-1
	xor	a
	ld	b,a
	ld	c,a
	cpir
	ld	hl,-1
	sbc	hl,bc
	ret
__LengthEnd:

Code: (New code: 10 bytes) [Select]

p_Length:
	.db __LengthEnd-$-1
	xor	a
	ld	b,h
	ld	d,h
	ld	e,l
	cpir
	scf
	sbc	hl,de
	ret
__LengthEnd:
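
Here is a rough Python model of the new routine, mainly to show why `scf` / `sbc hl,de` gives the length. The function names (`cpir`, `length`) and the `c` parameter are only for this sketch; `c` stands in for whatever C happened to hold, since the routine never sets it.

```python
def cpir(mem, hl, bc, a=0):
    """Z80 CPIR: scan forward until mem[hl] == a or BC reaches zero."""
    while True:
        found = mem[hl] == a
        hl = (hl + 1) & 0xFFFF
        bc = (bc - 1) & 0xFFFF
        if found or bc == 0:
            return hl, bc


def length(mem, start, c=0):
    """Model of the new p_Length: count bytes before the first zero."""
    bc = (start & 0xFF00) | c            # ld b,h (C is left as it was)
    hl, _ = cpir(mem, start, bc)         # HL ends one past the zero byte
    return (hl - start - 1) & 0xFFFF     # scf / sbc hl,de


mem = bytearray(0x10000)
mem[0x9000:0x9005] = b"HELLO"
print(length(mem, 0x9000))
```

Because B is copied from H, the search limit is at least $4000 for data at $4000-$7FFF and at least $8000 for data in RAM, which matches the ranges mentioned above.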

jacobly · « **Reply #243 on:** October 09, 2011, 10:16:40 am »

Speed optimization for p_CheckSum by using an absolute jump.

Code: (Old Code: 19 bytes, 63.5*n+37 cycles) [Select]

p_CheckSum:
	.db __CheckSumEnd-$-1
	ld	b,h
	ld	c,l
	pop	af
	pop	hl
	push	af
	xor	a
	ld	d,a
__CheckSumLoop:
	add	a,(hl)
	ld	e,a
	jr	nc,$+3
	inc	d
	cpi
	ex	de,hl
	ret	po
	ex	de,hl
	jr	__CheckSumLoop
__CheckSumEnd:

Code: (New Code: 19 bytes, 44.5*n+65 cycles) [Select]

p_CheckSum:
	.db __CheckSumEnd-$-1
	ld	b,h
	ld	c,l
	pop	af
	pop	hl
	push	af
	xor	a
	ld	d,a
__CheckSumLoop:
	add	a,(hl)
	jr	nc,$+3
	inc	d
	cpi
	jp	pe,__CheckSumLoop
	ld	h,d
	ld	l,a
	ret
__CheckSumEnd:
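
For reference, a Python sketch of what the loop computes: A holds the low byte of the running sum, D counts the carries, and the result is returned as a 16-bit value. The name `checksum` is only for this illustration, and the model leaves out the stack shuffling at the top of the routine.

```python
def checksum(data):
    """Model of p_CheckSum: 16-bit sum of the bytes in data."""
    a = 0   # low byte of the sum (register A)
    d = 0   # high byte of the sum (register D)
    for byte in data:
        total = a + byte            # add a,(hl)
        a = total & 0xFF
        if total > 0xFF:            # jr nc,$+3 skips the inc d
            d = (d + 1) & 0xFF      # inc d
    return (d << 8) | a             # ld h,d / ld l,a


print(hex(checksum(bytes([0xFF, 0x02]))))
```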

Xeda112358 · « **Reply #244 on:** October 09, 2011, 05:38:20 pm »

Hmm, would this optimisation work to save one more byte? (sorry, I could be wrong):

Code: [Select]

p_CheckSum:
	.db __CheckSumEnd-$-1
	ld	b,h
	ld	c,l
	pop	hl
	ex	(sp),hl
	xor	a
	ld	d,a
__CheckSumLoop:
	add	a,(hl)
	jr	nc,$+3
	inc	d
	cpi
	jp	pe,__CheckSumLoop
	ld	h,d
	ld	l,a
	ret
__CheckSumEnd:

calc84maniac · « **Reply #245 on:** October 09, 2011, 07:21:47 pm »

Ah, nice use of ex (sp),hl

Xeda112358 · « **Reply #246 on:** October 09, 2011, 07:26:47 pm »

Thanks

I think I learned it from you folks

EDIT: It does use 2 more cycles though, right?

calc84maniac · « **Reply #247 on:** October 09, 2011, 07:30:34 pm »

Quote from: Xeda112358 on October 09, 2011, 07:26:47 pm

Thanks I think I learned it from you folks
EDIT: It does use 2 more cycles though, right?

Actually, ex (sp),hl takes 2 fewer cycles than pop af and push af combined, so it's faster too

Happybobjr · « **Reply #248 on:** October 09, 2011, 07:37:42 pm »

What does checksum do?

calc84maniac · « **Reply #249 on:** October 13, 2011, 11:32:57 am »

Here, slightly optimized Bitmap():
Old code, 7 bytes and lots of cycles

Code: [Select]

p_EzSprite:
	.db 7
	pop	de
	ld	a,e
	pop	de
	ld	d,a
	B_CALL(_DisplayImage)

New code, 6 bytes and lots of cycles minus 4

Code: [Select]

p_EzSprite:
	.db 6
	pop	bc
	pop	de
	ld	d,c
	B_CALL(_DisplayImage)

Xeda112358 · « **Reply #250 on:** October 14, 2011, 02:54:36 pm »

Is this an optimisation? I get the feeling that there is a reason it doesn't end in a ret and that it uses a jr...

Code: (Old Code: 7 bytes, 30 or 38 cycles) [Select]

p_DecWord:
	.db 7
	ld	a,(hl)
	dec	(hl)
	or	a
	jr	nz,$+4
	inc	hl
	dec	(hl)

Code: (New Code: 6 bytes, 29 or 36) [Select]

p_DecWord:
	.db 6
	ld	a,(hl)
	dec	(hl)
	or	a
	ret	nz
	inc	hl
	dec	(hl)

EDIT Yep, suspicion confirmed
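
For anyone reading along, this is roughly what the routine does, as a Python sketch (`dec_word` is a name made up here). It decrements a little-endian 16-bit value in memory, and only touches the high byte when the low byte was zero before the decrement. As far as I understand it, the reason `ret nz` is not safe is that the routine has no `ret` of its own and gets pasted inline, so an early return would skip whatever code follows it; treat that as my reading, not a confirmed detail.

```python
def dec_word(mem, addr):
    """Model of p_DecWord: decrement the 16-bit little-endian word at addr."""
    low = mem[addr]                      # ld a,(hl)
    mem[addr] = (low - 1) & 0xFF         # dec (hl)
    if low == 0:                         # or a / jr nz,$+4 skips the borrow
        mem[addr + 1] = (mem[addr + 1] - 1) & 0xFF   # inc hl / dec (hl)


mem = bytearray([0x00, 0x01])   # the word $0100
dec_word(mem, 0)
print(mem.hex())
```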

Quigibo · « **Reply #251 on:** November 04, 2011, 01:58:14 am »

Not an optimization, but I'm posting this here since more assembly people will read it. Since the Bitmap() command is being replaced with something actually useful, "Fix 8" and "Fix 9" will also need to be replaced. Are there any flags (particularly for text) that would be useful to Axe programmers and that I haven't already covered with the other Fix commands? A couple I can think of are an APD toggle or a lowercase toggle.

LincolnB · « **Reply #252 on:** November 04, 2011, 10:24:39 am »

Hm...I say this as an Axe programmer, not knowing ASM...how about UPSIDE DOWN TEXT! om nom nom nom

jacobly · « **Reply #253 on:** November 15, 2011, 12:01:37 am »

p_Input: saves three bytes and lots of cycles

Code: (Old code) [Select]

p_Input:
	.db __InputEnd-$-1
	res	6,(iy+$1C)
	set	7,(iy+$09)
	xor	a
	ld	(ioPrompt),a
	B_CALL(_GetStringInput)
	B_CALL(_ZeroOP1)
	ld	hl,$2D04
	ld	(OP1),hl
	B_CALL(_ChkFindSym)
	inc	de
	inc	de
	ex	de,hl
	ret
__InputEnd:

Code: (New code) [Select]

p_Input:
	.db __InputEnd-$-1
	res	6,(iy+$1C)
	set	7,(iy+$09)
	xor	a
	ld	(ioPrompt),a
	B_CALL(_GetStringInput)
	B_CALL(_ZeroOP1)
	ld	a,$2D
	ld	(OP1+1),a
	rst	rFindSym
	inc	de
	inc	de
	ex	de,hl
	ret
__InputEnd:

Quigibo · « **Reply #254 on:** November 16, 2011, 05:52:32 pm »

Thanks!
