Author Topic: Assembly Programmers - Help Axe Optimize! (Read 169059 times)

Quigibo · « **Reply #165 on:** April 20, 2011, 07:58:47 pm »

That's actually impossible, since it's a form of the Halting Problem. It could be faked to some extent, but it would be incredibly inefficient. Runer, I'll get those routines up soon.

Builderboy · « **Reply #166 on:** April 20, 2011, 08:16:23 pm »

I was thinking more along the lines of this: given an existing routine (with a certain size and speed), it must create a smaller and faster routine, so there is only a finite number of cases to test. If a candidate routine took longer to run than the example, the search could immediately terminate it and move on to the next candidate, guaranteeing that the program would eventually terminate.
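
The bounded search described here can be sketched on a toy scale. The following is only an illustration under loud assumptions: the five-operation instruction set is hypothetical (it is not Z80), every operation is assumed to cost one byte and one time unit, and there are no branches, so each candidate trivially terminates. With loops in the instruction set, a step cap equal to the reference's running time would be needed, as the post suggests.

```python
from itertools import product

# Hypothetical toy instruction set acting on one 8-bit register.
# Assumption: every op has unit size and unit time, so "shorter" also means "faster".
OPS = {
    "inc": lambda a: (a + 1) & 0xFF,
    "dec": lambda a: (a - 1) & 0xFF,
    "double": lambda a: (a + a) & 0xFF,
    "not": lambda a: a ^ 0xFF,
    "negate": lambda a: (-a) & 0xFF,
}

def run(program, a):
    """Execute a straight-line program on the 8-bit input a."""
    for op in program:
        a = OPS[op](a)
    return a

def find_shorter(reference):
    """Return a strictly shorter program equivalent to reference, or None."""
    target = [run(reference, a) for a in range(256)]
    # Only lengths below the reference are tried, so the search space is finite.
    for length in range(len(reference)):
        for candidate in product(OPS, repeat=length):
            if all(run(candidate, a) == target[a] for a in range(256)):
                return list(candidate)
    return None

reference = ["not", "inc", "double"]  # computes 2 * (-a) modulo 256
better = find_shorter(reference)
print(better)
```

Equivalence is checked exhaustively over all 256 inputs, which is only practical because the toy state is a single byte. A real search over Z80 routines would have to account for flags, memory, and far more state.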

Runer112 · « **Reply #167 on:** April 21, 2011, 12:09:28 am »

Here are the most optimized constant comparisons I could come up with. I'm not sure what optimized comparisons for powers of 2 you found, because I didn't really find any. The only individual special cases I found dealt with 0, 32768, and 65535. And I hope the parser can handle the fancy constant mangling operations necessary for some of these.

Code: [Select]

p_GE0:
	.db	3
	ld	hl,1

p_GT65535:
	.db	3
	ld	hl,0

p_LE65535:
	.db	3
	ld	hl,1

p_LT0:
	.db	3
	ld	hl,0

p_GE1 =p_NE0
p_GT0 =p_NE0
p_LE0 =p_EQ0
p_LT1 =p_EQ0

p_GE32768 =p_Div32768
p_GT32767 =p_Div32768
p_LE32767 =p_SGE0
p_LT32768 =p_SGE0

p_GE65535 =p_EQN1
p_GT65534 =p_EQN1
p_LE65534 =p_NEN1
p_LT65535 =p_NEN1

p_GEconstMod256EQ0:
	.db	6
	ld	a,h
	sub	const>>8
	sbc	hl,hl
	inc	hl

p_GTconstMod256EQ255:
	.db	6
	ld	a,h
	sub	const+1>>8
	sbc	hl,hl
	inc	hl

p_LEconstMod256EQ255:
	.db	6
	ld	a,h
	add	a,-(const+1>>8)
	sbc	hl,hl
	inc	hl

p_LTconstMod256EQ0:
	.db	6
	ld	a,h
	add	a,-(const>>8)
	sbc	hl,hl
	inc	hl

p_GEconst:
	.db	8
	xor	a
	ld	de,-const
	add	hl,de
	ld	h,a
	rla
	ld	l,a

p_GTconst:
	.db	8
	xor	a
	ld	de,-(const+1)
	add	hl,de
	ld	h,a
	rla
	ld	l,a

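p_LEconst:
	.db	7
	ld	de,-(const+1)	;carry set when original hl > const
	add	hl,de
	sbc	hl,hl		;hl = -1 if carry, else 0
	inc	hl		;hl = 0 if carry, else 1

p_LTconst:
	.db	7
	ld	de,-const	;carry set when original hl >= const
	add	hl,de
	sbc	hl,hl
	inc	hl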
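
The carry trick behind these comparisons can be checked with a small model. This is a sketch, not part of Axe: the helper names are made up for illustration, and the model only tracks the 16-bit value and the carry flag. It shows that adding `-const` to `hl` sets carry exactly when `hl >= const` (for nonzero `const`), that the `sbc hl,hl` / `inc hl` tail turns carry into the inverted 0/1 result, and that the high-byte variant works when `const` is a multiple of 256.

```python
MASK16 = 0xFFFF

def add16(hl, de):
    """Model `add hl,de`: return (16-bit result, carry flag)."""
    total = hl + de
    return total & MASK16, total >> 16

def ge_const(hl, const):
    """Model xor a / ld de,-const / add hl,de / ld h,a / rla / ld l,a."""
    _, carry = add16(hl, (-const) & MASK16)
    return carry  # hl ends up holding the carry flag: 0 or 1

def lt_const(hl, const):
    """Model ld de,-const / add hl,de / sbc hl,hl / inc hl."""
    _, carry = add16(hl, (-const) & MASK16)
    hl = (-carry) & MASK16    # sbc hl,hl gives 0xFFFF if carry, else 0
    return (hl + 1) & MASK16  # inc hl gives 0 if carry, else 1

def ge_const_high_byte(hl, const):
    """Model ld a,h / sub const>>8 / sbc hl,hl / inc hl (const % 256 == 0)."""
    borrow = 1 if (hl >> 8) < (const >> 8) else 0  # sub sets carry on borrow
    hl = (-borrow) & MASK16
    return (hl + 1) & MASK16

print(ge_const(5, 5), lt_const(5, 5), ge_const_high_byte(0x1200, 0x1200))
```

The model fails for `const = 0`, because `-0` is 0 and the addition never carries. That is consistent with 0 being handled as a separate special case in the listing.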

Runer112 · « **Reply #168 on:** April 25, 2011, 02:50:35 pm »

~~Stolen~~ borrowed from WikiTI:

Code: [Select]

#define FULLSPEED  in a,(2) \ rla \ sbc a,a \ out (20h),a

And this gives you the added bonus of the CPU operating approximately 25 kHz faster in full speed mode!

calc84maniac · « **Reply #169 on:** April 25, 2011, 03:04:55 pm »

Quote from: Runer112 on April 25, 2011, 02:50:35 pm

~~Stolen~~ borrowed from WikiTI:

Code: [Select]
#define FULLSPEED in a,(2) \ rla \ sbc a,a \ out (20h),a
And this gives you the added bonus of the CPU operating approximately 25 kHz faster in full speed mode!

Note: This has the side effect of out (0),0 on the TI-83+. Is that okay?

Munchor · « **Reply #170 on:** April 25, 2011, 03:07:22 pm »

Quote from: calc84maniac on April 25, 2011, 03:04:55 pm

Note: This has the side effect of out (0),0 on the TI-83+. Is that okay?

Axe doesn't work on the 83+, right?

TIfanx1999 · « **Reply #171 on:** April 25, 2011, 03:34:18 pm »

As far as I know Axe should work fine on the TI-83+.

Runer112 · « **Reply #172 on:** April 25, 2011, 09:18:01 pm »

Quote from: calc84maniac on April 25, 2011, 03:04:55 pm

Note: This has the side effect of out (0),0 on the TI-83+. Is that okay?

Since this is the current routine:

Code: [Select]


#define FULLSPEED  in a,(2) \ and 80h \ rlca \ out (20h),a

It shouldn't make 83+ compatibility any worse than it already is.
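
To see why both versions behave alike on the TI-83+, here is a small model of the value each macro writes to port 20h. It is a sketch under one assumption taken from the discussion above: bit 7 of port 2 reads 0 on the TI-83+ and 1 on the later models. The function names are made up for illustration.

```python
def new_macro_value(port2):
    """in a,(2) / rla / sbc a,a: value left in A for out (20h),a."""
    carry = (port2 >> 7) & 1        # rla moves bit 7 of A into carry
    return 0xFF if carry else 0x00  # sbc a,a computes A - A - carry

def old_macro_value(port2):
    """in a,(2) / and 80h / rlca: value left in A for out (20h),a."""
    a = port2 & 0x80
    return ((a << 1) | (a >> 7)) & 0xFF  # rlca rotates 0x80 around to 0x01

# Assumed readings: bit 7 clear on the TI-83+, set on later models.
print(new_macro_value(0x00), old_macro_value(0x00))
print(new_macro_value(0x80), old_macro_value(0x80))
```

Both macros write 0 when bit 7 is clear, so the side effect on the TI-83+ is the same for either version. They differ only in the nonzero value written on the other models (FFh versus 01h).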

Deep Toaster · « **Reply #173 on:** April 26, 2011, 06:48:41 pm »

And

Quote from: WikiTI

The only side effect of this is that on the TI-83+ Basic this will cause both linkport lines to go high - which shouldn't matter too much if you're not using the linkport at that time, especially since both lines are high normally...

so there shouldn't be a problem. Except if you really tried, I guess.

calc84maniac · « **Reply #174 on:** April 26, 2011, 06:50:11 pm »

Well, the only problem is that it would completely mess with any program that happens to be using X->Port stuff.

Deep Toaster · « **Reply #175 on:** April 26, 2011, 06:52:25 pm »

True. Most people put Full in the beginning of the program anyway, but I guess it could cause problems.

As Runer112 said, no more than we already have.

Runer112 · « **Reply #176 on:** May 01, 2011, 01:09:25 am »

Was randomly browsing through the B_CALLs on WikiTI and found one that should save some bytes in the p_GetArc routine!

Code: (Original code: 56 bytes) [Select]

p_GetArc:
	.db __GetArcEnd-1-$
	push	de
	MOV9TOOP1()
	B_CALL(_ChkFindSym)
	jr	c,__GetArcFail
	push	de
	ex	de,hl
	ld	hl,(progPtr)
	sbc	hl,de
	pop	de
	ld	hl,9
	jr	c,__GetArcName
__GetArcStatic:
	ld	l,12
	and	%00011111
	jr	z,__GetArcDone
	cp	l
	jr	z,__GetArcDone
	ld	l,14
	jr	__GetArcDone
__GetArcName:
	add	hl,de
	B_CALL(_LoadDEIndPaged)
	ld	d,0
	inc	e
	inc	e
__GetArcDone:
	add	hl,de
	ex	de,hl
	pop	hl
	ld	(hl),e
	inc	hl
	ld	(hl),d
	inc	hl
	ld	(hl),b
	ex	de,hl
	ret
__GetArcFail:
	ld	hl,0
	pop	de
	ret
__GetArcEnd:

Code: (Optimized code: 51 bytes) [Select]

p_GetArc:
	.db __GetArcEnd-1-$
	push	de
	MOV9TOOP1()
	B_CALL(_ChkFindSym)
	jr	c,__GetArcFail
	B_CALL(_IsFixedName)		;$4363
	ld	hl,9
	jr	z,__GetArcName
__GetArcStatic:
	ld	l,12
	and	%00011111
	jr	z,__GetArcDone
	cp	l
	jr	z,__GetArcDone
	ld	l,14
	jr	__GetArcDone
__GetArcName:
	add	hl,de
	B_CALL(_LoadDEIndPaged)
	ld	d,0
	inc	e
	inc	e
__GetArcDone:
	add	hl,de
	ex	de,hl
	pop	hl
	ld	(hl),e
	inc	hl
	ld	(hl),d
	inc	hl
	ld	(hl),b
	ex	de,hl
	ret
__GetArcFail:
	ld	hl,0
	pop	de
	ret
__GetArcEnd:

EDIT: And on the topic of the GetCalc() routines, have you decided yet what to do about real and complex number variables? Because right now p_GetArc supports them correctly but the other GetCalc() routines do not. Whether or not you want to support (correctly) adjusting the pointer for real and complex number variables, it would be a good idea to standardize the routines.

Builderboy · « **Reply #177 on:** May 17, 2011, 07:30:54 pm »

Quote from: Runer112 on October 20, 2010, 08:56:30 am

Oops, necropost, oh well

I don't know if this approach was purposely left out, as it's 15 bytes larger than the current routine and sometimes slower. I'm referring to the square root routine. Whereas the current routine (14 bytes) takes 37n+38 T-states (linear time), where n is the result+1 (1-256), the following routine (29 bytes) takes 5n+800 T-states (near constant time), where n is the number of set bits in the result (0-8). The existing routine is faster for values that would yield results of 0-19, but this routine would be faster for values that would yield results of 20-255, which is a much broader range of the 8-bit spectrum. Also, it would be much more reliable to run at a near constant speed in programs which rely on that to run smoothly themselves. The existing routine would take only a few hundred T-states for low inputs, but would take up to OVER NINE THOUSAND T-states to calculate the square roots for the highest inputs. So it's up to you if this is something you want to use.

Code: [Select]
p_Sqrt:
	.db __SqrtEnd-1-$
	ld	a,l
	ld	l,h
	ld	de,$0040
	ld	h,d
	ld	b,8
	or	a
__SqrtLoop:
	sbc	hl,de
	jr	nc,__SqrtSkip
	add	hl,de
__SqrtSkip:
	ccf
	rl	d
	rla
	adc	hl,hl
	rla
	adc	hl,hl
	djnz	__SqrtLoop
	ld	h,0
	ld	l,d
	ret
__SqrtEnd:
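
For readers who want the idea without the register juggling, here is a high-level sketch of a digit-by-digit (restoring) square root, which produces one result bit per pass over eight passes. It is an assumption that this mirrors the routine above in spirit; it is not a line-by-line transcription, and the function name is made up.

```python
def isqrt16(value):
    """Floor square root of a 16-bit value, one result bit per iteration."""
    root = 0
    remainder = 0
    for _ in range(8):
        # Bring down the next two input bits, most significant pair first.
        remainder = (remainder << 2) | (value >> 14)
        value = (value << 2) & 0xFFFF
        # Appending a 1 bit to root raises root squared by 4*root + 1.
        trial = (root << 2) | 1
        root <<= 1
        if remainder >= trial:
            remainder -= trial
            root |= 1
    return root

print(isqrt16(16), isqrt16(255), isqrt16(65535))
```

The loop always runs eight times regardless of the input, which is where the near constant running time comes from.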

Methinks this really should be added. It's *much* faster most of the time, and it runs in nearly constant time, which is great for any program that needs a routine with reliable speed.
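
The timing figures quoted above can be turned into a quick comparison. This sketch simply evaluates the two T-state formulas from the quoted post (37n+38 with n = result+1, and 5n+800 with n = number of set bits in the result); the function names are made up, and the formulas are taken at face value rather than measured.

```python
def linear_cost(result):
    """T-states for the existing routine: 37n + 38 with n = result + 1."""
    return 37 * (result + 1) + 38

def bitwise_cost(result):
    """T-states for the proposed routine: 5n + 800 with n = set bits in result."""
    return 5 * bin(result).count("1") + 800

# First result value for which the proposed routine wins.
crossover = next(r for r in range(256) if bitwise_cost(r) < linear_cost(r))
print(crossover, linear_cost(255), bitwise_cost(255))
```

Under these formulas the proposed routine wins from a result of 20 upward, and the worst cases are 9510 T-states for the existing routine against 840 for the new one.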

calc84maniac · « **Reply #178 on:** May 17, 2011, 09:11:15 pm »

And the size difference shouldn't be a big deal, because anyone who wants to use square roots in a big project would probably want a fast routine anyway.

Quigibo · « **Reply #179 on:** May 18, 2011, 03:03:13 am »

Yeah, I guess I'll add it then. Found an optimization for it too; b is zero at the end of the djnz so it can be used to zero the h register which saves a byte and 3 cycles.
