Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.

Messages - Runer112


1246

The Axe Parser Project / Re: Assembly Programmers - Help Axe Optimize!

« on: May 24, 2011, 10:36:16 pm »

Hmm, I see what you mean. Perhaps the mask rotation or logic in my routine is wrong? I wanted to send the program you posted to Wabbitemu so I could debug the mask and logic computations step by step, but Wabbitemu refuses to accept your program. And I don't see anything obviously wrong with my mask or logic.

1247

The Axe Parser Project / Re: Assembly Programmers - Help Axe Optimize!

« on: May 23, 2011, 11:22:29 am »

Is 71 cycles between outputs still too fast for your calculator perhaps? I don't notice any strange black and white lines on my calculator, which has a good LCD driver. The only lines I saw were the diagonal light and dark stripes that are inherent in any unsynced grayscale routine. Can you perhaps elaborate on the problem?

1248

The Axe Parser Project / Re: Axe Parser

« on: May 22, 2011, 11:29:08 pm »

I personally vote for another pass. That would solve the problem, wouldn't it? For me, even massive programs still compile in only a few seconds. I think giving up an additional second or two would be acceptable to allow for more free program structure.

1249

The Axe Parser Project / Re: Assembly Programmers - Help Axe Optimize!

« on: May 22, 2011, 07:13:24 pm »

Row-major 3-level grayscale? I already made one of those; it was just slower and a bit larger than the current routine, so I didn't think you'd want it. It's 4 bytes larger and about 8000 cycles slower than the column-major routine I posted above, but here it is:

Code: (70 bytes, ~66541 cycles with 3-cycle LCD port delay, excluding p_Safety)

p_DispGS:
	.db __DispGSEnd-1-$
	ld	hl,plotSScreen
	ld	de,appBackUpScreen
	call	$0000
	push	af
	ld	a,$07
	out	($10),a			;many cc into
	ld	a,(flags+asm_Flag2)
	rra
	sbc	a,a
	xor	%01010101
	ld	(flags+asm_Flag2),a
	ld	c,a
	ld	a,$80
__DispGSNext:
	push	af
	out	($10),a			;74cc into, 71cc loop
	ex	(sp),hl			;waste
	ex	(sp),hl			;waste
	rrc	c
	ld	b,12
	ld	a,$20
	out	($10),a			;71cc into
	push	af			;waste
	pop	af			;waste
__DispGSLoop:
	inc	bc			;waste
	dec	c			;waste
	ld	a,(de)
	and	c
	or	(hl)
	inc	de
	inc	hl
	out	($11),a			;72cc into, 71cc loop
	ld	a,(hl)			;waste
	djnz	__DispGSLoop
	pop	af
	inc	a
	bit	6,a
	jr	z,__DispGSNext
__DispGSDone:
	pop	af
	out	($20),a
	ld	a,$05
	out	($10),a			;83cc into
	ret	c
	ei
	ret
__DispGSEnd:
	.db rp_Ans,__DispGSEnd-p_DispGS-8
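To help check the mask rotation and logic, here is a rough Python model of what this row-major 3-level routine should send to the LCD. It is only a sketch: `front`, `back` and `flag` are hypothetical stand-ins for plotSScreen, appBackUpScreen and the byte at flags+asm_Flag2, and timing is ignored entirely.

```python
def rrc(c):
    """Z80 RRC: rotate an 8-bit value right; bit 0 wraps into bit 7."""
    return ((c >> 1) | (c << 7)) & 0xFF


def next_phase(flag):
    """Mirror of: ld a,(flag) / rra / sbc a,a / xor %01010101.

    rra moves bit 0 of the flag into carry, sbc a,a turns that carry into
    $00 or $FF, and the xor produces the new mask, which is stored back.
    """
    a = 0xFF if flag & 1 else 0x00
    return a ^ 0x55


def dispgs_frame(front, back, flag):
    """Return (bytes sent to port $11, new flag) for one frame.

    front and back are 768-byte lists in row-major order (12 bytes per row).
    """
    c = next_phase(flag)
    new_flag = c
    sent = []
    for row in range(64):
        c = rrc(c)  # rrc c: one mask rotation per row
        for col in range(12):
            i = row * 12 + col
            sent.append((back[i] & c) | front[i])  # ld a,(de) / and c / or (hl)
    return sent, new_flag
```

In this model, pixels set only in the back buffer are lit in every other frame, in a checkerboard that flips on each row and each frame, while pixels set in the front buffer are always lit.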

1250

General Calculator Help / Re: Which color palette is better?

« on: May 22, 2011, 07:03:56 pm »

No, it was not. I eventually plan to pick it up again, but not right now.

1251

Official Contest / Re: [poll] Who do you think has the best chance of winning the 2011 Axe contest?

« on: May 22, 2011, 02:19:53 pm »

I don't think this topic is a good idea. People will see the results, and if their project was not voted for much, they will be discouraged and possibly give up.

1252

The Axe Parser Project / Re: Assembly Programmers - Help Axe Optimize!

« on: May 21, 2011, 07:15:18 pm »

I'm back, and this time with screen update routine optimizations! I've used 71 cycles as the target minimum delay between port outputs, because that's the number that you said worked for your calculator with a bad LCD driver. If you want these routines to target 72 or 73 cycles between port outputs instead, that's an easy modification for the first two routines. The grayscale routines could be harder.

EDIT: If you're going to use any of these, make sure to actually test them first.

EDIT 2: I previously didn't have an optimization for p_DispGS, but after more closely inspecting the routine, now I do!

p_FastCopy: 1 byte and 1548 cycles saved.

Code: (Original code: 46 bytes, ~59389 cycles with 3-cycle LCD port delay, excluding p_Safety)

p_FastCopy:
	.db __FastCopyEnd-1-$
FastCopy:
	ld	hl,plotSScreen
	ld	a,$80
	out	($10),a
	ld	c,-$0C
	call	$0000			;Safety
	push	af
__FastCopyAgain:
	ld	b,64			;7
	ld	a,c			;4
	add	a,$2C			;7
	out	($10),a			;11
	ld	a,(hl)			;7 (waste)
	inc	de			;6 (waste)
__FastCopyLoop:
	push	af			;11 (waste)
	pop	af			;10 (waste)
	ld	de,12			;10
	ld	a,(hl)			;7
	add	hl,de			;11
	out	($11),a			;11
	djnz	__FastCopyLoop		;13/8
	ld	de,1-(12*64)		;10
	add	hl,de			;11
	inc	c			;4
	jr	nz,__FastCopyAgain	;12
__FastCopyRestore:
	pop	af
	out	($20),a
	ret	c
	ei
	ret
__FastCopyEnd:
	.db rp_Ans,__FastCopyEnd-__FastCopyAgain+3

Code: (Optimized code: 45 bytes, ~57841 cycles with 3-cycle LCD port delay, excluding p_Safety)

p_FastCopy:
	.db __FastCopyEnd-1-$
	ld	hl,plotSScreen
	ld	c,-$0C
	ld	a,$80
	out	($10),a			;??cc into
	call	$0000
	push	af
__FastCopyAgain:
	push	hl
	ld	a,c
	add	a,$2C
	out	($10),a			;many cc into, 73cc loop
	inc	de			;waste
	ld	b,64
__FastCopyLoop:
	ld	a,(hl)			;waste
	inc	de			;waste
	dec	de			;waste
	ld	de,12
	ld	a,(hl)
	add	hl,de
	out	($11),a			;71cc into, 71cc loop
	djnz	__FastCopyLoop
	pop	hl
	inc	hl
	inc	c
	jr	nz,__FastCopyAgain
__FastCopyRestore:
	pop	af
	out	($20),a
	ret	c
	ei
	ret
__FastCopyEnd:
	.db rp_Ans,__FastCopyEnd-p_FastCopy+11
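As a sanity check on the 1548-cycle figure, the per-frame saving can be reproduced by adding up standard Z80 T-states for the inner loop (run 64 × 12 times) and the per-column outer loop (run 12 times). This is a bookkeeping sketch, not a timing-accurate emulator: it ignores the 3-cycle LCD port delay and the setup code, which are the same in both versions. Under those assumptions, the 2-cycle inner-loop saving plus a 1-cycle outer-loop saving adds up to the quoted number.

```python
# Standard Z80 T-state counts. djnz and jr nz are listed with their
# "taken" timings; the final, not-taken iteration is identical in both
# versions, so it cancels out of the difference.
T_STATES = {
    "push af": 11, "pop af": 10, "push hl": 11, "pop hl": 10,
    "ld a,(hl)": 7, "ld a,c": 4, "ld b,n": 7, "ld de,nn": 10,
    "add a,n": 7, "add hl,de": 11, "inc de": 6, "dec de": 6,
    "inc hl": 6, "inc c": 4, "out (n),a": 11,
    "djnz (taken)": 13, "jr nz (taken)": 12,
}

ORIGINAL_INNER = ["push af", "pop af", "ld de,nn", "ld a,(hl)",
                  "add hl,de", "out (n),a", "djnz (taken)"]
OPTIMIZED_INNER = ["ld a,(hl)", "inc de", "dec de", "ld de,nn",
                   "ld a,(hl)", "add hl,de", "out (n),a", "djnz (taken)"]
ORIGINAL_OUTER = ["ld b,n", "ld a,c", "add a,n", "out (n),a", "ld a,(hl)",
                  "inc de", "ld de,nn", "add hl,de", "inc c", "jr nz (taken)"]
OPTIMIZED_OUTER = ["push hl", "ld a,c", "add a,n", "out (n),a", "inc de",
                   "ld b,n", "pop hl", "inc hl", "inc c", "jr nz (taken)"]

ROWS, COLUMNS = 64, 12


def cycles(instructions):
    """Sum the T-states of a straight-line instruction sequence."""
    return sum(T_STATES[name] for name in instructions)


def frame_saving():
    """Cycles saved per full-screen update by the optimized p_FastCopy."""
    inner = (cycles(ORIGINAL_INNER) - cycles(OPTIMIZED_INNER)) * ROWS * COLUMNS
    outer = (cycles(ORIGINAL_OUTER) - cycles(OPTIMIZED_OUTER)) * COLUMNS
    return inner + outer
```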

p_DrawAndClr: 2 bytes and 1548 cycles saved. Pretty much the same optimization as above.

Code: (Original code: 47 bytes, ~59389 cycles with 3-cycle LCD port delay, excluding p_Safety)

p_DrawAndClr:
	.db __DrawAndClrEnd-1-$
	ld	hl,plotSScreen
	ld	a,$80
	out	($10),a
	ld	c,-$0C
	call	$0000			;Safety
	push	af
__DrawAndClrAgain:
	ld	b,64			;7
	ld	a,c			;4
	add	a,$2C			;7
	out	($10),a			;11
	ld	a,(hl)			;7 (waste)
	inc	de			;6 (waste)
__DrawAndClrLoop:
	ld	de,12			;10
	ld	a,(hl)			;7
	ld	(hl),d			;7
	ld	(hl),d			;7 (waste)
	ld	(hl),d			;7 (waste)
	add	hl,de			;11
	out	($11),a			;11
	djnz	__DrawAndClrLoop	;13/8
	ld	de,1-(12*64)		;10
	add	hl,de			;11
	inc	c			;4
	jr	nz,__DrawAndClrAgain	;12
__DrawAndClrRestore:
	pop	af
	out	($20),a
	ret	c
	ei
	ret
__DrawAndClrEnd:
	.db rp_Ans,__DrawAndClrEnd-__DrawAndClrAgain+3

Code: (Optimized code: 45 bytes, ~57841 cycles with 3-cycle LCD port delay, excluding p_Safety)

p_DrawAndClr:
	.db	__DrawAndClrEnd-1-$
	ld	hl,plotSScreen
	ld	c,-$0C
	ld	a,$80
	out	($10),a			;??cc into
	call	$0000
	push	af
__DrawAndClrAgain:
	push	hl
	ld	a,c
	add	a,$2C
	out	($10),a			;many cc into, 73cc loop
	inc	de			;waste
	ld	b,64
__DrawAndClrLoop:
	inc	de			;waste
	dec	de			;waste
	ld	de,12
	ld	a,(hl)
	ld	(hl),d
	add	hl,de
	out	($11),a			;71cc into, 71cc loop
	djnz	__DrawAndClrLoop
	pop	hl
	inc	hl
	inc	c
	jr	nz,__DrawAndClrAgain
__DrawAndClrRestore:
	pop	af
	out	($20),a
	ret	c
	ei
	ret
__DrawAndClrEnd:
	.db rp_Ans,__DrawAndClrEnd-__DrawAndClrAgain+11
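The optimized p_DrawAndClr clears the buffer with ld (hl),d, which works because ld de,12 leaves D = 0. A small Python model of that data flow, assuming `buf` is a hypothetical 768-byte list standing in for plotSScreen and using offsets from plotSScreen instead of real addresses, shows both the column-major send order and the cleared buffer:

```python
def draw_and_clear(buf):
    """Model of p_DrawAndClr's data flow (timing and port setup ignored).

    Returns the bytes in the order they are written to port $11 and clears
    buf in place, as the Z80 routine does.
    """
    sent = []
    for col in range(12):              # c runs from -$0C up to -1
        hl = col                       # push hl / pop hl / inc hl: next column start
        for _ in range(64):
            de = 12                    # ld de,12, so D = de >> 8 = 0
            a = buf[hl]                # ld a,(hl)
            buf[hl] = de >> 8          # ld (hl),d stores 0
            hl += de                   # add hl,de: next row, same column
            sent.append(a)             # out ($11),a
    return sent
```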

p_DispGS: ~4847 cycles faster! This is more of a bug fix than an optimization; the old routine copied 13 columns instead of 12!

Code: (Original code: 66 bytes, ~63507 cycles with 3-cycle LCD port delay, excluding p_Safety)

p_DispGS:
	.db __DispGSEnd-1-$
	call	$0000
	push	af
	ld	a,$80
	out	($10),a
	ld	(OP2),sp
	ld	hl,flags+asm_Flag2
	rr	(hl)
	sbc	a,a
	xor	%01010101
	ld	(hl),a
	ld	c,a
	ld	l,appbackupscreen&$ff-1
	ld	sp,plotSScreen-appbackupscreen
__DispGSNext:
	ld	a,l			;4
	ld	b,64			;7
	add	a,$21-(appbackupscreen&$ff);7
	out	($10),a			;11		Into loop: 59 T-states
	inc	l			;4
	ld	h,appbackupscreen>>8	;7
	ld	de,appbackupscreen-plotSScreen+12;11
__DispGSLoop:
	ld	a,(hl)			;7		Loop: 61 T-states
	rrc	c			;8
	and	c			;4
	add	hl,sp			;11
	or	(hl)			;7
	out	($11),a			;11
	add	hl,de			;11
	djnz	__DispGSLoop		;13/8		Next Loop: 60 T-states
	ld	a,l			;4
	cp	12+(appbackupscreen&$ff);7
	jr	nz,__DispGSNext		;12
__DispGSDone:
	ld	sp,(OP2)
__DispGSRestore:
	pop	af
	out	($20),a
	ret	c
	ei
	ret
__DispGSEnd:
	.db rp_Ans,__DispGSEnd-p_DispGS-2

Code: (Optimized code: 66 bytes, ~58660 cycles with 3-cycle LCD port delay, excluding p_Safety)

p_DispGS:
	.db __DispGSEnd-1-$
	call	$0000
	push	af
	ld	a,$80
	out	($10),a			;many cc into
	ld	(OP2),sp
	ld	hl,flags+asm_Flag2
	rr	(hl)
	sbc	a,a
	xor	%01010101
	ld	(hl),a
	ld	c,a
	ld	l,appbackupscreen&$ff-1
	ld	sp,plotSScreen-appbackupscreen
__DispGSNext:
	ld	a,l
	ld	b,64
	add	a,$20-(appbackupscreen&$ff-1)
	out	($10),a			;113cc into, 71cc loop
	inc	hl
	ld	h,appbackupscreen>>8
	ld	de,appbackupscreen-plotSScreen+12
__DispGSLoop:
	ld	a,(hl)
	rrc	c
	and	c
	add	hl,sp
	or	(hl)
	out	($11),a			;71cc into, 72cc loop
	add	hl,de
	djnz	__DispGSLoop
	ld	a,l
	cp	12+(appbackupscreen&$ff-1)
	jr	nz,__DispGSNext
__DispGSDone:
	ld	sp,(OP2)
__DispGSRestore:
	pop	af
	out	($20),a
	ret	c
	ei
	ret
__DispGSEnd:
	.db rp_Ans,__DispGSEnd-p_DispGS-2
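The 13-column bug comes down to the loop's exit test. A Python model of just the L register's bookkeeping shows the original `cp` value letting one extra column through. It assumes appBackUpScreen's low byte is $72 (the usual TI-83+ address $9872); the column count does not depend on that value.

```python
APP_BACKUP_LOW = 0x72  # assumed low byte of appBackUpScreen ($9872)


def column_commands(stop, low=APP_BACKUP_LOW):
    """Column-set commands sent to port $10 by the __DispGSNext loop.

    L starts at low-1. Each column does inc l, then the 64-row loop adds
    a net 12 bytes per row (add hl,sp then add hl,de). 64 * 12 = 768 is a
    multiple of 256, so L ends the column 1 higher than it started.
    The loop stops once L equals stop. Both versions compute the same
    column value, l + $21 - low.
    """
    l = (low - 1) & 0xFF
    commands = []
    while True:
        commands.append((l + 0x21 - low) & 0xFF)  # ld a,l / add a,... / out ($10),a
        l = (l + 1) & 0xFF                         # inc l
        l = (l + 64 * 12) & 0xFF                   # net effect of the row loop on L
        if l == stop & 0xFF:                       # cp ... / jr nz,__DispGSNext
            return commands


ORIGINAL_STOP = 12 + APP_BACKUP_LOW        # cp 12+(appbackupscreen&$ff)
FIXED_STOP = 12 + (APP_BACKUP_LOW - 1)     # cp 12+(appbackupscreen&$ff-1)
```

Each extra column costs a full pass of 64 LCD writes, which is why the fix also shows up as a large speed gain.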

p_Disp4Lvl: 3 bytes larger, but ~7693 cycles faster! Extra bonuses: updates in row-major order for cleaner grayscale AND works with any pair of buffers!

Code: (Original code: 79 bytes, ~78433 cycles with 3-cycle LCD port delay, excluding p_Safety)

p_Disp4Lvl:
	.db __Disp4LvlEnd-1-$
	call	$0000
	push	af
	ld	(OP2+2),sp
	ld	a,$80
	out	($10),a
	ld	sp,appbackupscreen - plotSScreen
	ld	e,(plotSScreen-appbackupscreen+12)&$ff
	ld	c,-$0C
	ex	af,af'
	ld	a,%11011011
	ld	hl,flags+asm_flag2
	inc	(hl)
	jr	z,__Disp4Lvlskip
	add	a,a
	ld	b,(hl)
	inc	b
	jr	z,__Disp4Lvlskip
	rlca
	ld	(hl),-2
__Disp4Lvlskip:
	ld	l,plotSScreen&$ff-1
	ex	af,af'
__Disp4Lvlentry:
	ld	a,c
	add	a,$2C
	ld	h,plotSScreen>>8
	inc	l
	ld	b,64
	out	($10),a
__Disp4Lvlloop:
	ld	a,(hl)
	add	hl,sp
	xor	(hl)
	ex	af,af'
	cp	e
	rra
	ld	d,a
	ex	af,af'
	and	d
	xor	(hl)
	out	($11),a
	ld	d,(plotSScreen-appbackupscreen+12)>>8
	add	hl,de
	djnz	__Disp4Lvlloop
	inc	c
	jr	nz,__Disp4Lvlentry
__Disp4LvlDone:
	ld	sp,(OP2+2)
__Disp4LvlRestore:
	pop	af
	out	($20),a
	ret	c
	ei
	ret
__Disp4LvlEnd:
	.db rp_Ans,__Disp4LvlEnd-p_Disp4Lvl-2

Code: (Optimized code: 82 bytes, ~70740 cycles with 3-cycle LCD port delay, excluding p_Safety)

p_Disp4Lvl:
	.db __Disp4LvlEnd-1-$
	ld	hl,appBackUpScreen
	ld	de,plotSScreen
	call	$0000
	push	af
	push	hl
	ld	a,$07
	out	($10),a			;many cc into
	ld	a,%11011011
	or	a
	ld	hl,flags+asm_flag2
	inc	(hl)
	jr	z,__Disp4LvlSkip
	rra
	ld	b,(hl)
	inc	b
	jr	z,__Disp4LvlSkip
	rra
	ld	(hl),-2
__Disp4LvlSkip:
	ex	af,af'
	pop	hl
	ld	a,$80
__Disp4LvlEntry:
	out	($10),a			;76+cc into, 71cc loop
	push	af
	ex	(sp),hl			;waste
	ex	(sp),hl			;waste
	nop				;waste
	ld	a,$20
	out	($10),a			;71cc into
	ld	b,12
__Disp4LvlLoop:
	ex	af,af'
	rra
	ld	c,a
	ex	af,af'
	ld	a,(de)
	xor	(hl)
	and	c
	xor	(hl)
	inc	de
	inc	hl
	out	($11),a			;71cc into, 77cc loop
	djnz	__Disp4LvlLoop
	inc	bc			;waste
	ex	af,af'
	rra
	ex	af,af'
	pop	af
	inc	a
	bit	6,a
	jr	z,__Disp4LvlEntry
__Disp4LvlDone:
	ld	a,$05
	out	($10),a			;73cc into
	pop	af
	out	($20),a
	ret	c
	ei
	ret
__Disp4LvlEnd:
	.db rp_Ans,__Disp4LvlEnd-p_Disp4Lvl-8
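The optimized p_Disp4Lvl keeps its mask in A' and rotates it with rra, which rotates through carry and so acts as a 9-bit rotate. With carry clear, %11011011 forms a 9-bit ring with period 3, so every rotation stays in phase, and the per-frame pre-rotations of 1, 0 and 2 steps cycle each mask bit through all three phases. A rough Python model (hypothetical names, no timing) can check that every pixel of the front buffer (plotSScreen) is shown in exactly two of every three frames:

```python
def rra(a, carry):
    """Z80 RRA: rotate A right through carry (a 9-bit rotation).

    Returns (new A, new carry)."""
    return (a >> 1) | (carry << 7), a & 1


def pre_rotations(counter):
    """Mirror of the inc (hl) / inc b checks on the frame counter.

    Returns (number of rra steps before drawing, new counter value)."""
    counter = (counter + 1) & 0xFF        # inc (hl)
    if counter == 0:                      # jr z,__Disp4LvlSkip
        return 0, counter
    if counter == 0xFF:                   # ld b,(hl) / inc b / jr z,...
        return 1, counter
    return 2, 0xFE                        # ld (hl),-2


def frame_masks(counter, rows=64, cols=12):
    """Return (list of 768 per-byte masks, new counter) for one frame."""
    a, carry = 0b11011011, 0              # ld a,%11011011 / or a clears carry
    steps, counter = pre_rotations(counter)
    for _ in range(steps):
        a, carry = rra(a, carry)
    masks = []
    for _ in range(rows):
        for _ in range(cols):
            a, carry = rra(a, carry)      # ex af,af' / rra / ld c,a / ex af,af'
            masks.append(a)
        a, carry = rra(a, carry)          # extra rra after each row
    return masks, counter


def lcd_bit(front_bit, back_bit, mask_bit):
    """((front xor back) and mask) xor back: mask 1 shows front, 0 shows back."""
    return ((front_bit ^ back_bit) & mask_bit) ^ back_bit
```

Since a mask bit of 1 selects the front buffer, a pixel set only in plotSScreen is lit 2/3 of the time and one set only in appBackUpScreen 1/3 of the time, giving the four levels.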

Also, I'm going to bump a few old optimization suggestions. They may have been skipped because Axe couldn't support them at the time, but in case it can now or in the near future, I'll make sure they aren't forgotten. And I'll throw in a new optimization that would also require an upgraded command parser.

Quote from: Runer112 on January 09, 2011, 01:00:51 pm

And as a side note, would it be possible to reformat DS<() so that the variable is reinitialized to its maximum value at the End? That way, 3 bytes could be saved by having both the zero and nonzero paths share the same store instruction. For example:

Code:

	ld	hl,(var)
	dec	hl
	ld	a,h
	or	l
	jp	nz,DS_End
	;Code inside statement goes here
	ld	hl,max
DS_End:
	ld	(var),hl
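Here is a hedged Python model of the proposed DS<() layout, just to show that both paths end at the one shared store. `ds_tick`, `max_value` and `body` are made-up names, and the model assumes the counter is a 16-bit value whose body runs when it decrements to zero, as in the snippet above:

```python
def ds_tick(var, max_value, body):
    """One pass through the proposed DS<(var,max_value) layout.

    ld hl,(var) / dec hl / ld a,h / or l / jp nz,DS_End
    <body> / ld hl,max / DS_End: ld (var),hl
    Returns the value written back by the single shared store.
    """
    hl = (var - 1) & 0xFFFF     # ld hl,(var) / dec hl
    if hl == 0:                 # ld a,h / or l sets Z only when HL is 0
        body()                  # code inside the statement
        hl = max_value          # ld hl,max
    return hl                   # DS_End: ld (var),hl
```

With a maximum of 3, the body runs on every third pass and the variable is back at 3 afterwards.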

Quote from: Runer112 on February 17, 2011, 09:47:36 pm

Now that you have absolute jumps implemented:

Code: (Original code)

p_Exchange:
	.db	13
	pop	de
	ex	(sp),hl
	pop	bc
	ld	a,(de)
	ldi
	dec	hl
	ld	(hl),a
	inc	hl
	ld	a,b
	or	c
	jr	nz,$-8

Code: (Optimized code)

p_Exchange:
	.db	12
	pop	de
	ex	(sp),hl
	pop	bc
__ExchangeLoop:
	ld	a,(de)
	ldi
	dec	hl
	ld	(hl),a
	inc	hl
	jp	pe,__ExchangeLoop	;ldi leaves P/V set while BC is nonzero, so pe is correct
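A Python model of the optimized loop can confirm that it swaps the two buffers byte by byte. It relies on ldi setting P/V whenever BC is still nonzero after the decrement, which is what jp pe tests. `mem`, `de`, `hl` and `bc` are hypothetical stand-ins for memory and the registers popped from the stack:

```python
def exchange(mem, de, hl, bc):
    """Model of the optimized p_Exchange loop on a bytearray.

    Swaps bc bytes between mem[de:] and mem[hl:]. On the Z80, BC = 0 would
    mean 65536 iterations; this model expects 1 <= bc <= len(mem).
    """
    while True:
        a = mem[de]                # ld a,(de)
        mem[de] = mem[hl]          # ldi: (de) <- (hl)
        de += 1                    #      de++
        hl += 1                    #      hl++
        bc = (bc - 1) & 0xFFFF     #      bc--, P/V set if bc != 0
        mem[hl - 1] = a            # dec hl / ld (hl),a / inc hl
        if bc == 0:                # jp pe falls through once P/V is clear
            return mem
```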

Code: (Original code: 27 bytes, ~220 cycles)

p_DKeyVar:
	.db __DKeyVarEnd-1-$
	dec	l
	ld	a,l
	rra
	rra
	rra
	and	%00000111
	inc	a
	ld	b,a
	ld	a,%01111111
	rlca
	djnz	$-1
	ld	h,a
	ld	a,l
	and	%00000111
	inc	a
	ld	b,a
	ld	a,%10000000
	rlca
	djnz	$-1
	ld	l,a
	ret
__DKeyVarEnd:

Code: (Optimized code: 23 bytes, ~259 cycles)

p_DKeyVar:
	.db __DKeyVarEnd-1-$
	ld	c,l
	dec	c
	ld	a,c
	rra
	rra
	rra
	call	__DKeyVarMask
	cpl
	ld	h,a
	ld	a,c
__DKeyVarMask:
	and	%00000111
	inc	a
	ld	b,a
	ld	a,%10000000
	rlca
	djnz	$-1
	ld	l,a
	ret
__DKeyVarEnd:
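The optimized p_DKeyVar reuses the bit-mask loop as a subroutine and gets the group mask via cpl, relying on the fact that complementing a rotated $80 gives the same result as rotating $7F. A brute-force Python comparison over every input byte (function names are hypothetical) is one way to check that the two versions agree:

```python
def rlca(a):
    """Z80 RLCA: rotate an 8-bit value left; bit 7 wraps into bit 0."""
    return ((a << 1) | (a >> 7)) & 0xFF


def rotate_left(a, times):
    for _ in range(times):     # rlca / djnz $-1
        a = rlca(a)
    return a


def dkeyvar_original(l):
    l = (l - 1) & 0xFF                                # dec l
    # rra x3 then and %00000111; bits rotated in from carry are masked off
    h = rotate_left(0b01111111, ((l >> 3) & 7) + 1)
    low = rotate_left(0b10000000, (l & 7) + 1)
    return h, low


def dkeyvar_optimized(l):
    c = (l - 1) & 0xFF                                # ld c,l / dec c

    def key_var_mask(a):                              # __DKeyVarMask
        return rotate_left(0b10000000, (a & 7) + 1)

    h = key_var_mask(c >> 3) ^ 0xFF                   # call __DKeyVarMask / cpl
    low = key_var_mask(c)                             # ld a,c / fall through
    return h, low
```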

1253

Axe / Re: Axe Q&A

« on: May 21, 2011, 01:49:54 pm »

Actually, the slowest Axe command (that won't freeze the calculator altogether) is Pause 0, which would take about 220 million cycles. That's a bit over half a minute at 6MHz.
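The time estimate is just cycles divided by clock rate. A quick Python check, assuming nominal clock rates of 6MHz and, for comparison, 15MHz on calculators that support the faster mode:

```python
PAUSE_0_CYCLES = 220_000_000  # approximate figure quoted above


def seconds(cycles, clock_hz):
    """Wall-clock time for a given number of CPU cycles."""
    return cycles / clock_hz


# Roughly 36.7 s at 6 MHz and 14.7 s at 15 MHz.
```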

1254

The Axe Parser Project / Re: Bug Reports

« on: May 20, 2011, 07:13:26 pm »

To get around this bug in the meantime, just make sure your constant won't trigger any constant auto optimizations. The simplest way to ensure this is to always enclose the constant in parentheses when you use it. This will cause some unnecessary code bloat, but when the next version of Axe is released and hopefully addresses this issue, you can remove the parentheses.

1255

The Axe Parser Project / Re: Bug Reports

« on: May 20, 2011, 07:07:30 pm »

There appears to be a serious flaw with user-defined constants. If a user-defined constant is used as the operand in an operation that wouldn't qualify for any constant auto optimizations, the code is compiled correctly. But if the operation would be turned into a constant auto optimization (e.g. any constant less-than/greater-than comparison), the parser gets confused. It parses the code as the normal, unoptimized command, which would still work if that were the only flaw. However, because of what I can only guess are the remnants of some sort of debugging code, the constant is replaced with 0x9001.

1256

Axe / Re: Axe Q&A

« on: May 20, 2011, 11:55:31 am »

You can divide by 256 about 500,000 times a second, which is one of the reasons why inflating variables by 256 is the preferred method of increasing accuracy. My guess would be that drawing 2 massive rectangles every frame can't help.

EDIT: The second block of code you posted should run at virtually the exact same speed as the original block of code. Are you sure it actually sped up? And if so, did you make any other changes to the code?

EDIT 2: I think the program appearing to run slowly is just an illusion caused by your object moving at slow speeds. It gets about 40 frames per second; the object just stays in the same position for many frames because it is moving slowly.
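To illustrate both points, here is a Python sketch of a position stored "inflated" by 256 (8.8 fixed point). The starting pixel and velocity are made-up values: a velocity of 32 means 1/8 pixel per frame, so at about 40 frames per second the drawn x coordinate only changes every 8 frames (5 pixels per second) even though the program is running at full speed.

```python
SCALE = 256  # positions are stored multiplied by 256


def screen_positions(start_px, velocity, frames):
    """Return the drawn x coordinate for each frame.

    pos is kept inflated by 256; dividing by 256 (on the Z80, just taking
    the high byte) recovers whole pixels.
    """
    pos = start_px * SCALE
    xs = []
    for _ in range(frames):
        xs.append(pos // SCALE)
        pos += velocity
    return xs
```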

1257

The Axe Parser Project / Re: Bug Reports

« on: May 20, 2011, 11:31:46 am »

The flipH() command has a problem. I think when I was optimizing it, I got confused about which register pair held a pointer to what and ended up using an 8-bit increment for the sprite input instead of the sprite output. Please change inc l \ inc de to inc e \ inc hl.
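Why the choice of increment matters: an 8-bit inc only changes the low byte, so a pointer that crosses a 256-byte page boundary wraps back to the start of the page. A user's sprite can sit anywhere in memory, so its pointer needs the 16-bit inc hl; the output buffer presumably never crosses a page, so inc e is enough there. A Python sketch with made-up addresses:

```python
def inc8(ptr):
    """inc l / inc e: only the low byte changes, wrapping within a 256-byte page."""
    return (ptr & 0xFF00) | ((ptr + 1) & 0xFF)


def inc16(ptr):
    """inc hl / inc de: a full 16-bit increment."""
    return (ptr + 1) & 0xFFFF


def reverse_bits(b):
    """Mirror one sprite row: bit 7 swaps with bit 0, bit 6 with bit 1, etc."""
    out = 0
    for _ in range(8):
        out = (out << 1) | (b & 1)
        b >>= 1
    return out


def flip_h(mem, src, dst, rows=8):
    """Flip an 8-row sprite from mem[src] into mem[dst] with the corrected increments."""
    for _ in range(rows):
        mem[dst] = reverse_bits(mem[src])
        src = inc16(src)   # input sprite can sit anywhere: inc hl
        dst = inc8(dst)    # output buffer assumed not to cross a page: inc e
    return mem
```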

1258

TI Z80 / Re: My explosive contest entry

« on: May 20, 2011, 11:27:53 am »

Thanks for mentioning the horizontal flipping issue. I've found the problem and will make a bug report post about it immediately.

1259

ASM / Re: [83+/84+] Minimum safe LCD delay?

« on: May 19, 2011, 03:13:17 am »

That's without ALCDFIX applied, right? Because unless I'm mistaken, the point of ALCDFIX is to add delay to out instructions so that the LCD driver can keep up with a ~66-cycle loop on any calculator.

And are you sure those numbers are right for your calculator? Because when I inspected Axe's grayscale routines, I noticed that the 4-level grayscale routine dips down to 67 cycles between two out instructions at one point (between column setting and the first output to the column). Shouldn't that cause problems then?
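The 67-cycle figure can be checked by summing standard Z80 T-states from just after the column-select out in the original p_Disp4Lvl (under __Disp4Lvlentry) through the first out ($11),a. This sketch counts the second out itself but not the first, which appears to be the convention behind the "71cc into" comments in the routines above, and it ignores any extra LCD port delay:

```python
# (instruction, T-states) from just after "out ($10),a" in __Disp4Lvlentry
# up to and including the first "out ($11),a" in __Disp4Lvlloop.
GAP = [
    ("ld a,(hl)", 7), ("add hl,sp", 11), ("xor (hl)", 7), ("ex af,af'", 4),
    ("cp e", 4), ("rra", 4), ("ld d,a", 4), ("ex af,af'", 4),
    ("and d", 4), ("xor (hl)", 7), ("out ($11),a", 11),
]


def gap_cycles(sequence):
    """Total T-states between the two port writes."""
    return sum(t for _, t in sequence)
```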

1260

ASM / [83+/84+] Minimum safe LCD delay?

« on: May 19, 2011, 02:40:43 am »

What's the minimum number of cycles at 6MHz that should exist between LCD outputs to be safe on just about any calculator? I think I recall someone in IRC suggesting 66, which is what Ion's routine uses, while routines used in Axe are closer to 70. And how exactly might things like the LCD delay port, bad LCD drivers, and ALCDFIX affect the timing that should be used in such a routine?
