Messages - Runer112

1546

The Axe Parser Project / Re: Bug Reports

« on: January 06, 2011, 12:46:52 am »

It looks like any instance of getKey (the B_CALL) has a bit of a problem... the compiler doesn't seem to advance the parsing location upon reaching it, parsing it over and over again until the remainder of the free RAM is filled with copies of the getKey routine and you get a memory error. I would assume this has something to do with the addition of the variable-argument direct port input getKey.

EDIT: Huh, that's odd... I'm getting the same problem with Axe 0.4.6. Surely the problem hasn't existed for this long and gone without noticing?

EDIT 2: I think this problem must be somewhere on my side. But this error happens on both wabbitemu and my real calculator. I'm puzzled...

EDIT 3: Hmm I think I may have found the problem... it looks like the error only crops up when getKey is the very last token in the source. Can anybody else confirm this?

1547

The Axe Parser Project / Re: Assembly Programmers - Help Axe Optimize!

« on: January 05, 2011, 11:11:30 pm »

Quote from: Quigibo on January 05, 2011, 10:59:54 pm

However, these are the concerns I have: First, the sprite rotation commands, why did you move the ret to the middle of the routine? It looks like that's just going to add more cycles since a conditional jr takes the same amount of cycles as a regular jr anyway.

Yeah, I'm not really sure why I did that. Feel free to initialize c to 8 instead and decrease and check c at the end using a conditional jump instead.

Quote from: Quigibo on January 05, 2011, 10:59:54 pm

Next, is it really a safe assumption that all ROM pages are between $7F and $FF for all current models and potentially future models?

$01 through $7F are all ROM pages, and $80-$87 are all RAM pages (at least on the calculators that have all of those RAM pages), so it would make sense that $80 and up is RAM. But feel free to leave this optimization out anyway; it only saves 4 cycles part of the time.

Quote from: Quigibo on January 05, 2011, 10:59:54 pm

And lastly, are you sure trying to modify ROM (unsuccessfully) has no potential side effects on things like flags and registers?

After a quick test, yes, rrd and rld affect the a register correctly even when hl points to a byte in ROM.

EDIT: By the way Quigibo, the reason I was looking at every source routine for Axe is because I'm documenting the size and (at least approximate) speed of every Axe command. If I finish it, would you want to bundle it with future Axe releases? If not I'd probably post it somewhere on the forums anyway, so people could still see it.

1548

The Axe Parser Project / Re: Assembly Programmers - Help Axe Optimize!

« on: January 05, 2011, 07:16:59 pm »

I just sort of blindly copied the $8000-$FFFF routine, not taking into account the fact that $0000-$7FFF was ROM. Good catch, I'll edit my post now. And it turns out that's actually smaller anyway.

1549

The Axe Parser Project / Re: Assembly Programmers - Help Axe Optimize!

« on: January 05, 2011, 06:42:20 pm »

Wow that took a long time. But I hope the results will be worth it.
Quigibo, get out your reading glasses.

(By the way, I haven't tested these myself, but the code looks solid. If you believe that any of these would not work or have any questions, tell me.)

Smaller nibble retrieval routines. 1 byte saved for reading from RAM, 3 bytes saved for reading from ROM.

Thanks to calc84maniac for reminding me that $0000-$7FFF is read-only!

Code: (Original routine: 18 bytes, ~72 cycles) [Select]

p_Nib1:
	.db __Nib1End-$-1
	scf
	rr	h
	rr	l
	ld	a,(hl)
	jr	c,__Nib1Skip
	rrca
	rrca
	rrca
	rrca
__Nib1Skip:
	and	%00001111
	ld	l,a
	ld	h,0
	ret
__Nib1End:

Code: (Optimized routine: 17 bytes, ~105 cycles) [Select]

p_Nib1:
	.db __Nib1End-$-1
	xor	a
	scf
	rr	h
	rr	l
	ld	b,(hl)
__Nib1Loop:
	rrd
	ccf
	jr	c,__Nib1Loop
	ld	(hl),b
	ld	l,a
	ld	h,0
	ret
__Nib1End:

Code: (Original routine: 18 bytes, ~68 cycles) [Select]

p_Nib2:
	.db __Nib2End-$-1
	srl	h
	rr	l
	ld	a,(hl)
	jr	c,__Nib2Skip
	rrca
	rrca
	rrca
	rrca
__Nib2Skip:
	and	%00001111
	ld	l,a
	ld	h,0
	ret
__Nib2End:

Code: (Optimized routine: 15 bytes, ~77 cycles) [Select]

p_Nib2:
	.db __Nib2End-$-1
	xor	a
	srl	h
	rr	l
	rrd
	jr	c,__Nib2Skip
	rld
__Nib2Skip:
	ld	l,a
	ld	h,0
	ret
__Nib2End:
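
For anyone who wants to check the nibble logic off-calculator, here is a rough Python model of rrd and rld and of the two optimized read routines. The function names are mine, not Axe's, and the "ROM" version simply throws away the write-back, which is an assumption about how a write to flash behaves.

```python
def rrd(a, m):
    """Z80 rrd: a.low <- m.low, m.low <- m.high, m.high <- old a.low."""
    return (a & 0xF0) | (m & 0x0F), ((a & 0x0F) << 4) | (m >> 4)

def rld(a, m):
    """Z80 rld: a.low <- m.high, m.high <- m.low, m.low <- old a.low."""
    return (a & 0xF0) | (m >> 4), ((m & 0x0F) << 4) | (a & 0x0F)

def nib_ram(m, carry):
    """Model of the optimized p_Nib1 loop. carry=True selects the low nibble."""
    a, saved = 0, m              # xor a / ld b,(hl)
    while True:
        a, m = rrd(a, m)         # rrd
        carry = not carry        # ccf
        if not carry:            # jr c only loops while carry is set
            break
    return a, saved              # ld (hl),b restores the byte

def nib_rom(m, carry):
    """Model of the optimized p_Nib2. Writes are discarded (assumed ROM)."""
    a, _ = rrd(0, m)             # a = low nibble, memory unchanged
    if not carry:
        a, _ = rld(a, m)         # a = high nibble, memory unchanged
    return a

print(nib_ram(0xA7, True), nib_ram(0xA7, False))
print(nib_rom(0xA7, True), nib_rom(0xA7, False))
```

In this model both routines return the low nibble when the carry is set and the high nibble otherwise, and the RAM version leaves the byte as it found it.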

Smaller and faster nibble storage routine. 1 byte and ~17 cycles saved.

Code: (Original routine: 23 bytes, ~127 cycles) [Select]

p_NibSto:
	.db __NibStoEnd-$-1
	pop	bc
	pop	de
	push	bc
	scf
	rr	h
	rr	l
	ld	b,(hl)
	ex	de,hl	;hl = byte ;de = addr
	ld	a,%11110000
	jr	c,__NibStoSkip
	add	hl,hl
	add	hl,hl
	add	hl,hl
	add	hl,hl
	cpl
__NibStoSkip:
	and	b
	or	l
	ld	(de),a
	ret
__NibStoEnd:

Code: (Optimized routine: 22 bytes, ~110 cycles) [Select]

p_NibSto:			
	.db __NibStoEnd-$-1
	pop	bc
	pop	de
	push	bc
	scf
	rr	h
	rr	l
	jr	nc,__NibStoHigh	;carry clear = high nibble, same as the original routine
	rrd
	ld	a,e
	rld
	ret
__NibStoHigh:
	rld
	ld	a,e
	rrd
	ret
__NibStoEnd:
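
A small Python sketch of the two three-instruction sequences used above, for checking which nibble each one overwrites. The helper names are made up for illustration, and `junk` stands for whatever happened to be in a beforehand.

```python
def rrd(a, m):
    """Z80 rrd: a.low <- m.low, m.low <- m.high, m.high <- old a.low."""
    return (a & 0xF0) | (m & 0x0F), ((a & 0x0F) << 4) | (m >> 4)

def rld(a, m):
    """Z80 rld: a.low <- m.high, m.high <- m.low, m.low <- old a.low."""
    return (a & 0xF0) | (m >> 4), ((m & 0x0F) << 4) | (a & 0x0F)

def rrd_then_rld(m, value, junk=0):
    """rrd / ld a,e / rld"""
    _, m = rrd(junk, m)      # old high nibble parked in the low position
    _, m = rld(value, m)     # old high moves back up, value lands in the low nibble
    return m

def rld_then_rrd(m, value, junk=0):
    """rld / ld a,e / rrd"""
    _, m = rld(junk, m)      # old low nibble parked in the high position
    _, m = rrd(value, m)     # old low moves back down, value lands in the high nibble
    return m

print(hex(rrd_then_rld(0xA7, 0x3)))  # low nibble replaced
print(hex(rld_then_rrd(0xA7, 0x3)))  # high nibble replaced
```

So in this model rrd / ld a,e / rld replaces the low nibble and rld / ld a,e / rrd replaces the high nibble, with the other nibble preserved in both cases.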

Faster buffer inversion routine. 9951 cycles saved.

Code: (Original routine: 16 bytes, 38425 cycles) [Select]

p_InvBuff:
	.db __InvBuffEnd-1-$
	ld	hl,plotSScreen
	ld	bc,768
__InvBuffLoop:
	ld	a,(hl)
	cpl
	ld	(hl),a
	inc	hl
	dec	bc
	ld	a,b
	or	c
	jr	nz,__InvBuffLoop
	ret
__InvBuffEnd:

Code: (Optimized routine: 16 bytes, 28474 cycles) [Select]

p_InvBuff:			
	.db __InvBuffEnd-1-$
	ld	hl,plotSScreen
	ld	bc,3
__InvBuffLoop:
	ld	a,(hl)
	cpl
	ld	(hl),a
	inc	hl
	djnz	__InvBuffLoop
	dec	c
	jr	nz,__InvBuffLoop
	ret
__InvBuffEnd:
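
The two cycle totals can be re-derived with a few lines of Python. This is only a tally using the standard Z80 T-state figures (taken/not-taken costs for jr and djnz); the function names are mine.

```python
JR_TAKEN, JR_NOT_TAKEN = 12, 7
DJNZ_TAKEN, DJNZ_NOT_TAKEN = 13, 8
SETUP = 10 + 10          # ld hl,nn + ld bc,nn
RET = 10

def original_cycles(count=768):
    # ld a,(hl) + cpl + ld (hl),a + inc hl + dec bc + ld a,b + or c
    body = 7 + 4 + 7 + 6 + 6 + 4 + 4
    loop = count * (body + JR_TAKEN) - (JR_TAKEN - JR_NOT_TAKEN)
    return SETUP + loop + RET

def optimized_cycles(outer=3, inner=256):
    # ld a,(hl) + cpl + ld (hl),a + inc hl
    body = 7 + 4 + 7 + 6
    # b starts at 0, so each djnz run makes 256 passes; the last one falls through
    inner_run = inner * (body + DJNZ_TAKEN) - (DJNZ_TAKEN - DJNZ_NOT_TAKEN)
    # dec c + jr nz after every inner run; the final jr falls through
    loop = outer * (inner_run + 4 + JR_TAKEN) - (JR_TAKEN - JR_NOT_TAKEN)
    return SETUP + loop + RET

print(original_cycles(), optimized_cycles(), original_cycles() - optimized_cycles())
```

The tally agrees with the figures in the listing headers, including the 9951-cycle difference.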

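You'll laugh at this... but I managed to save 4 cycles in the unarchive and archive routines. And only on the early-return path, when the targeted variable is already where you asked for it to be (already in RAM for unarchive, already in archive for archive). But hey, why not take all the savings you can get.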

I think this works. It relies on the page number returned in b always being 0 if a RAM page and always being in the range of $01-$7F if a flash page.

Code: (Original routine: 18 bytes, a lot of cycles) [Select]

p_Unarchive:
	.db __UnarchiveEnd-1-$
	MOV9TOOP1()
	B_CALL(_ChkFindSym)
	ld	hl,0
	ret	c
	inc	b
	dec	b
	ret	z
	B_CALL(_Arc_Unarc)
	ld	hl,1
	ret
__UnarchiveEnd:

Code: (Optimized routine: 18 bytes, a lot of-4 cycles) [Select]

p_Unarchive:
	.db __UnarchiveEnd-1-$
	MOV9TOOP1()
	B_CALL(_ChkFindSym)
	ld	hl,0
	ret	c
	dec	b
	ret	m
	inc	b
	B_CALL(_Arc_Unarc)
	ld	hl,1
	ret
__UnarchiveEnd:

Code: (Original routine: 18 bytes, a lot of cycles) [Select]

p_Archive:
	.db __ArchiveEnd-1-$
	MOV9TOOP1()
	B_CALL(_ChkFindSym)
	ld	hl,0
	ret	c
	inc	b
	dec	b
	ret	nz
	B_CALL(_Arc_Unarc)
	ld	hl,1
	ret
__ArchiveEnd:

Code: (Optimized routine: 18 bytes, a lot of-4 cycles) [Select]

p_Archive:
	.db __ArchiveEnd-1-$
	MOV9TOOP1()
	B_CALL(_ChkFindSym)
	ld	hl,0
	ret	c
	dec	b
	ret	p
	inc	b
	B_CALL(_Arc_Unarc)
	ld	hl,1
	ret
__ArchiveEnd:
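
To see how far the page-number assumption stretches, here is a minimal Python check of the early-return tests in the original and optimized routines. It only models the flag logic of inc b / dec b / ret z versus dec b / ret m (and the ret nz / ret p pair); the function names are invented for the sketch.

```python
def dec8(b):
    """8-bit decrement, as dec b does."""
    return (b - 1) & 0xFF

def unarchive_exits_original(b):
    return b == 0                    # inc b / dec b / ret z

def unarchive_exits_optimized(b):
    return bool(dec8(b) & 0x80)      # dec b / ret m (sign flag set)

def archive_exits_original(b):
    return b != 0                    # inc b / dec b / ret nz

def archive_exits_optimized(b):
    return not (dec8(b) & 0x80)      # dec b / ret p (sign flag clear)

def first_mismatch():
    """Smallest page number for which either optimized test disagrees."""
    for b in range(256):
        if unarchive_exits_original(b) != unarchive_exits_optimized(b):
            return b
        if archive_exits_original(b) != archive_exits_optimized(b):
            return b
    return None

# Early-return cost: 4 + 4 + 11 T-states originally, 4 + 11 after the change
print(hex(first_mismatch()), (4 + 4 + 11) - (4 + 11))
```

Under this model the two versions agree for every page from $00 through $80 and first diverge at $81, so the $01-$7F assumption has a little slack.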

Smaller archived variable locating. 4 bytes saved.

Code: (Original routine: 55 bytes, a lot of cycles) [Select]

p_GetArc:
	.db __GetArcEnd-1-$
	push	de
	MOV9TOOP1()
	B_CALL(_ChkFindSym)
	ld	hl,0
	jr	c,__GetArcFail
	ld	a,(OP1)
	cp	ListObj
	jr	z,__GetArcName
	cp	ProgObj
	jr	z,__GetArcName
	cp	AppvarObj
	jr	z,__GetArcName
	cp	GroupObj
	jr	z,__GetArcName
__GetArcStatic:
	ld	hl,14
	jr	__GetArcDone
__GetArcName:
	ld	hl,9
	add	hl,de
	B_CALL(_LoadDEIndPaged)
	ld	d,0
	inc	hl
	inc	hl
__GetArcDone:
	add	hl,de
__GetArcFail:
	ex	de,hl
	pop	hl
	ld	(hl),e
	inc	hl
	ld	(hl),d
	inc	hl
	ld	(hl),b
	ex	de,hl
	ret
__GetArcEnd:

Code: (Optimized routine: 51 bytes, a lot of cycles) [Select]

p_GetArc:
	.db __GetArcEnd-1-$
	push	de
	MOV9TOOP1()
	B_CALL(_ChkFindSym)
	ld	hl,0
	jr	c,__GetArcFail
	and	%00011111
	ld	d,b
	ld	hl,__GetArcVarTypes
	ld	bc,__GetArcEnd-__GetArcVarTypes
	cpir
	ld	b,d
	ld	hl,14
	jr	nz,__GetArcDone
	ld	l,9
	add	hl,de
	B_CALL(_LoadDEIndPaged)
	ld	d,0
	inc	e
	inc	e
__GetArcDone:
	add	hl,de
__GetArcFail:
	ex	de,hl
	pop	hl
	ld	(hl),e
	inc	hl
	ld	(hl),d
	inc	hl
	ld	(hl),b
	ex	de,hl
	ret
__GetArcVarTypes:
	.db	ListObj,ProgObj,AppvarObj,GroupObj
__GetArcEnd:

Smaller 8-bit get bit routine. 1 byte saved.

Code: (Original routine: 13 bytes, ~110 cycles) [Select]

p_GetBit:
	.db 13
	ld	a,e
	and	%00000111
	inc	a
	ld	b,a
	ld	a,l
__GetBitLoop:
	add	a,a
	djnz	__GetBitLoop
	ld	h,b
	ld	l,b
	rl	l

Code: (Optimized routine: 12 bytes, ~152 cycles) [Select]

p_GetBit:
	.db 12
	ld	a,e
	and	%00000111
	inc	a
	ld	b,a
	xor	a
__GetBitLoop:
	ld	h,a
	add	hl,hl
	djnz	__GetBitLoop
	ld	l,h
	ld	h,a
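
Here is a quick Python model comparing the two get-bit routines, in case anyone wants to convince themselves they return the same thing. It assumes the value is in l and the bit index in e, as in the listings, with index 0 meaning the most significant bit; the function names are mine.

```python
def get_bit_original(l, e):
    b = (e & 0b111) + 1          # ld a,e / and %00000111 / inc a / ld b,a
    a, carry = l, 0              # ld a,l
    for _ in range(b):           # add a,a / djnz
        carry = (a >> 7) & 1
        a = (a << 1) & 0xFF
    return carry                 # ld h,b / ld l,b / rl l  ->  hl = carry

def get_bit_optimized(l, e):
    b = (e & 0b111) + 1
    h = 0                        # xor a
    for _ in range(b):
        h = 0                    # ld h,a (a is still 0)
        hl = (((h << 8) | l) << 1) & 0xFFFF   # add hl,hl
        h, l = hl >> 8, hl & 0xFF
    return h                     # ld l,h / ld h,a  ->  hl = last bit shifted out

print([get_bit_optimized(0b10100000, i) for i in range(8)])
```

The optimized loop clears h on every pass, so after the last add hl,hl it holds only the one bit that was just shifted out of l.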

As long as the low byte of vx_SptBuff is at most $F8 (at most $F7 for p_FlipV, since it starts from vx_SptBuff+8 and counts down with dec l): faster sprite flipping routines. 16 cycles saved each.

Code: (Original routine: 13 bytes, 338 cycles) [Select]

p_FlipV:
	.db __FlipVEnd-1-$
	ex	de,hl
	ld	hl,vx_SptBuff+8
	ld	b,8
__FlipVLoop:
	dec	hl
	ld	a,(de)
	ld	(hl),a
	inc	de
	djnz	__FlipVLoop
	ret
__FlipVEnd:

Code: (Optimized routine: 13 bytes, 322 cycles) [Select]

p_FlipV:
	.db __FlipVEnd-1-$
	ex	de,hl
	ld	hl,vx_SptBuff+8
	ld	b,8
__FlipVLoop:
	dec	l
	ld	a,(de)
	ld	(hl),a
	inc	de
	djnz	__FlipVLoop
	ret
__FlipVEnd:

Code: (Original routine: 21 bytes, 1907 cycles) [Select]

p_FlipH:
	.db __FlipHEnd-1-$
	ld	de,vx_SptBuff
	push	de
	ld	b,8
__FlipHLoop1:
	ld	c,(hl)
	ld	a,1
__FlipHLoop2:
	rr	c
	rla
	jr	nc,__FlipHLoop2
	ld	(de),a
	inc	hl
	inc	de
	djnz	__FlipHLoop1
	pop	hl
	ret
__FlipHEnd:

Code: (Optimized routine: 21 bytes, 1891 cycles) [Select]

p_FlipH:
	.db __FlipHEnd-1-$
	ld	de,vx_SptBuff
	push	de
	ld	b,8
__FlipHLoop1:
	ld	c,(hl)
	ld	a,1
__FlipHLoop2:
	rr	c
	rla
	jr	nc,__FlipHLoop2
	ld	(de),a
	inc	hl
	inc	e
	djnz	__FlipHLoop1
	pop	hl
	ret
__FlipHEnd:
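
The alignment requirement can be poked at with a short Python sketch that walks the same addresses with 8-bit and 16-bit pointer updates. The base addresses below are arbitrary examples, not the real value of vx_SptBuff, and the function names are mine.

```python
def inc_l(hl):
    """inc l / inc e: only the low byte changes."""
    return (hl & 0xFF00) | ((hl + 1) & 0xFF)

def dec_l(hl):
    """dec l: only the low byte changes."""
    return (hl & 0xFF00) | ((hl - 1) & 0xFF)

def flipv_ok(base):
    """p_FlipV: hl starts at base+8 and is decremented before each write."""
    wide = narrow = (base + 8) & 0xFFFF
    for _ in range(8):
        wide = (wide - 1) & 0xFFFF
        narrow = dec_l(narrow)
        if wide != narrow:
            return False
    return True

def fliph_ok(base):
    """p_FlipH: de starts at base and is incremented after each write."""
    wide = narrow = base
    for _ in range(8):
        if wide != narrow:
            return False
        wide = (wide + 1) & 0xFFFF
        narrow = inc_l(narrow)
    return True

for low in (0xF7, 0xF8, 0xF9):
    print(hex(low), flipv_ok(0x8000 + low), fliph_ok(0x8000 + low))
```

In this model p_FlipH is fine up to a low byte of $F8, while p_FlipV already goes wrong at $F8 because vx_SptBuff+8 lands on the next 256-byte page before the first dec l.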

Smaller and faster sprite rotating routines. 2 bytes smaller and 166 cycles faster. 128 of those cycles come from relying on the low byte of vx_SptBuff being at most $F8 (inc l instead of inc hl, 2 cycles saved on each of the 64 inner-loop passes).

Code: (Original routine: 22 bytes, 2874 cycles) [Select]

p_RotC:
	.db __RotCEnd-1-$
	ex	de,hl
	ld	hl,vx_SptBuff
	ld	c,8
__RotCLoop1:
	push	hl
	ld	b,8
	ld	a,(de)
__RotCLoop2:
	rla
	rr	(hl)
	inc	hl
	djnz	__RotCLoop2
	pop	hl
	inc	de
	dec	c
	jr	nz,__RotCLoop1
	ret
__RotCEnd:

Code: (Optimized routine: 20 bytes, 2708 cycles) [Select]

p_RotC:
	.db __RotCEnd-1-$
	ex	de,hl
	ld	c,8+1
__RotCLoop1:
	ld	hl,vx_SptBuff
	dec	c
	ret	z
	ld	b,8
	ld	a,(de)
__RotCLoop2:
	rla
	rr	(hl)
	inc	l
	djnz	__RotCLoop2
	inc	de
	jr	__RotCLoop1
__RotCEnd:

Code: (Original routine: 22 bytes, 2874 cycles) [Select]

p_RotCC:
	.db __RotCCEnd-1-$
	ex	de,hl
	ld	hl,vx_SptBuff
	ld	c,8
__RotCCLoop1:
	push	hl
	ld	b,8
	ld	a,(de)
__RotCCLoop2:
	rra
	rl	(hl)
	inc	hl
	djnz	__RotCCLoop2
	pop	hl
	inc	de
	dec	c
	jr	nz,__RotCCLoop1
	ret
__RotCCEnd:

Code: (Optimized routine: 20 bytes, 2708 cycles) [Select]

p_RotCC:
	.db __RotCCEnd-1-$
	ex	de,hl
	ld	c,8+1
__RotCCLoop1:
	ld	hl,vx_SptBuff
	dec	c
	ret	z
	ld	b,8
	ld	a,(de)
__RotCCLoop2:
	rra
	rl	(hl)
	inc	l
	djnz	__RotCCLoop2
	inc	de
	jr	__RotCCLoop1
__RotCCEnd:
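
For the curious, this is a rough Python model of the rla / rr (hl) inner loop from p_RotC, checked against a plain pixel-by-pixel clockwise rotation. The list `buf` stands in for vx_SptBuff and the function names are mine; the sprite is the smiley from the Moving a Sprite thread.

```python
def rot_c(src):
    """Model of p_RotC: each source row is fed bit by bit into the 8 buffer bytes."""
    buf = [0] * 8                    # stand-in for vx_SptBuff
    carry = 0
    for a in src:                    # ld a,(de) for each of the 8 rows
        for j in range(8):
            a = (a << 1) | carry     # rla: carry in at bit 0, bit 7 out
            carry = (a >> 8) & 1
            a &= 0xFF
            out = buf[j] & 1         # rr (hl): carry in at bit 7, bit 0 out
            buf[j] = (carry << 7) | (buf[j] >> 1)
            carry = out
    return buf

def rotate_cw_reference(src):
    """Pixel (row r, column c) moves to (row c, column 7 - r)."""
    out = [0] * 8
    for r in range(8):
        for c in range(8):
            if (src[r] >> (7 - c)) & 1:
                out[c] |= 1 << r     # column 7 - r is bit r
    return out

smiley = [0x00, 0x44, 0x66, 0x00, 0x00, 0x00, 0x81, 0x7E]
print(rot_c(smiley) == rotate_cw_reference(smiley))
```

A single pixel in the top-left corner ends up in the top-right corner, which is what a clockwise quarter turn should do.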

That's all I have for now. I think I got just about everything I could possibly find, but I might have some more later. And if you want all the routines in one file, I uploaded them all here.

1550

The Axe Parser Project / Re: Assembly Programmers - Help Axe Optimize!

« on: January 04, 2011, 05:34:52 pm »

Alright, these aren't that urgent anyways. Just trying to squeeze every last byte and cycle out of Axe programs.

1551

Miscellaneous / Re: What is your avatar?

« on: January 04, 2011, 05:29:54 pm »

God.

1552

The Axe Parser Project / Re: Assembly Programmers - Help Axe Optimize!

« on: January 04, 2011, 12:11:05 pm »

It doesn't look like the optimized p_EQN2 and p_EQN1 routines that I suggested made it into Axe 0.4.7. Did you maybe forget to add them? Or were my routines incorrect?

Also, I think you might have overlooked my footnote in the first equality optimization post mentioning that p_Div32768 can be optimized to be the same as p_SLT0 and p_GetBit0.

EDIT: Also, it doesn't look like p_EQNX and p_NENX are being used by the parser. Is this just an accident, or did you intentionally leave them out for now due to the problem of the constant in the source code not equaling the constant that should be inserted?

And along the lines of comparing negative shorts, if you get p_NENX working, it should be the same size and faster for -3 than p_NEN3, so the latter should be removed. And for p_NEN2, can't the first instruction just be inc l instead of inc hl?

EDIT 2: I think I've mentioned this in the past as well, but it looks like you must've missed seeing it/adding it. p_GetBit15 could be optimized to be the same as p_Mod2.

EDIT 3: (Man this is going to be one long post) It was smart of you to make the greater than, greater or equal, less than, and less or equal comparisons with second arguments that are expressions call the opposite routine after popping the first argument into de, thus avoiding any double ex de,hl's. Could something like this also be used for the greater than and less or equal comparisons with the second argument being a variable? A byte could be saved by making these instances use ex de,hl / ld hl,($0000) to load the variable instead of ld de,($0000) and then call the opposite routine.

EDIT 4: Going along the lines of edit 3, p_SIntGt and p_SIntLe could be optimized for variable arguments in the same way. Although no bytes would be saved calling p_SIntLt instead of p_SIntGt, cycles would be.

1553

Axe / Re: The Optimization Compilation

« on: January 03, 2011, 11:07:14 pm »

Science lesson! YAY

(By the way I completely understand if you don't want to read this whole thing, it's a bit lengthy. I'll put the important parts in bold.)

All Axe operations rely on using the register pair hl as the "Ans" value, using it to hold the running value of calculations. For those who aren't fully familiar with z80 assembly, hl is a combination of h and l, two 8-bit (1-byte) "registers." Registers are like variables you might store in memory, but they're stored directly inside of the processor, so they can be used quickly. The basic registers (a, b, c, d, e, h, and l) are all 8-bit values, so most commands were built to work with these 8-bit registers, hence the z80 being an 8-bit system. However, Zilog knew that 8 bits was a little restrictive, and especially for systems that would have more than 256 bytes of memory, being able to easily use and manipulate 16-bit values (like pointers) would be very helpful. Did you notice that the other 5 basic registers go in alphabetical order before randomly jumping to h and l? Well that's because h and l are two very special 8-bit registers, designed to easily be combined into the Higher and Lower halves of a 16-bit value. With this special designation come very useful 16-bit operations built-in.

*2, for instance, simply breaks down to one assembly instruction: add hl,hl. This simply adds the value of hl to itself, which in other words multiplies it by 2. Because Zilog knew a basic function like adding would be a core operation, they made sure to make it small and swift: 1 byte to call and 11 cycles to execute.

*256 is a multiplication by 2^8, so one could achieve this by adding hl to itself 8 times. But there's an easier way to think of this. Just like how multiplication by 10 in a decimal system shifts every digit left one place, multiplication by 2 in a binary system shifts every digit left one place as well. And because hl is a 16-bit value with the high 8 bits in h and the low 8 bits in l, shifting these bits left 8 places would just result in the value in l being shifted all the way out and into h, and 8 trailing zeros being shifted into l. So instead of using add hl,hl eight times, *256 uses the following instructions: ld h,l / ld l,0. The first costs 1 byte and 4 cycles and the second costs 2 bytes and 7 cycles, for a grand total of 3 bytes and 11 cycles. Just as fast as *2!
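
If you'd rather see that with numbers, here is a tiny Python sketch of both operations on a 16-bit hl. The function names are just for illustration.

```python
def mul2(hl):
    """add hl,hl on a 16-bit register pair."""
    return (hl + hl) & 0xFFFF

def mul256(hl):
    """ld h,l / ld l,0: the low byte becomes the high byte, the low byte is cleared."""
    l = hl & 0xFF
    h = l          # ld h,l
    l = 0          # ld l,0
    return (h << 8) | l

def mul2_eight_times(hl):
    for _ in range(8):
        hl = mul2(hl)
    return hl

print(hex(mul256(0x1234)), hex(mul2_eight_times(0x1234)))
```

Both routes give the same answer for every 16-bit value; the byte shuffle just gets there in 11 cycles instead of 88.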

*128, however, isn't so easy. Again, the obvious approach is to add hl to itself 7 times. This would cost 7 bytes and 77 cycles. You may also think to use the previous technique to multiply hl by 256, and then divide it by 2. However, we have a problem (pun not intended). If we multiplied the value by 256 and then divided it by 2, we would have lost the highest bit from the multiplication by 256 before dividing it by 2 again! So that's a bit of a problem. Anyways, Axe is optimized for size, and we can do better: using a loop. And although the z80 is a pretty old system, they were nice enough to give us a built-in loop structure: ld b,7 / add hl,hl / djnz $-1. This loads 7 into the b register, adds hl to hl, and then repeats adding hl to hl 6 more (b-1) times. Although this is a good amount slower than either of the previous options, coming in at 170 cycles, it only takes 2 bytes to initialize the loop counter, 1 byte for the add instruction, and 2 bytes for the loop execution instruction, for a total of 5 bytes.
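
The size and speed figures above, and the lost-bit problem, can be reproduced with a short Python tally. The costs are the standard Z80 byte and T-state figures; the function names are mine.

```python
# (bytes, T-states) per instruction, from the standard Z80 timing tables
ADD_HL_HL = (1, 11)
LD_B_N = (2, 7)
DJNZ_BYTES, DJNZ_TAKEN, DJNZ_FALLTHROUGH = 2, 13, 8

def unrolled(n):
    """n copies of add hl,hl in a row."""
    return (n * ADD_HL_HL[0], n * ADD_HL_HL[1])

def djnz_loop(n):
    """ld b,n / add hl,hl / djnz back to the add."""
    size = LD_B_N[0] + ADD_HL_HL[0] + DJNZ_BYTES
    cycles = LD_B_N[1] + n * (ADD_HL_HL[1] + DJNZ_TAKEN) - (DJNZ_TAKEN - DJNZ_FALLTHROUGH)
    return (size, cycles)

def mul128(hl):
    return (hl << 7) & 0xFFFF

def mul256_then_halve(hl):
    """The tempting shortcut: bit 8 of the input falls off before the halving."""
    return ((hl << 8) & 0xFFFF) >> 1

print(unrolled(7), djnz_loop(7))
print(hex(mul128(0x0100)), hex(mul256_then_halve(0x0100)))
```

With an input of $0100 the true product is $8000, but the multiply-then-halve shortcut returns 0, which is the lost bit described above.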

Sorry to bore you... But congratulations, you now all know at least a little bit about z80 assembly, the structure of compiled Axe code! And the more you know about Axe's internals, the more you can optimize it, whether it be for speed or size!

EDIT: Wow, it took me a whole hour to write this? Major ninja'd.

1554

Axe / Re: Moving a Sprite

« on: January 03, 2011, 01:11:46 pm »

+1 is smaller and faster than 1

1555

Axe / Re: Moving a Sprite

« on: January 03, 2011, 01:08:26 pm »

I opted out of that optimization because I assumed that if somebody wanted to use this code in their program, they would probably have more than just that one sprite in the buffer and wouldn't want to clear it every time they updated the screen.

1556

Axe / Re: Moving a Sprite

« on: January 01, 2011, 06:47:51 pm »

As small as I could get it, 345 bytes of executable code.

Code: [Select]

:.SMILE
:[004466000000817E]->Pic1
:DiagnosticOff
:0->X->Y
:Repeat getKey(15)
:ClrDraw
:getKey(3)-getKey(2)+X
:!If +1
:+1
:End
:-1
:Pt-On(min(,88)→X,getKey(1)-getKey(4)+Y+(=⁻1)min(,56)→Y,Pic1)
:DispGraph
:End

1557

The Axe Parser Project / Re: Features Wishlist

« on: December 31, 2010, 05:34:04 pm »

Quote from: ScoutDavid on December 31, 2010, 05:22:09 pm

Quote from: Runer112 on December 31, 2010, 05:20:38 pm
Speaking of optimizing control structures, any chance for Do...While loops?

I'd like Switch cases more than Do...While loops, but I don't really know how the second works.

Do...While loops are a lot like While...End loops, except that the check to advance to another iteration of the loop is at the end instead of the beginning. This turns out to be more optimized. Whereas a While...End loop has two jumps, one to exit the loop if the condition is false and one to jump back to the beginning, a Do...While combines the two into one jump that jumps back to the beginning if the condition is true.

I don't know how good that explanation was, but perhaps this pseudocode will help clarify it. Numbers in brackets indicate the number of bytes the real code takes:

While...End loop

Code: [Select]

;While
[x] Loop condition
[2] Check if loop condition is true or false
[3] Exit the loop if false
;Loop contents
;End
[3] Jump back to the beginning of the loop

Do...While loop

Code: [Select]

;Do
;[0] Start of the loop, takes no actual code
;Loop contents
;While
[x] Loop condition
[2] Check if loop condition is true or false
[3] Jump back to the beginning of the loop if true
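
Adding up the bracketed byte counts, and showing the one behavioral difference between the two loop shapes, takes only a few lines of Python. This is just an illustration with made-up function names; x is the size of whatever condition you write.

```python
def while_end_overhead(x):
    """Condition + test + exit jump + jump back to the start."""
    return x + 2 + 3 + 3

def do_while_overhead(x):
    """Condition + test + a single conditional jump back to the start."""
    return x + 2 + 3

def body_runs_while_end(n):
    runs = 0
    while n > 0:          # checked before every pass
        runs += 1
        n -= 1
    return runs

def body_runs_do_while(n):
    runs = 0
    while True:
        runs += 1
        n -= 1
        if not n > 0:     # checked after every pass
            break
    return runs

print(while_end_overhead(0) - do_while_overhead(0))
print(body_runs_while_end(0), body_runs_do_while(0))
```

So the Do...While form is 3 bytes smaller per loop, at the price of always running the body at least once.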

1558

The Axe Parser Project / Re: Features Wishlist

« on: December 31, 2010, 05:20:38 pm »

Speaking of optimizing control structures, any chance for Do...While loops?

1559

Axe / Re: The Optimization Compilation

« on: December 30, 2010, 12:01:33 am »

Both

It's 8 bytes smaller and should save about 52 t-states per iteration.

1560

Axe / Re: The Optimization Compilation

« on: December 29, 2010, 11:46:14 pm »

In case anyone was wondering, here is the smallest loop structure that I know of in Axe. It will execute the loop n times, with A starting at n-1 and decreasing down to 0:

Code: [Select]

n
While 
  -1→A
  ;Code
  A
End
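
Here is how I'd trace that loop in Python, treating hl as Axe's running "Ans" value. It assumes a While with nothing after it tests whatever is already in hl, which is what the code above relies on; the function name is made up.

```python
def countdown_loop(n):
    """Trace of: n / While / -1->A / ;Code / A / End"""
    seen = []
    hl = n & 0xFFFF              # "n" leaves n in hl
    while hl != 0:               # While with no expression tests hl as-is
        a = (hl - 1) & 0xFFFF    # -1->A
        seen.append(a)           # ;Code sees this value of A
        hl = a                   # the bare "A" reloads hl for the next test
    return seen

print(countdown_loop(3))
```

The body runs n times with A counting down from n-1 to 0, and not at all if n is 0.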

And if Quigibo adds conditional jumps, this sort of Do...While loop would be even smaller (which he has not yet, so don't bother with this code as of Axe 0.4.7):

Code: [Select]

n
Lbl L
  →A
  ;Code
If A-1
  Goto L
End

EDIT: As a side note in case you read this Quigibo, any chance for Do...While loops? They're more optimized than normal While loops.
