Author Topic: Assembly Programmers - Help Axe Optimize! (Read 168351 times)

calc84maniac · « **Reply #30 on:** March 16, 2010, 10:13:57 pm »

Quote from: Quigibo on March 16, 2010, 09:58:29 pm

That's a clever trick! But wouldn't something like this be simpler?

	or a
	sbc hl,de
	add hl,de
	jr nc,$+3
	ex de,hl

But I'm trying to convert all of my math commands to signed operations anyway, so I would need to tweak it a bit.

Ah, I didn't think of that.
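Read literally, the quoted sequence appears to leave the larger of HL and DE (unsigned) in HL and the smaller in DE: the subtraction borrows exactly when HL < DE, and adding DE back both restores HL and reproduces that borrow as a carry. The Python below is only a model written to check that reading; the helper names `sub16`, `add16` and `sequence` are made up for this sketch and are not part of Axe.

```python
def sub16(x, y):
    """Model of OR A / SBC HL,DE: 16-bit subtract, returns (result, borrow)."""
    diff = x - y
    return diff & 0xFFFF, diff < 0

def add16(x, y):
    """Model of ADD HL,DE: 16-bit add, returns (result, carry_out)."""
    total = x + y
    return total & 0xFFFF, total > 0xFFFF

def sequence(hl, de):
    """Run the quoted five instructions on a pair of 16-bit values."""
    hl, _ = sub16(hl, de)        # or a / sbc hl,de
    hl, carry = add16(hl, de)    # add hl,de: HL is restored, carry set iff HL < DE
    if carry:                    # jr nc,$+3 skips the one-byte swap when carry is clear
        hl, de = de, hl          # ex de,hl
    return hl, de
```

The carry from `add hl,de` is what makes this work: when the subtraction wrapped around, adding DE back wraps the other way and sets the carry.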

Here's a good way to do signed comparison by the way:
	or a
	sbc hl,de
	ld a,h
	rla
	jp po,$+4
	ccf
It should give the same flag outputs you would expect from an unsigned compare (note that rla does not modify the Z or P/V flags)
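For anyone who wants to convince themselves of the flag behaviour, here is a small Python model of the sequence above. It is only a sketch: the function names are invented for the illustration, and it models just the carry, zero and overflow (P/V) results of `sbc hl,de`, which are the ones the routine relies on.

```python
def to_signed(x):
    """Interpret a 16-bit value as two's complement."""
    return x - 0x10000 if x & 0x8000 else x

def sbc_hl_de(hl, de):
    """Model of OR A / SBC HL,DE: returns (result, carry, zero, overflow)."""
    diff = hl - de
    result = diff & 0xFFFF
    carry = diff < 0
    zero = result == 0
    # Signed overflow: operands have different signs and the result's sign
    # differs from the sign of HL.
    overflow = bool((hl ^ de) & (hl ^ result) & 0x8000)
    return result, carry, zero, overflow

def signed_cp(hl, de):
    """Return (carry, zero) as left by the six-instruction sequence."""
    result, carry, zero, overflow = sbc_hl_de(hl, de)
    carry = bool(result & 0x8000)    # ld a,h / rla: sign bit moves into carry
    if overflow:                     # jp po,$+4 skips ccf when P/V is clear
        carry = not carry            # ccf
    return carry, zero
```

In other words, the final carry is "sign XOR overflow", which is the usual signed less-than condition, while Z is still the one produced by the subtraction.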

Edit:
Now I came up with one that can restore the original value of HL without destroying the C flag in the process:
	or a
	sbc hl,de
	ld a,h
	jp po,$+4
	cpl
	add hl,de
	rla
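The same kind of Python model can be used to check the HL-restoring version. Again, this is a sketch with made-up names, assuming the standard behaviour that the 16-bit `add hl,de` leaves Z and P/V alone and that `rla` only moves bit 7 of A into carry.

```python
def to_signed(x):
    """Interpret a 16-bit value as two's complement."""
    return x - 0x10000 if x & 0x8000 else x

def signed_cp_keep_hl(hl, de):
    """Return (hl, carry, zero) as left by the seven-instruction sequence."""
    diff = hl - de                      # or a / sbc hl,de
    result = diff & 0xFFFF
    zero = result == 0                  # Z comes from the subtraction
    overflow = bool((hl ^ de) & (hl ^ result) & 0x8000)
    a = result >> 8                     # ld a,h
    if overflow:                        # jp po,$+4 skips cpl when P/V is clear
        a ^= 0xFF                       # cpl flips the saved sign bit
    hl = (result + de) & 0xFFFF         # add hl,de puts the original HL back
    carry = bool(a & 0x80)              # rla: corrected sign bit goes into carry
    return hl, carry, zero
```

The difference from the first version is that the overflow correction is applied to the copy in A before the carry is produced, so the later `add hl,de` can clobber the carry without losing the result.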

ztrumpet · « **Reply #31 on:** March 17, 2010, 04:00:24 pm »

That's really neat! Great job calc84!

Quigibo · « **Reply #32 on:** March 20, 2010, 10:00:23 pm »

Does anyone know any good sin/cos routines that are under 128 bytes? The entire circle should be 256 brads (binary radians) so each quadrant is 64. It doesn't need to be 100% accurate, but it should be pretty close. It doesn't need to be that fast either, but I would prefer using a method that doesn't need multiplication such as a look up table or CORDIC.

Builderboy · « **Reply #33 on:** March 21, 2010, 11:40:46 am »

Um, well, first question (even though I'm not going to end up writing this routine): what should the output be in, since we don't have floating point?

Quigibo · « **Reply #34 on:** March 21, 2010, 04:42:47 pm »

It should be 128*sin() so that the number fits in a single signed byte.
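To make the spec concrete, here is a rough Python sketch of the look-up table option: one quadrant stored, the other three obtained by mirroring. It is not an Axe routine, and the names `QUARTER`, `sin_brads` and `cos_brads` are made up for the example. One assumption to flag: 128*sin() peaks at exactly 128, which does not fit in a signed byte, so the sketch clamps the peak to 127.

```python
import math

# 65 entries cover 0..64 brads (one quadrant, both end points included).
# Values are rounded and clamped to 127 so they fit in a signed byte.
QUARTER = [min(127, round(128 * math.sin(i * math.pi / 128))) for i in range(65)]

def sin_brads(angle):
    """Sine of an angle in brads (256 per circle), scaled to -127..127."""
    angle &= 0xFF
    idx = angle & 0x7F          # position inside the half circle, 0..127
    if idx > 64:
        idx = 128 - idx         # second quadrant mirrors the first
    value = QUARTER[idx]
    return -value if angle & 0x80 else value   # second half circle is negated

def cos_brads(angle):
    """Cosine is the sine shifted by one quadrant (64 brads)."""
    return sin_brads(angle + 64)
```

The table alone would be 65 bytes, so whether the whole thing fits in 128 bytes depends on how compact the mirroring code can be made in assembly.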

Galandros · « **Reply #35 on:** March 23, 2010, 07:04:20 pm »

Quote from: Quigibo on March 20, 2010, 10:00:23 pm

Does anyone know any good sin/cos routines that are under 128 bytes? The entire circle should be 256 brads (binary radians) so each quadrant is 64. It doesn't need to be 100% accurate, but it should be pretty close. It doesn't need to be that fast either, but I would prefer using a method that doesn't need multiplication such as a look up table or CORDIC.

I think so. It was written by Will West, but I couldn't find the original post on Revsoft, so I've added it as an attachment.

see: "Sin_A parabolic approximation of sin(a) a is in units of pi/256"
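The attachment itself is not reproduced here, so the following is not Will West's routine. It is only a generic illustration of what a parabolic sine approximation looks like, using the 256-brad circle and 128*sin() scaling from the request above (the attached routine's own angle units may differ). It fits the half wave with the parabola 4x(1-x) and, like any such approach, needs one multiplication. The function name and the clamp to 127 are assumptions of this sketch.

```python
def sin_parabola(angle):
    """Parabolic approximation of 128*sin for an angle in brads (256 per circle)."""
    angle &= 0xFF
    q = angle & 0x7F                        # position inside the half circle, 0..127
    # 128 * 4*q*(128-q) / 128**2 simplifies to q*(128-q)/32; clamp the peak to 127.
    value = min(127, q * (128 - q) // 32)
    return -value if angle & 0x80 else value
```

The parabola matches the sine at 0, at the peak and at the half circle, and bulges slightly above it in between, so the worst-case error is around 5 to 6 percent of full scale.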

Quigibo · « **Reply #36 on:** March 23, 2010, 09:53:02 pm »

Hey thanks! That will definitely come in handy! It does use multiplication in that routine though, but oh well. It just makes it difficult on the parsing side to have to call other subroutines that may or may not already have been added, but I guess I'll figure out a way to template it better to make this easier in the future anyway.

calc84maniac · « **Reply #37 on:** June 02, 2010, 10:30:41 am »

Your 4-level grayscale routine is pretty unoptimized, and it doesn't even use the right dither pattern (1/3 and 2/3). I figured I could help out here. This is about as fast as it gets (with your double-buffer layout). The small size cost is worth it in this case, I think.

Code: [Select]

DispGraphRR:
	di
	ld a,$80
	out ($10),a
	ld (save_sp),sp
	ld l,plotSScreen&$ff - 1
	ld de,appbackupscreen - plotSScreen
	ld sp,plotSScreen - appbackupscreen + 12
	ld c,$1f
	dec (iy+asmflags2)
	jr nz,gray4skip
	ld (iy+asmflags2),3
	jr gray4entry3
gray4skip:
	ld a,(flags+asmflags2)
	dec a
	jr z,gray4entry2

gray4entry1:
	ld h,plotSScreen >> 8
	inc l
	ld b,64
	inc c
	ld a,c
	cp $2c
	jr z,gray4end
	out ($10),a
	
gray4loop1:
	ld a,(hl)
	add hl,de
	xor (hl)
	and %11011011
	xor (hl)
	add hl,sp
	out ($11),a
	djnz gray4loop2
	
gray4entry2:
	ld h,plotSScreen >> 8
	inc l
	ld b,64
	inc c
	ld a,c
	cp $2c
	jr z,gray4end
	out ($10),a
	
gray4loop2:
	ld a,(hl)
	add hl,de
	xor (hl)
	and %01101101
	xor (hl)
	add hl,sp
	out ($11),a
	djnz gray4loop3

gray4entry3:
	ld h,plotSScreen >> 8
	inc l
	ld b,64
	inc c
	ld a,c
	cp $2c
	jr z,gray4end
	out ($10),a
	
gray4loop3:
	ld a,(hl)
	add hl,de
	xor (hl)
	and %10110110
	xor (hl)
	add hl,sp
	out ($11),a
	djnz gray4loop1
	jr gray4entry1

gray4end:
	ld sp,(save_sp)
	ei
	ret

Edit: Some misnamed/missing labels
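A note on the inner loops, for readers who have not seen the `xor` / `and` / `xor` idiom before: it merges two bytes under a mask, taking the masked bits from the first byte read (the `plotSScreen` side) and the remaining bits from the second (the byte reached through `add hl,de`). The Python below is only a model of that arithmetic, with invented names, not a replacement for the routine.

```python
# The three masks used by gray4loop1, gray4loop2 and gray4loop3.
MASKS = (0b11011011, 0b01101101, 0b10110110)

def merge(first, second, mask):
    """Model of: ld a,(hl) / add hl,de / xor (hl) / and mask / xor (hl)."""
    a = first        # ld a,(hl): byte from the first buffer
    a ^= second      # xor (hl): after add hl,de, (hl) is the second buffer
    a &= mask        # and mask: keep the difference only where mask bits are 1
    a ^= second      # xor (hl): masked bits become 'first', the rest 'second'
    return a
```

Written end to end, the three masks form one repeating `110` bit pattern, so about two thirds of the bits come from the first buffer and one third from the second on any given pass, which seems to be where the 1/3 and 2/3 dither mentioned above comes from.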

DJ Omnimaga · « **Reply #38 on:** June 02, 2010, 12:57:12 pm »

Will this one work in 15 MHz mode too, or just 6 MHz like his own routine?

(btw by 15 MHz, I really mean on real hardware, not just emulator)

Quigibo · « **Reply #39 on:** June 02, 2010, 05:58:35 pm »

Thanks! Although that looks much larger than my current routine, I'll have to see if the improvement in speed/quality is significant enough to justify the size increase. I'll do some more testing this week.

calc84maniac · « **Reply #40 on:** June 02, 2010, 08:48:00 pm »

DJ_Omni, it probably won't work in 15MHz mode. On the other hand, Quigibo's routine could almost run fine in 15MHz, which is a bad thing (it means it would be pretty slow in 6MHz).

Quigibo · « **Reply #41 on:** June 02, 2010, 09:55:09 pm »

Yeah, I think it will be worth the size increase. It's a 46-byte increase, but it has the added bonus that it doesn't need to be initialized with a separate command (after I modify this routine). Also, I think there are a few places I can optimize to save on memory without hindering the speed.

EDIT: Actually, that code is pretty solidly optimized; there were only a couple of places where I made improvements.

In the entry points, it's better if the instructions are in this order:

Code: [Select]

	ld	a,c
	cp	$2c
	jr	z,__Disp4LvlDone
	ld	h,plotSScreen >> 8
	inc	l
	ld	b,64
	inc	c
	out	($10),a

Because this way, it jumps out of the loop sooner when it gets to the end of the routine. Not that big of a deal, but it saves several clock cycles each render.

Also, the jump table I changed to this:

Code: [Select]

	inc	(iy+asm_flag2)
	jr	z,__Disp4Lvlentry3
	ld	a,(flags+asm_flag2)
	inc	a
	jr	z,__Disp4Lvlentry2
	ld	(iy+asm_flag2),-2

So that you don't need to initialize the byte that keeps track of gray layer since it falls through if the number was uninitialized.
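Here is a small Python model of that counter logic, to show why no initialization is needed. It assumes the code after the last instruction falls through into the first entry point, as in the routine posted earlier; the function name and the numeric entry labels are just for illustration.

```python
def next_frame(counter):
    """Return (new_counter, entry_point) for one pass through the jump table."""
    counter = (counter + 1) & 0xFF            # inc (iy+asm_flag2)
    if counter == 0:                          # jr z,__Disp4Lvlentry3
        return counter, 3
    if ((counter + 1) & 0xFF) == 0:           # ld a,(flags+asm_flag2) / inc a / jr z
        return counter, 2
    return 0xFE, 1                            # ld (iy+asm_flag2),-2, then fall through
```

Whatever byte the counter starts with, the first call either lands on one of the two tested values or resets it to -2, and from then on the entry points repeat in the order 1, 2, 3.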

Thanks again! This does look a lot better

DJ Omnimaga · « **Reply #42 on:** June 02, 2010, 10:52:32 pm »

Nice to see possible optimizations

I don't mind an additional 46 bytes in my progs if the speed increases a lot personally. It's only if you use grayscale, anyway.

Quigibo · « **Reply #43 on:** June 02, 2010, 11:32:24 pm »

Actually, I just tried this on hardware, and it's too fast for even 6MHz. But not to worry, I can group some of it into a subroutine to both add the needed delay and reduce the size of the code at the same time.

DJ Omnimaga · « **Reply #44 on:** June 02, 2010, 11:34:12 pm »

If it's too fast on 6 MHz too, will it glitch on it too?
