Author Topic: ASM Optimized routines (Read 115684 times)

Xeda112358 · « **Reply #45 on:** May 02, 2012, 09:07:45 pm »

Oh, wow, awesome! I cannot believe I didn't see that; it is the source of some of my other optimisations to the original code o.O

calc84maniac · « **Reply #46 on:** May 02, 2012, 10:25:44 pm »

Quote from: Runer112 on December 12, 2011, 03:46:00 pm

Here's a very optimized way to convert a 16-bit signed number in hl into an 8-bit signed number in a, with overflow handling (if hl<-128, a=-128; if hl>127, a=127). Besides being super small and super fast, it has two added bonuses: it destroys nothing but the flags, and you could easily modify it to take its input in a 16-bit register other than hl.

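Code:

Signed16To8:
	ld a,l
	add a,a        ;carry = sign bit of L
	sbc a,a        ;A = $00 or $FF (sign extension of L)
	sub h          ;zero if H matches, i.e. HL fits in 8 bits
	ld a,l
	ret z          ;in range: return L
	ld a,h
	add a,a        ;carry = sign bit of H
	sbc a,a        ;A = $00 (positive) or $FF (negative)
	xor %01111111  ;-> $7F or $80
	ret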

Implied challenge accepted!

Code:

Signed16To8:
	ld a,l
	add a,a
	ld a,h
	adc a,l
	cp l
	ret z
	ld a,$7F
	ret p
	inc a
	ret

Edit: Whoops, misteak

Edit2: This routine is a failure; disregard its failtasticness.
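For anyone wondering what went wrong, here is a small Python model (not Z80 code) of both Signed16To8 versions, checked against a plain saturating clamp over all 65536 inputs. The function names are made up for this sketch, and the models only track the final value of A and the Z/S flags the routines test. Under those assumptions, Runer112's version agrees with the clamp everywhere. calc84maniac's version goes wrong for exactly $7F80-$7FFF. For those inputs `cp l` computes H + carry = $80, so the sign flag is set, `ret p` falls through, and the routine returns $80 instead of $7F.

```python
def clamp_s16_to_s8(hl):
    """Reference: saturate a 16-bit two's-complement value to one signed byte."""
    v = hl - 0x10000 if hl & 0x8000 else hl
    return max(-128, min(127, v)) & 0xFF

def runer_signed16to8(hl):
    """Model of Runer112's routine (value of A on return)."""
    h, l = hl >> 8, hl & 0xFF
    ext = 0xFF if l & 0x80 else 0x00      # add a,a / sbc a,a: sign-extend L
    if (ext - h) & 0xFF == 0:             # sub h / ret z: H agrees, so HL fits
        return l
    return 0x80 if h & 0x80 else 0x7F     # sbc a,a / xor %01111111

def calc84_signed16to8(hl):
    """Model of calc84maniac's routine (value of A on return)."""
    h, l = hl >> 8, hl & 0xFF
    carry = l >> 7                        # add a,a: carry = bit 7 of L
    a = (h + l + carry) & 0xFF            # ld a,h / adc a,l
    diff = (a - l) & 0xFF                 # cp l: flags come from A - L = H + carry
    if diff == 0:                         # ret z
        return a
    if (diff & 0x80) == 0:                # ld a,$7F / ret p (sign flag clear)
        return 0x7F
    return 0x80                           # inc a / ret

bad = [hl for hl in range(0x10000)
       if calc84_signed16to8(hl) != clamp_s16_to_s8(hl)]
print(hex(bad[0]), hex(bad[-1]), len(bad))   # prints: 0x7f80 0x7fff 128
```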

Xeda112358 · « **Reply #47 on:** July 03, 2012, 12:13:46 pm »

Necroedit: For a much better routine, please try the routines at the end of this post!

I created this last night for my next project:

Code:

PseudoRandWord:
;Outputs:
;     BC was the previous pseudorandom number
;     HL is the pseudorandom number
;f(n+1)=(241f(n)+257) mod 65536   ;period 65536
;181 cycles, add 17 if called
     ld hl,(randSeed)
     ld c,l
     ld b,h
     add hl,hl
     add hl,bc
     add hl,hl
     add hl,bc
     add hl,hl
     add hl,bc
     add hl,hl
     add hl,hl
     add hl,hl
     add hl,hl
     add hl,bc
     inc h
     inc hl
     ld (randSeed),hl
     ret

There are a few other nice features, too. For example, if you run this 65536 times, every 16-bit value is hit exactly once. Or, if you only read 1 byte (for example, H from the output), it will hit every 8-bit number once if you run this 256 times. Plus, it can be seeded, which has its own uses. This can be modified to be smaller, too, if you know what you are doing, but I just like the numbers 241 and 257. Anyway, it produces some nice results.

P.S. I used this in a routine called "ShuffleDeck" and it works very well.
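A quick way to convince yourself of the formula in the comment is the Python model below. It is a sketch, not the routine itself, and the function name is hypothetical. It mirrors the add hl,hl / add hl,bc chain and lets you check that the chain equals 241x + 257 mod 65536. It also runs 65536 calls from an arbitrary seed to confirm that they visit every 16-bit value and return to the seed. Finally, it checks that the low byte L runs through all 256 values in 256 calls. That works because L's own recurrence, L' = (241·L + 1) mod 256, depends only on the previous L.

```python
def pseudo_rand_word(seed):
    """Model of PseudoRandWord: returns (new HL, BC), where BC holds the old value."""
    bc = seed
    hl = seed
    hl = (hl * 2 + bc) & 0xFFFF    # add hl,hl / add hl,bc  -> 3x
    hl = (hl * 2 + bc) & 0xFFFF    # add hl,hl / add hl,bc  -> 7x
    hl = (hl * 2 + bc) & 0xFFFF    # add hl,hl / add hl,bc  -> 15x
    hl = (hl * 16 + bc) & 0xFFFF   # 4x add hl,hl, add hl,bc -> 241x
    hl = (hl + 0x100) & 0xFFFF     # inc h  (+256)
    hl = (hl + 1) & 0xFFFF         # inc hl (+1)
    return hl, bc

SEED = 0x1234  # arbitrary seed chosen for this example

# Full period: 65536 calls visit every 16-bit value and come back to the seed.
x, seen = SEED, set()
for _ in range(65536):
    x, _prev = pseudo_rand_word(x)
    seen.add(x)
print(len(seen), x == SEED)        # prints: 65536 True

# Low byte: 256 consecutive calls give 256 distinct values of L.
y, lows = SEED, set()
for _ in range(256):
    y, _prev = pseudo_rand_word(y)
    lows.add(y & 0xFF)
print(len(lows))                   # prints: 256
```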

chickendude · « **Reply #48 on:** July 04, 2012, 11:19:47 am »

I don't understand the theory behind that algorithm, but you could save a couple of clock cycles with SMC (self-modifying code).

And is ShuffleDeck a hint at what your next project might be?

Xeda112358 · « **Reply #49 on:** July 09, 2012, 08:08:46 am »

Yes, you can use SMC to save at least 6 cycles for RAM programs

My next mini project is an app with some small games (including card games). I don't have my computer with me, but I will post a working sound routine next time I get a chance.

The way it works is that we are using mod 2¹⁶, so I selected two numbers relatively prime to 65536 (in this case, any odd number). There are a few other conditions dealing with the Euler phi function, I believe, but I got lucky with the numbers I chose, so I didn't need to look them up. As you can check, I specifically chose prime numbers because I figured those would give me the best shot.

If you choose the wrong values, you will get shorter cycles, of length 2ⁿ for some n < 16. I am not sure how familiar you are with group theory, but essentially, you will be creating subgroups, and the order (size) of a subgroup always divides the order of the main group. So some values will make cycles of length 32768, 16384, or other smaller powers of 2. (Gah, there is so much cool theory behind this, but I don't have much time.)
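The full-period conditions mentioned here are usually stated as the Hull-Dobell theorem. For a modulus of 2¹⁶, x → (ax + c) mod 65536 has the full period 65536 exactly when c is odd and a ≡ 1 (mod 4). Since 241 = 4·60 + 1 and 257 is odd, both conditions hold. The Python sketch below compares that rule with a brute-force cycle length for a few arbitrarily chosen (a, c) pairs; the helper names are hypothetical. For odd a the map is a bijection, so the loop always returns to its start. In line with the subgroup argument, every cycle length it finds is a power of 2.

```python
M = 1 << 16  # the routine works mod 2**16

def full_period_by_hull_dobell(a, c):
    """Hull-Dobell for m = 2**16: c coprime to m (odd) and a - 1 divisible by 4."""
    return c % 2 == 1 and (a - 1) % 4 == 0

def cycle_length(a, c, start=0):
    """Length of the cycle of x -> (a*x + c) mod M through `start`.
    Assumes a is odd, so the map is a bijection and the loop terminates."""
    x = (a * start + c) % M
    n = 1
    while x != start:
        x = (a * x + c) % M
        n += 1
    return n

for a, c in [(241, 257), (243, 257), (241, 256), (5, 1), (3, 1)]:
    n = cycle_length(a, c)
    print(f"a={a:3} c={c:3} cycle={n:5} hull_dobell={full_period_by_hull_dobell(a, c)}")
```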

EDIT: ooh, here is a useful routine

Code:

FindNumPages:
;Inputs:
;     The app base page is loaded in MemBank1
;Outputs:
;     c flag set if the field was found
;     nc means the app header subfield was not found
;     A is the number of app pages
;     B is 0
;     (HL) is the number of app pages

     ld hl,4000h
     ld bc,128
     ld a,c
     or a
FNPLoop:
     cpir
     ret po
     ret nz
     inc a
     cp (hl)
     jr z,FNPFound
     dec a
     or a          ;cp (hl) may have set carry; clear it so "ret po" returns nc
     jr FNPLoop
FNPFound:
     inc l
     ld a,(hl)
     scf
     ret

I made this as a faster alternative to using a bcall.

Runer112 · « **Reply #50 on:** July 09, 2012, 02:45:26 pm »

Optimized a bit.

The largest optimization was removing end checking, because it's impossible for an application not to have a number of pages field. I also optimized the search loop by rearranging it a bit to remove the unconditional jump.

Code:

FindNumPages:
;Inputs:
;     The app base page is loaded in MemBank1
;Outputs:
;     A, (HL) is the number of app pages

     ld hl,4000h
     ld a,81h
     ld c,a
FNPLoop:
     dec a
     cpir
     inc a
     cp (hl)
     jr nz,FNPLoop
     inc l
     ld a,(hl)
     ret

thepenguin77 · « **Reply #51 on:** July 09, 2012, 02:46:04 pm »

There are actually rare cases where that routine could fail. Of course, I would assume it will work for 99.9% of apps, but if someone changed the order of the header and put the time stamp field in front of the number of pages field, the time stamp could theoretically contain $80, $81.

But, now that I think about it, this is so rare that it will never happen.

calc84maniac · « **Reply #52 on:** July 09, 2012, 04:54:29 pm »

Isn't that routine searching for $80, $81 anyway?

Edit: Oh, I see what you're saying. The time stamp data could contain $80, $81.
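Here's a Python sketch of the search both FindNumPages versions perform: find the first $80 followed by $81, then read the next byte. It also reproduces the failure mode described above. The header bytes are invented for illustration and are not a real app header, and the function name is hypothetical. Like Runer112's version, the sketch has no end check, so a buffer without the field simply raises an exception.

```python
def find_num_pages(header):
    """Return the byte after the first $80,$81 pair, as the FindNumPages loops do."""
    i = 0
    while True:
        i = header.index(0x80, i) + 1   # cpir: stop just past the next $80
        if header[i] == 0x81:           # inc a / cp (hl): is the next byte $81?
            return header[i + 1]        # inc l / ld a,(hl): the page count

# Invented header bytes, for illustration only (not a real TI app header).
normal = bytes([0x80, 0x12, 0x01, 0x04, 0x80, 0x81, 0x02, 0x80, 0x90])
# An earlier field whose *data* happens to contain $80,$81 fools the search.
unlucky = bytes([0x11, 0x22, 0x80, 0x81, 0x55, 0x80, 0x81, 0x02])
print(find_num_pages(normal), find_num_pages(unlucky))  # prints: 2 85
```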

NanoWar · « **Reply #53 on:** July 12, 2012, 06:02:41 pm »

Has anybody got a good rectangle function? It should take its width in pixels, not bytes... Here's my ugly code:

Code:

rectangle
	;inputs: l=Y, a=X, b=height, c=length
	;save coords & stuff
	ld	h, a
	push	hl
	push	bc
	call	rectangle.calc ;below
	pop	bc
	pop	hl
rectangle.display
; inputs: h=X, l=Y, b=height
	ld	a, h
	ld	e, l
	ld	h, $00
	ld	d, h
	add	hl, de
	add	hl, de
	add	hl, hl
	add	hl, hl	;l*12
	ld	e, a
	srl	e
	srl	e
	srl	e	;x/8
	add	hl, de
	ld	de, gbuf
	add	hl, de
rectangle.display.loop:
	push	bc
	push	hl
		ld	a, (hl)
		ld	c, a
		ld	a, (rectangle.scanline1) ; somewhere in ram...
		xor	c
		ld	(hl), a
		inc	hl
		ld	a, (rectangle.scanline2)
		or	a
		jr	z, rectangle.display.noloop2
		ld	b, a
rectangle.display.loop2:
		ld	a, (hl)
		xor	$FF
		ld	(hl), a
		inc	hl
		djnz	rectangle.display.loop2
rectangle.display.noloop2:
		ld	a, (hl)
		ld	c, a
		ld	a, (rectangle.scanline3)
		xor	c
		ld	(hl), a
	pop	hl
	pop	bc
	ld	de, 12
	add	hl, de
	djnz	rectangle.display.loop
	ret

rectangle.calc
;inputs:
;	a = x
;	b = height
;	c = length
	ld	d, a
		ld	a, $FF
		ld	(rectangle.scanline1), a
		xor	a
		ld	(rectangle.scanline2), a
		ld	(rectangle.scanline3), a
	ld	a, d
	and 	7
	ld	d, a
		or	a
		jr	z, rectangle.skipShift1
		ld	e, $FF
rectangle.shift1
		srl	e
		dec	a
		or	a
		jr	nz, rectangle.shift1
		ld	a, e
		ld	(rectangle.scanline1), a
rectangle.skipShift1
	ld	a, d	; a = shift right
	ld	h, a	; save
		add	a, c	; a + c
		ld	b, a	; save b = a + c
		and	7	; /8 Rest?
		ld	d, a	; Rest
		ld	a, 8
		sub	d	; 8 - Rest
		ld	d, a	; = d
		ld	e, $FF
rectangle.shift2
		sla	e
		dec	a
		or	a
		jr	nz, rectangle.shift2
		ld	a, e
		ld	(rectangle.scanline3), a
		ld	a, 16
	ld	e, h	; a
	sub	e	; 16 - a
	sub	d	; -d
	srl	a
	srl	a
	srl	a
	ld	d, a
	;
	ld	a, c
	srl	a
	srl	a
	srl	a	; /8
	sub	d
	ld	d, a
	ld	a, (rectangle.scanline2)
	add	a, d
	ld	(rectangle.scanline2), a
	;
	ld	a, b
	and	%11111000
	or	a	; if (shift_right + length)<8, do (rectangle.scanline1 & rectangle.scanline3)
	ret	nz
	ld	a, (rectangle.scanline1)
	ld	d, a
	ld	a, (rectangle.scanline3)
	and	d
	ld	(rectangle.scanline1), a
	xor	a
	ld	(rectangle.scanline2), a
	ld	(rectangle.scanline3), a
	ret

How bad is it?
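For reference, here is the per-row mask computation that rectangle.calc performs, restated as a Python sketch (the function names are my own). For a run of `length` pixels starting at column x, it returns three values: the mask for the first byte (scanline1), the number of full $FF bytes (scanline2), and the mask for the last byte (scanline3). When the whole run fits in one byte, it merges the two masks. The middle count is written as (shift + length)/8 - 1. For runs that reach past the first byte, this equals the routine's length/8 - (16 - shift - (8 - rest))/8. The expansion helper turns the masks back into pixel columns, which could be handy for checking a faster rewrite of the assembly.

```python
def span_masks(x, length):
    """Return (scanline1, scanline2, scanline3) for `length` >= 1 pixels at column x:
    first-byte mask, number of full $FF bytes, last-byte mask (MSB = leftmost)."""
    shift = x & 7
    first = 0xFF >> shift                    # rectangle.shift1 loop
    end = shift + length                     # run end, relative to byte x // 8
    rest = end & 7
    last = (0xFF << (8 - rest)) & 0xFF       # rectangle.shift2 loop; 0 when rest == 0
    if end < 8:                              # fits in one byte: AND the two masks
        return first & last, 0, 0
    return first, end // 8 - 1, last

def pixels_from_masks(x, first, full, last):
    """Expand the masks back into absolute pixel columns, to check span_masks."""
    cols = set()
    for k, m in enumerate([first] + [0xFF] * full + [last]):
        for bit in range(8):
            if m & (0x80 >> bit):
                cols.add((x // 8 + k) * 8 + bit)
    return cols

print(span_masks(3, 10))   # prints: (31, 0, 248)  i.e. $1F, no full bytes, $F8
```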

chickendude · « **Reply #54 on:** July 15, 2012, 03:38:27 am »

You can check out the MirageOS source, but I think it just draws four lines using its line routine.

NanoWar · « **Reply #55 on:** July 18, 2012, 03:59:28 am »

Oh, it should be a filled rectangle routine.

Hayleia · « **Reply #56 on:** July 18, 2012, 05:00:08 am »

There is a filled rectangle routine in Axe. I don't know where you can see its source code though.

deeph · « **Reply #57 on:** July 18, 2012, 05:49:18 am »

Here's one I use for one of my projects:

Code:

;=======================================;
; Rectangle Filling Routine Version 1.0 ;
; By Jason Kovacs & The TCPA - 10/11/99 ;
;=======================================;
; Input:  D = Top Left X Coordinate, E = Top Left Y Coordinate
;	 H = Bottom Right X Coord,  L = Bottom Right Y Coord
;	 C = Color of Lines (0-White, 1-Black, 2-XORed)
;
; Output: A Rectangle is drawn to the Graph Buffer with its border
;    and everything within it Filled in according to the value in
;    reg C which specifies the Color.
;
; Registers Affected: AF Destroyed; B = Height (L-E+1) ; C, DE, HL Preserved.
;    The Index Registers and the Shadow Registers Aren't Used.

Rectangle_Filled:
	ld a,l
	sub e
	inc a
	ld b,a
	ld a,h
	sub d
	inc a   
	push de

Rect_Fill_Loop:
	push af
	call V_Line
	pop af
	inc d
	dec a
	jr nz, Rect_Fill_Loop
	pop de
	ret

;=======================================;
; Horizontal and Vertical Line Routines ;
; By Jason Kovacs & The TCPA - 10/11/99 ;
;=======================================;

; For H_Line and V_Line:
;
; Input:  B = Length of Line (Number of Pixels)
;	 C = Color of Line (0-White, 1-Black, 2-XORed)
;	 D = X Coordinate Start of the Line
;	 E = Y Coordinate Start of the Line
;
; Output: Lines are Drawn to the Graph Buffer, and the Starting
;    Byte and Pixel Mask are Automatically determined according
;    to the Input of the Coordinates in DE.
;
; Registers Affected:  All Registers are Preserved Except AF.

V_Line:
	push de
	push hl
	push bc
	ld a,d
	call Getpix
	pop bc
	push bc
	ld d,c
	ld c,a
	ld a,d
	ld de,12
	or a
	call z,V_White_Line
	dec a
	call z,V_Black_Line
	dec a
	call z,V_XORed_Line
	pop bc
	pop hl
	pop de
	ret

V_White_Line:
	ld a,c
	cpl
	ld c,a
	
V_White_Line_2:
	ld a,(hl)
	and c
	ld (hl),a
	add hl,de
	djnz V_White_Line_2
	xor a
	ret

V_Black_Line:
	ld a,(hl)
	or c
	ld (hl),a
	add hl,de
	djnz V_Black_Line
	xor a
	ret

V_XORed_Line:
	ld a,(hl)
	xor c
	ld (hl),a
	add hl,de
	djnz V_XORed_Line
	ret

Getpix:
	ld d,0
	ld h,d
	ld l,e
	add hl,de
	add hl,de
	add hl,hl
	add hl,hl
	ld de,plotsscreen
	add hl,de

Getbit:
	ld b,0
	ld c,a
	and %00000111
	srl c
	srl c
	srl c
	add hl,bc
	ld b,a
	inc b
	ld a,%00000001

GBLoop:
	rrca
	djnz GBLoop
	ret
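A Python model of what Getpix/Getbit and the fill loop compute may help when adapting this routine. Offsets are relative to the start of the 768-byte graph buffer (plotsscreen), and the function names are hypothetical. Getbit loads %00000001 and then does (x & 7) + 1 `rrca`s, which gives the same mask as $80 >> (x & 7). Rectangle_Filled then just applies one V_Line per column with the chosen colour.

```python
def getpix(x, y):
    """(byte offset into the 768-byte buffer, bit mask), as Getpix/Getbit compute them."""
    offset = y * 12 + (x >> 3)           # 12 bytes per row, then x/8 bytes across
    mask = 0x01
    for _ in range((x & 7) + 1):         # ld a,%00000001, then rrca (x & 7) + 1 times
        mask = ((mask >> 1) | (mask << 7)) & 0xFF
    return offset, mask

def rectangle_filled(buf, d, e, h, l, color):
    """Model of Rectangle_Filled: corners (D,E)-(H,L), one V_Line per column."""
    for x in range(d, h + 1):
        for y in range(e, l + 1):
            off, mask = getpix(x, y)
            if color == 0:
                buf[off] &= ~mask & 0xFF     # V_White_Line: AND with inverted mask
            elif color == 1:
                buf[off] |= mask             # V_Black_Line: OR
            elif color == 2:
                buf[off] ^= mask             # V_XORed_Line: XOR

buf = bytearray(768)
rectangle_filled(buf, 3, 2, 12, 4, 1)        # black 10x3 rectangle
print(hex(buf[24]), hex(buf[25]))            # prints: 0x1f 0xf8
```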

NanoWar · « **Reply #58 on:** July 19, 2012, 07:42:36 pm »

Ah, cool, thanks. This was a bit off-topic, I guess.

Xeda112358 · « **Reply #59 on:** July 20, 2012, 10:13:55 am »

Here is a routine that I started a long time ago (back when I coded only in hex). I have been too lazy to replace the hex code with mnemonics, but it can draw all sorts of rectangle types.

I am sure it can be optimised, but it works:

(Note: RectData needs to be 24 bytes of free RAM that do not cross a 256-byte boundary, since MakePattern only increments L. I typically use the OP registers or cmdShadow, or something like that.)

Code:

DrawRectToGraph:
     ld hl,9340h
;===============================================================
DrawRectToBuffer:
DrawRect:
;===============================================================
;Inputs:
;     A is the type of rectangle to draw
;        0 =White
;        1 =Black
;        2 =XOR
;        3 =Black border
;        4 =White border
;        5 =XOR border
;        6 =Black border, white inside
;        7 =Black border, XOR inside
;        8 =White border, black inside
;        9 =White border, XOR inside
;        10=Shift Up
;        11=Shift Down
;
;
;        14=pxlTestRect  (returns the number of on pixels in the rectangular region)
;        15=pxlTestBorder (returns the number of on pixels on the border, good for collision detection)
;     B is the height
;     C is the Y pixel coordinate
;     D is the width in pixels
;     E is the X pixel coordinate
;===============================================================
     di
     push hl
     pop ix
     ex af,af'
;Check if coords are negative
     ld a,c
     or a
     jp p,$+9
       add a,b
       ret nc
       ret z
       ld b,a
       ld c,0

     ld a,e
     or a
     jp p,$+9
       add a,d
       ret nc
       ret z
       ld d,a
       ld e,0
;Check dimensions
     ld a,b
     or a
     ret z
     jp p,$+6
       neg
       ld b,a
     add a,c
     sub 64
     jr c,$+6
       neg
       add a,b
       ld b,a

     ld a,d
     or a
     ret z
     jp p,$+6
       neg
       ld d,a
     add a,e
     sub 96
     jr c,$+6
       neg
       add a,d
       ld d,a
     ld a,c
     cp 64
     ret nc
     ld a,e
     cp 96
     ret nc
MakePattern:
     push bc
     ld hl,RectData
     ld b,24
     xor a
     ld (hl),a
     inc l
     djnz $-2
     ld hl,RectData
     ld c,RectData+12
     ld a,e
     sub 8
     jr c,$+6
       inc l
       inc c
       jr $-6
     add a,8
     ld e,a
     ld b,a
     inc b
     ld a,d
     add a,e
     ld e,a
     ld a,1
     rrca
     djnz $-1
     ld b,l
     push af
     ld l,c
     or (hl)
     ld (hl),a
     ld l,b
     pop af
     dec a
     scf
     adc a,a
     ld (hl),a
     ld a,e
     sub 8
     jr c,$+10
     jr z,$+10
       inc l
       ld (hl),-1
       inc c
       jr $-10
     add a,8

     ld b,a
     or a
     ld a,1
     jr z,$+5
     rrca
     djnz $-1

     ld b,l
     push af
     ld l,c
     or (hl)
     ld (hl),a
     ld l,b
     pop af
     dec a
     cpl
     and (hl)
     ld (hl),a
     pop bc
     ld a,b
     ld b,0
     ld h,b
     ld l,c
     add hl,hl
     add hl,bc
     add hl,hl
     add hl,hl
     push ix
     pop bc
     add hl,bc
     ld b,a
     ex af,af'
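     ;bit 4,a \ jr z,skip \ sub 16 \ push af \ ld c,24 \ ld de,RectData
     ;loop: ld a,(de) \ cpl \ ld (de),a \ inc de \ dec c \ jr nz,loop \ pop af \ skip:
     ;i.e. types 16 and up invert all 24 mask bytes, then draw as type A-16
     .db $CB,$67,$28,$10,$D6,$10,$F5,$0E,$18,$11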
     .dw RectData
     .db $1A,$2F,$12,$13,$0D,$20,$F9,$F1

     or a
     jr nz,$+13h
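     ;type 0: ld c,12 \ ld de,RectData \ loop: ld a,(de) \ cpl \ and (hl) \ ld (hl),a
     ;        inc de \ inc hl \ dec c \ jr nz,loop \ djnz (next row) \ ret
     .db $0E,$0C,$11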
     .dw RectData
     .db $1A,$2F,$A6,$77,$13,$23,$0D,$20,$F7,$10,$F0,$C9

     dec a
     jr nz,$+12h
     ;type 1: same loop as type 0, with "or (hl)" instead of "cpl \ and (hl)"
     .db $0E,$0C,$11
     .dw RectData
     .db $1A,$B6,$77,$13,$23,$0D,$20,$F8,$10,$F1,$C9

     dec a
     jr nz,$+12h
     ;type 2: same loop as type 0, with "xor (hl)" instead of "cpl \ and (hl)"
     .db $0E,$0C,$11
     .dw RectData
     .db $1A,$AE,$77,$13,$23
     .db $0D,$20,$F8,$10,$F1,$C9

     dec a
     jr nz,$+26h
     .db $0E,$0C,$11
     .dw RectData
     .db $1A,$B6,$77,$13,$23,$0D,$20,$F8,$05,$C8
     .db $05,$28,$0F,$0E,$0C,$11
     .dw RectData+12
     .db $1A,$B6,$77,$13,$23,$0D,$20,$F8,$10,$F1,$04,$18,$DC

     dec a
     jr nz,$+28h
     .db $0E,$0C,$11
     .dw RectData
     .db $1A,$A6,$AE,$77,$13,$23,$0D,$20,$F7,$05,$C8,$05,$28,$10,$0E,$0C,$11
     .dw RectData+12
     .db $1A,$A6,$AE,$77,$13,$23,$0D,$20,$F7,$10,$F0,$04,$18,$DA

     dec a
     jr nz,$+26h
     .db $0E,$0C,$11
     .dw RectData
     .db $1A,$AE
     .db $77,$13,$23,$0D,$20,$F8,$05,$C8,$05,$28,$0F,$0E,$0C,$11
     .dw RectData+12
     .db $1A,$AE,$77,$13,$23,$0D,$20,$F8
     .db $10,$F1,$04,$18,$DC

     dec a
     jr nz,$+36h
     .db $0E,$0C,$11
     .dw RectData
     .db $1A,$B6,$77,$13,$23,$0D,$20,$F8,$05,$C8,$05
     .db $28,$1F,$E5,$0E,$0C,$11
     .dw RectData
     .db $1A,$A6,$AE,$77,$13,$23,$0D,$20,$F7,$E1,$0E,$0C,$11
     .dw RectData+12
     .db $1A
     .db $B6,$77,$13,$23,$0D,$20,$F8,$10,$E1,$04,$18,$CC

     .db $3D,$20,$33,$0E,$0C,$11
     .dw RectData
     .db $1A,$B6,$77,$13
     .db $23,$0D,$20,$F8,$05,$C8,$05,$28,$1E,$E5,$0E,$0C,$11
     .dw RectData
     .db $1A,$AE,$77,$13,$23,$0D,$20,$F8,$E1
     .db $0E,$0C,$11
     .dw RectData+12
     .db $1A,$B6,$77,$13,$23,$0D,$20,$F8,$10,$E2,$04,$18,$CD,$3D,$20,$34,$0E,$0C,$11
     .dw RectData
     .db $1A,$A6,$AE,$77,$13,$23,$0D,$20,$F7,$05,$C8,$05,$28,$1E,$E5,$0E,$0C,$11
     .dw RectData
     .db $1A,$B6
     .db $77,$13,$23,$0D,$20,$F8,$E1,$0E,$0C,$11
     .dw RectData+12
     .db $1A,$AE,$77,$13,$23,$0D,$20,$F8,$10,$E2,$04,$18
     .db $CC,$3D,$20,$35,$0E,$0C,$11
     .dw RectData
     .db $1A,$A6,$AE,$77,$13,$23,$0D,$20,$F7,$05,$C8,$05,$28,$1F,$E5
     .db $0E,$0C,$11
     .dw RectData
     .db $1A,$AE,$77,$13,$23,$0D,$20,$F8,$E1,$0E,$0C,$11
     .dw RectData+12
     .db $1A,$A6,$AE,$77,$13
     .db $23,$0D,$20,$F7,$10,$E1,$04,$18,$CB,$3D,$20,$37,$05,$C8,$F3,$E5,$D9,$01,$0C,$00,$E1,$09,$D9,$0E
     .db $0C,$11
     .dw RectData
     .db $D5,$D9,$D1,$D9,$1A,$2F,$A6,$D9,$47,$1A,$A6,$B0,$13,$23,$D9,$77,$13,$23,$0D,$20
     .db $EF,$10,$E4,$0E,$0C,$11
     .dw RectData
     .db $1A,$2F,$A6,$77,$13,$23,$0D,$20,$F7,$FB,$C9

     .db $3D,$20,$40,$F3,$C5
     .db $11,$0C,$00,$19,$10,$FD,$2B,$E5,$D9,$11,$F4,$FF,$E1,$19,$D9,$C1,$05,$C8,$0E,$0C,$11
     .dw RectData+11
     .db $D5
     .db $D9,$D1,$D9,$1A,$2F,$A6,$D9,$47,$1A,$A6,$B0,$1B,$2B,$D9,$77,$1B,$2B,$0D,$20,$EF,$10,$E4,$0E,$0C
     .db $11
     .dw RectData+11
     .db $1A,$2F,$A6,$77,$1B,$2B,$0D,$20,$F7,$FB,$C9

     dec a
     ret z
     dec a
     ret z
     exx
     ld de,0
     ld c,8
     exx
PxlTestRect:
     dec a
     jr nz,PxlTestBorder
     ld c,12
     ld de,RectData
PxlTstRectLoop:
       call PxlTestWithMask
       djnz PxlTstRectLoop-5
PxlTestExit:
       exx
;DE contains the number of pixels
       ret
PxlTestBorder:
     dec a
     ret nz
     ld c,12
     ld de,RectData
     call PxlTestWithMask
     dec b
     jr z,PxlTestExit   ;height 1: the top row was the whole border
     dec b
     jr z,PxlTestBrdrEnd
     ld c,12
     ld de,RectData+12
PxlTstBrdrLoop:
       call PxlTestWithMask
       djnz PxlTstBrdrLoop-5
PxlTestBrdrEnd:
     ld de,RectData
     ld c,12
     call PxlTestWithMask
     exx
;DE contains the number of on pixels
       ret
PxlTestWithMask:
     ld a,(de)
     and (hl)
     exx
     ld b,c
     add a,a
     jr nc,$+3
     inc de
     jr z,$+4
     djnz $-6
     exx
     inc de
     inc hl
     dec c
     jr nz,PxlTestWithMask
     ret
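Types 14 and 15 count on pixels under a mask. Judging by MakePattern, RectData holds the 12-byte mask of the whole row span, and RectData+12 holds only the left and right edge pixels. So pxlTestBorder uses the full mask on the top and bottom rows and the edge mask on the rows in between. Below is a Python sketch of that counting on a 96×64 buffer. It models the intended result, not the exact register usage, and the function names are hypothetical.

```python
ROW_BYTES = 12          # 96 pixels per row in the graph buffer

def row_masks(x, width):
    """Return (fill, edge): 12-byte masks for pixels x..x+width-1 (MSB = leftmost)."""
    fill = bytearray(ROW_BYTES)
    edge = bytearray(ROW_BYTES)
    for px in range(x, x + width):
        fill[px // 8] |= 0x80 >> (px % 8)
    for px in {x, x + width - 1}:            # left and right edge pixels only
        edge[px // 8] |= 0x80 >> (px % 8)
    return fill, edge

def count_row(buf, row, mask):
    """Count set pixels of one buffer row under a 12-byte mask."""
    base = row * ROW_BYTES
    return sum(bin(buf[base + i] & mask[i]).count("1") for i in range(ROW_BYTES))

def pxl_test_rect(buf, x, y, width, height):
    fill, _ = row_masks(x, width)
    return sum(count_row(buf, r, fill) for r in range(y, y + height))

def pxl_test_border(buf, x, y, width, height):
    fill, edge = row_masks(x, width)
    total = count_row(buf, y, fill)                       # top row
    if height == 1:
        return total
    for r in range(y + 1, y + height - 1):                # rows in between: edges only
        total += count_row(buf, r, edge)
    return total + count_row(buf, y + height - 1, fill)   # bottom row

screen = bytearray([0xFF] * 768)   # every pixel on, just for the example
print(pxl_test_rect(screen, 3, 5, 10, 4), pxl_test_border(screen, 3, 5, 10, 4))  # prints: 40 24
```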
