Author Topic: ASM Optimized routines  (Read 108220 times)


Offline Galandros

ASM Optimized routines
« on: February 28, 2010, 07:27:53 am »
There are some cool optimized routines around. calc84maniac is probably the record holder for z80 optimization, at least on the calculator z80 forums.

On to the code:
Code: [Select]
;calc84maniac
;Compare HL with DE: sets Z and carry like cp does, HL is preserved
cpHLDE:
 or a
 sbc hl,de
 add hl,de
 ret
;Important note: because the inline code is only 4 bytes and a call is 3 bytes, just use a macro:
;SPASM, TASM and BRASS compatible, I guess
#define cp_HLDE  or a \ sbc hl,de \ add hl,de

;- Reverse a
;input: Byte in A
;output: Reversed byte in A
;destroys B
;Clock cycles: 66 (excluding ret)
;Bytes: 18
;author: calc84maniac
reversea:
ld b,a
rrca
rrca
xor b
and %10101010
xor b
ld b,a
rrca
rrca
rrca
rrca
xor b
and %01100110
xor b
rrca
ret

;reverse hl
;curiosity: an easy port of a common reverse A register routine is more efficient than tricky stuff
;calc84maniac
;28 bytes (with ret) and 108 cycles (without ret)
ld a,l
rla
rr h
rla
rr h
rla
rr h
rla
rr h
rla
rr h
rla
rr h
rla
rr h
rla
rr h
rla ;a now holds the old h reversed, so no further rotate is needed
ld l,a
ret

;calc84maniac
;in: a = ABCDEFGH
;out: hl= AABBCCDDEEFFGGHH
rrca
rra
rra
ld l,a
rra
sra l
rla
rr l
sra l
rra
rr l
sra l

rrca
rra
rra
ld h,a
rra
sra h
rla
rr h
sra h
rra
rr h
sra h
ret
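
As a usage sketch for the 16-bit compare at the top of this block: after the sequence, Z is set when hl = de and carry is set when hl < de, and hl is left unchanged. The routine name UpdateHigh and the variables Score and HighScore are hypothetical, made up only for illustration.

```asm
;UpdateHigh (hypothetical): store Score into HighScore if it is larger
UpdateHigh:
 ld hl,(Score)
 ld de,(HighScore)
 or a            ; clear carry
 sbc hl,de       ; Z if equal, carry if hl < de
 add hl,de       ; restore hl; Z is kept and carry is set again after a borrow
 ret z           ; equal: nothing to do
 ret c           ; Score < HighScore: nothing to do
 ld (HighScore),hl
 ret
Score:     .dw 0 ; hypothetical 16-bit variables
HighScore: .dw 0
```

The three middle instructions are exactly what the cp_HLDE macro expands to, so the macro can be dropped in there instead.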

Code: [Select]
;Galandros optimized routines
;try to beat me... maybe it is possible...

;Displays A register content on screen as a decimal ASCII number, using no additional memory
DispA:
ld c,-100
call Na1
ld c,-10
call Na1
ld c,-1
Na1: ld b,'0'-1
Na2: inc b
add a,c
jr c,Na2
sub c ;works as add 100/10/1
push af ;safer than ld c,a
ld a,b ;char is in b
CALL PUTCHAR ;plot a char. Replace with bcall(_PutC) or similar.
pop af ;safer than ld a,c
ret


;Note: the following one is optimized for RPG menus and such; it is quite flexible. I am going to use it in Lost Legends I ^^
;I started with one which used additional RAM for temporary storage (made by me, too), and optimized it for size, speed and no extra memory use! ^.^
;the inc's and dec's were tricky to debug -.-", the registers b and c act as counters and flags

;DispHL for games
;input: hl=num, d=col, e=row (ld (CurRow),de stores e in CurRow and d in CurCol), c=number of digits to skip
;number of characters displayed: 5 ; example: 65000
;output: hl displayed, with skipped digits and leading zeros shown as spaces
DispHL_games:
inc c
ld b,1 ;skip 0 flag
ld (CurRow),de
;Number in hl to decimal ASCII
;Thanks to z80 Bits
;inputs: hl = number to ASCII
;example: hl=300 outputs '  300'
;destroys: af, hl, de used
ld de,-10000
call Num1
ld de,-1000
call Num1
ld de,-100
call Num1
ld e,-10
call Num1
ld e,-1
Num1:
ld a,'0'-1
Num2: inc a
add hl,de
jr c,Num2
sbc hl,de
dec c ;c is skipping
jr nz,skipnum
inc c
djnz notcharnumzero
cp '0'
jr nz,notcharnumzero
leadingzero:
inc b
skipnum:
ld a,' '
notcharnumzero:
push bc
call PUTCHAR  ;bcall(_PutC) works, not sure if it preserves bc
pop bc
ret

PUTCHAR:
bcall(_PutC)
ret

;Example usage of DispHL_games to understand what I mean
Test2:
ld hl,60003
ld de,$0101
ld c,0
call DispHL_games
ld hl,60003
ld de,$0102
ld c,1
call DispHL_games
ret

Well, don't try to understand or optimize calc84maniac's ones. j/k, trying to understand them can be hard (tip: have a good instruction set summary at hand), but it teaches some inner details of z80 asm.
As for mine, do your best.
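
A quick way to try DispA, as a sketch: the label TestDispA is hypothetical, and it assumes the PUTCHAR wrapper around bcall(_PutC) defined above. Note that DispA always prints three digits, so small values come out with leading zeros.

```asm
;TestDispA (hypothetical): prints 205 at the top-left of the home screen
TestDispA:
 bcall(_ClrLCDFull)  ; clear the display
 bcall(_HomeUp)      ; cursor to row 0, column 0
 ld a,205
 call DispA          ; prints '2', '0', '5'
 ret
```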
Hobbing in calculator projects.

Offline Quigibo

Re: ASM Optimized routines
« Reply #1 on: February 28, 2010, 05:21:57 pm »
Here is a little optimization I use but haven't really seen around.  When you need a direct key press, you have to wait about 7 clock cycles between setting the port and reading it.  Most people just fill in the extra space with a waste instruction like this:

Code: [Select]
ld a,xx
out (1),a
ld a,(de)
in a,(1)
and yy
9 Bytes, 43 T-States.

You can actually use the waste instruction to do something useful.  It gives a slight speed increase.

Code: [Select]
ld a,xx
out (1),a
ld b,yy
in a,(1)
and b
9 Bytes, 40 T-States.
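
To make the xx/yy placeholders concrete, here is a sketch that tests the CLEAR key with the same trick. The label ClearPressed is hypothetical; the group value $FD and bit 6 for CLEAR come from the usual TI-83 Plus key matrix tables.

```asm
;Sketch: branch if CLEAR is held down
 ld a,$FD          ; select the key group that contains CLEAR
 out (1),a
 ld b,%01000000    ; CLEAR is bit 6 of that group; this load doubles as the delay
 in a,(1)
 and b             ; keys read active-low: the bit is 0 while pressed
 jr z,ClearPressed ; hypothetical label
```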
« Last Edit: February 28, 2010, 05:23:48 pm by Quigibo »
___Axe_Parser___
Today the calculator, tomorrow the world!

Offline calc84maniac

Re: ASM Optimized routines
« Reply #2 on: February 28, 2010, 08:12:27 pm »
Small and quick setup for IM 2 (this example sets up vector table at $9900 and interrupt jump at $9a9a, but values can be changed)
Code: [Select]
di
ld a,$99
ld bc,$0100
ld h,a
ld d,a
ld l,c
ld e,b
ld i,a
inc a
ld (hl),a
ldir
ld l,a
ld (hl),$c3
inc l
ld (hl),intvec & $ff
inc l
ld (hl),intvec >> 8
im 2
ei
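
A program that installs a handler this way would normally switch back to interrupt mode 1 before returning to the OS, since the table and the jump at $9a9a live in RAM that may be reused afterwards. A minimal sketch of that teardown, assuming nothing else needs to be restored:

```asm
;Sketch: undo the IM 2 setup before quitting
 di
 im 1    ; back to the OS interrupt handler
 ei
```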
"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman

Offline Galandros

Re: ASM Optimized routines
« Reply #3 on: April 24, 2010, 12:12:44 pm »
I came across this optimized routine. It is about as optimized as a z80 string copy can get.
Code: [Select]
;author: calc84maniac, I think
;Copy zero terminated string at HL to DE.
StrCopy:
xor a
docopystr:
cp (hl)
ldi
jr nz,docopystr
ret
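
A usage sketch for StrCopy, under a few assumptions: the label Message is hypothetical, and saveSScreen is assumed to be a free scratch buffer at this point. The copy works because ldi leaves the Z flag from cp (hl) untouched, so the terminator is copied before the loop exits. Keep in mind that ldi also decrements bc.

```asm
;Hypothetical usage: copy a message into a scratch buffer
 ld hl,Message
 ld de,saveSScreen   ; assumed free buffer from ti83plus.inc
 call StrCopy
 ; here hl and de point just past the copied zero terminator
 ret
Message:
 .db "Hello",0
```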

These are quite optimized, but it may be possible to optimize them further (for speed and size). It is not really needed, though.
They shift a graphics buffer (optimized for 96x64) up or down by the number of pixels passed in the A register.
Code: [Select]
scroll_up:
#ifdef DEBUG
cp 64 ; valid input is 1 to 63
call nc,ErrorOverFlow ; carry is clear when a >= 64
#endif
add a,a
add a,a
ld l,a
ld e,a
ld h,0
ld d,h
add hl,hl
add hl,de ; hl=a*12

push hl
ld de,768
ex de,hl
; carry is never set here if input is correct
; or a
sbc hl,de
ld b,h
ld c,l ; bc=768-12*a
ex de,hl
ld de,plotsscreen
add hl,de
ldir
;blank remaining area
ld h,d
ld l,e
inc de
ld (hl),$00
pop bc
dec bc ; bc=12*a-1
ldir
ret
;PSEUDO CODE
; ld hl,plotsscreen+12*a
; ld de,plotsscreen
; ld bc,768-12*a
; ldir
; ld h,d
; ld l,e
; ld (hl),$00
; inc de
; ld bc,12*a
; dec bc
; ldir
; ret



scroll_down:
#ifdef DEBUG
cp 64 ; valid input is 1 to 63
call nc,ErrorOverFlow ; carry is clear when a >= 64
#endif
; a can be from 1 to 63
; a can be multiplied by 4
add a,a
add a,a ; a*4
ld l,a ; hl = a*4
ld e,a
xor a
ld h,a
ld d,a
add hl,hl ; hl = a*8
add hl,de ; hl = a*12
ld e,a ; de = 0

push hl ; a*12 will be needed later
push hl ; 2 times
ex de,hl
;carry is never set here
; or a
sbc hl,de ; hl= -a*12, de=a*12
ld de,plotsscreen+767
add hl,de ; hl=plotsscreen+767-12*a
pop bc
push hl
ld hl,768+1
;carry always set
; or a
sbc hl,bc
ld b,h
ld c,l
pop hl
lddr
;blank remaining area
ld h,d
ld l,e
ld (hl),$00
dec de
pop bc
dec bc
lddr
ret

;PSEUDO CODE
; ld hl,plotsscreen+767-12*a
; ld de,plotsscreen+767
; ld bc,768-12*a
; lddr
;blank remaining area, either with lddr:
; ld h,d
; ld l,e
; ld (hl),$00
; dec de
; ld bc,12*a-1
; lddr
; ret
;or with ldir:
; ld hl,plotsscreen
; ld (hl),$00
; ld de,plotsscreen+1
; ld bc,12*a-1
; ldir
; ret
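
A usage sketch for the scroll routines, assuming the buffer is the standard graph buffer and that the result should be shown right away:

```asm
;Hypothetical usage: scroll the buffer up by 8 pixel rows and show it
 ld a,8              ; must be in the range 1 to 63
 call scroll_up
 bcall(_GrBufCpy)    ; copy plotsscreen to the LCD
 ret
```

Passing 0 is not safe with either routine, because the blanking step would then run ldir/lddr with bc = -1.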
« Last Edit: April 24, 2010, 12:15:14 pm by Galandros »

Offline mapar007

Re: ASM Optimized routines
« Reply #4 on: April 25, 2010, 03:58:56 am »
Very nice! I'll add these to my utils.z80 file that is included in all my app builds.

Anyone wanting to compile a stdlib.c and revive the tisdcc project? j/k

Offline Galandros

Re: ASM Optimized routines
« Reply #5 on: April 25, 2010, 05:04:47 am »
Quote from: mapar007 on April 25, 2010, 03:58:56 am
> Very nice! I'll add these to my utils.z80 file that is included in all my app builds.
>
> Anyone wanting to compile a stdlib.c and revive the tisdcc project? j/k

Actually I am working on something like that. I am hand-writing C functions in z80 assembly just for fun. :P I will share them when I finish.
After seeing Axe Parser, it seems that it is possible to make a good C compiler for z80. And we have documentation on how to optimize z80 assembly that could be used to build an optimizer; check the WikiTI topic: http://wikiti.brandonw.net/index.php?title=Z80_Optimization.
« Last Edit: April 25, 2010, 05:14:53 am by Galandros »

Offline DJ Omnimaga

Re: ASM Optimized routines
« Reply #6 on: April 25, 2010, 12:19:53 pm »
Quote from: mapar007 on April 25, 2010, 03:58:56 am
> Very nice! I'll add these to my utils.z80 file that is included in all my app builds.
>
> Anyone wanting to compile a stdlib.c and revive the tisdcc project? j/k

I think I remember this, it was Halifax from the old Omnimaga forums who worked on it, right? There was a thread about it somewhere

Offline Quigibo

Re: ASM Optimized routines
« Reply #7 on: April 29, 2010, 05:59:58 pm »
Quigibo's Challenge!

Can any of the following be done in 6 or fewer bytes?  The input and output must be HL.

  • Multiply by 128?
  • Signed division by any nontrivial constant, other than 2, including negative numbers?
  • Modulus with any constant that is not a power of 2?

I'm rewriting my math engine almost from scratch so I decided I would just optimize everything I could possibly conceive of at the same time.  These are the ones I'm having trouble finding.

Offline calc84maniac

Re: ASM Optimized routines
« Reply #8 on: April 29, 2010, 06:31:16 pm »
Seems pretty impossible to me.

Offline Quigibo

Re: ASM Optimized routines
« Reply #9 on: April 29, 2010, 06:58:39 pm »
Okay, that's good.  I spent hours trying to optimize some of these using all the tricks I know.  That reassures me it was a wild goose chase.

Offline DJ Omnimaga

Re: ASM Optimized routines
« Reply #10 on: April 29, 2010, 07:01:08 pm »
Quote from: calc84maniac on April 29, 2010, 06:31:16 pm
> Seems pretty impossible to me.

O.O

No way!

You're calc84god, you can do everything, even the impossible! (see TI-Boy SE/Project M/F-Zero)

j/k I can't wait to see what kind of optimizations there will be in the next versions of Axe

Offline Quigibo

Re: ASM Optimized routines
« Reply #11 on: April 29, 2010, 07:34:45 pm »
It's nothing big.  Mostly it just extends multiplication, modulus, and addition to higher powers of 2.  The big optimizations won't come for a long time unfortunately.  Functionality is more important right now.

By the way, is there a better way to display hl at the coordinates (xx,yy) than this?
Code: [Select]
B_CALL(_SetXXXXOP2)
B_CALL(_Op2ToOP1)
ld hl,$yyxx
ld (PenCol),hl
ld a,5
B_CALL(_DispOP1A)

It seems really roundabout to me.  Is there a bcall I don't know about that does this automatically?

Offline calcdude84se

Re: ASM Optimized routines
« Reply #12 on: April 29, 2010, 07:57:10 pm »
Yeah, there's _DispHL,
so your code would be:
Code: [Select]
push hl
ld hl,$yyxx
ld (PenCol),hl
pop hl
B_CALL(_DispHL)
Just be aware it's right-justified in 5 spaces. (Since $ffff is 5 decimal digits, 65535)
EDIT: oh, wait, that's pencol? so this code doesn't work then. Oops... :-[
« Last Edit: April 30, 2010, 05:49:37 pm by calcdude84se »
"People think computers will keep them from making mistakes. They're wrong. With computers you make mistakes faster."
-Adam Osborne
Spoiler For "PartesOS links":
I'll put it online when it does something.

Offline calc84maniac

Re: ASM Optimized routines
« Reply #13 on: April 29, 2010, 10:27:56 pm »
He's talking about graph screen display.
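
For the graph screen case, one possible approach (not a known single bcall) is to reuse the repeated-subtraction conversion from DispHL_games earlier in the thread and draw each digit with _VPutMap. This is only a sketch: the names VDispHL, VNum1 and VNum2 are hypothetical, leading zeros are always drawn, and hl and de are saved around the bcall on the assumption that _VPutMap may not preserve them.

```asm
;VDispHL (hypothetical): draw hl as 5 decimal digits in the small font
;input: hl = number, d = pen row (yy), e = pen column (xx)
;destroys: af, de, hl
VDispHL:
 ld (PenCol),de      ; e -> PenCol, d -> PenRow
 ld de,-10000
 call VNum1
 ld de,-1000
 call VNum1
 ld de,-100
 call VNum1
 ld e,-10            ; d is still $FF
 call VNum1
 ld e,-1
VNum1:
 ld a,'0'-1
VNum2:
 inc a
 add hl,de           ; subtract the place value until it no longer fits
 jr c,VNum2
 sbc hl,de           ; carry is clear here, so this undoes the last step
 push hl
 push de
 bcall(_VPutMap)     ; draw the character in a and advance the pen
 pop de
 pop hl
 ret
```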

Offline Galandros

Re: ASM Optimized routines
« Reply #14 on: April 30, 2010, 09:21:30 am »
Quote from: Quigibo on April 29, 2010, 05:59:58 pm
> Quigibo's Challenge!
>
> Can any of the following be done in 6 or fewer bytes?  The input and output must be HL.
>
>   • Multiply by 128?
>   • Signed division by any nontrivial constant, other than 2, including negative numbers?
>   • Modulus with any constant that is not a power of 2?

Challenge accepted.

Answer to the multiplication by 128 in 6 bytes:

I started by coding a routine that multiplies A by 128:
Spoiler For Spoiler:
; The old trick to multiply by 256, by moving the low byte to high byte
 ld h,a
 xor a   ; resets carry
 rr h     ; divide h by 2
 rra      ; and pass bit 0 to a
 ld l,a   ; store to l
; hl is a*128

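After that, I very easily modified it to compute (hl*128) % 2^16. Unsigned version:
Spoiler For Spoiler:
 ld h,l
 xor a
 rr h
 rra
 ld l,a
; 6 bytes and 24 clocks to multiply hl by 128, not bad O_o
; caveat: bit 0 of the original h (bit 8 of hl) is discarded, so the result is exact only when that bit is 0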

I am very sure these routines work, but I have not tested them.
EDIT4: tested with a few values, it works.

EDIT3:
Multiply hl by 128, now signed. If I am right, to do signed, you only need to preserve bit 7? If that's so:
Spoiler For Spoiler:
 ld h,l
 xor a
 sra h
 rra
 ld l,a
; 6 bytes, 24 clocks, too
; caveat: sra copies bit 7 of l into bit 15, so e.g. hl=$0080 gives $C000 instead of $4000
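
One thing to watch in the 6-byte hl versions above: ld h,l throws away the old h, but bit 8 of hl has to end up in bit 15 of the product. A sketch that keeps it, at the cost of two more bytes, so it does not meet the 6-byte challenge. Since multiplication modulo 2^16 gives the same bit pattern for signed and unsigned values, the same code should serve both cases.

```asm
;hl = hl*128 (mod 65536), same result for signed and unsigned
;destroys a; 8 bytes, 32 clocks
 xor a      ; a = 0
 srl h      ; carry = bit 8 of the original hl
 ld h,l     ; does not touch the flags
 rr h       ; h = that bit followed by l7..l1, carry = l0
 rra        ; a = l0 in bit 7, zeros below
 ld l,a
```

For example, hl=$0100 gives $8000 and hl=$0080 gives $4000.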

Now I will think about the others when I have more free time. Fun, fun, fun.
Give me some time, please. :)
EDIT: I am thinking of putting some of these challenges on WikiTI when we end the challenge. And maybe Axe's routines. If you have other routines or optimization challenges, share them to see what I can do.
EDIT2: fixed a bug/typo and commented even more the code
« Last Edit: April 30, 2010, 01:18:05 pm by Galandros »