I was trying to make an HL_mod_10 routine today employing some of the trickery I used with my Rand24 routine (24-bit pseudo-random number generator). I wanted to make it as fast as I could, so I was wondering if anybody wanted to have a fun game of optimise-the-short-assembly-routine.
My approach was to break HL into two parts as H*256+L. Since 256=260-4 and 260 is a multiple of 10, mod 10 that is equivalent to:
(H*260-H*4+L) mod 10=(L-H*4) mod 10
H and L are 8-bit values, so H*4 has the potential of being a 10-bit number. If I call this X*256+Y, where X is the upper 2 bits of H*4 and Y is the lower 8 bits:
(L-H*4) mod 10 = (L-X*256-Y) mod 10
(L-X*256-Y) mod 10 =(L-(X*260-X*4)-Y) mod 10
(L-(X*260-X*4)-Y) mod 10=(L-Y+X*4) mod 10
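For anybody who wants to sanity-check that reduction before trusting it, here is a quick brute-force sketch in Python. It is not part of the routine; the helper names `split` and `reduced` are made up for illustration.

```python
# Brute-force check that HL mod 10 == (L - Y + 4*X) mod 10 for every 16-bit HL.
# split() and reduced() are hypothetical helper names, not part of the routine.

def split(hl):
    """Return (L, X, Y), where H*4 = X*256 + Y."""
    h, l = hl >> 8, hl & 0xFF
    h4 = h * 4                     # up to 10 bits wide
    x, y = h4 >> 8, h4 & 0xFF      # X = upper 2 bits, Y = lower 8 bits
    return l, x, y

def reduced(hl):
    l, x, y = split(hl)
    # Python's % gives a result in 0..9 even when the left side is negative.
    return (l - y + 4 * x) % 10

assert all(reduced(hl) == hl % 10 for hl in range(0x10000))
```

The final assert walks all 65536 inputs, so if it passes silently the identity holds across the whole range.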
Now that is an 8-bit subtraction of Y from L. If the subtraction borrows (8-bit 'overflow'), then I need to adjust the result by adding a constant. Then, X*4 is at most a 4-bit number to add to that. Once that is all done, you can perform the mod 10 on an 8-bit number. I came up with this code:
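HL_mod_10
;Input: HL
;Output: A = HL mod 10
;197.75 avg cycles
    ld a,l                            ;A = L
    ld l,h
    ld h,0
    add hl,hl
    add hl,hl                         ;HL = H*4, so now H = X and L = Y
    sub l \ jr nc,$+8 \ add a,4 \ jr nc,$+4 \ add a,6    ;A = L-Y; on borrow add 4 (-256 = 4 mod 10), and 6 more if that wraps
    sla h \ sla h                     ;H = X*4
    add a,h \ jr nc,$+8 \ sub 4 \ jr nc,$+4 \ sub 6      ;A += X*4; on carry subtract 4 (256 = -4 mod 10), and 6 more on borrow
    sub 160 \ jr nc,$+4 \ add a,160   ;A < 160
    sub 80 \ jr nc,$+4 \ add a,80     ;A < 80
    sub 40 \ jr nc,$+4 \ add a,40     ;A < 40
    sub 20 \ jr nc,$+4 \ add a,20     ;A < 20
    sub 10 \ ret nc \ add a,10 \ ret  ;A < 10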
It averages a smidge under 197.75 t-states.
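If you want to follow the carry handling without an emulator, here is a rough Python model of the routine. It is only a sketch: `a` and `carry` stand in for the A register and the carry flag, the function name `hl_mod_10` is my own, and it says nothing about timing.

```python
def hl_mod_10(hl):
    """Step-by-step model of the routine; returns what A would hold on exit."""
    a = hl & 0xFF                    # ld a,l
    hl4 = (hl >> 8) * 4              # ld l,h / ld h,0 / add hl,hl / add hl,hl
    x, y = hl4 >> 8, hl4 & 0xFF      # H = X, L = Y

    a -= y                           # sub l
    carry, a = a < 0, a & 0xFF
    if carry:                        # A is 256 too high; -256 = +4 (mod 10)
        a += 4
        carry, a = a > 0xFF, a & 0xFF
        if carry:                    # A is now 256 too low; +256 = +6 (mod 10)
            a += 6                   # A was 0..3 here, so this cannot wrap

    a += x * 4                       # sla h / sla h / add a,h
    carry, a = a > 0xFF, a & 0xFF
    if carry:                        # A is 256 too low; +256 = -4 (mod 10)
        a -= 4
        carry, a = a < 0, a & 0xFF
        if carry:                    # A is 256 too high; -256 = -6 (mod 10)
            a -= 6                   # A was 252..255 here, so this cannot borrow

    for step in (160, 80, 40, 20, 10):
        if a >= step:                # sub step / jr nc / add a,step
            a -= step
    return a

assert all(hl_mod_10(n) == n % 10 for n in range(0x10000))
```

The `if a >= step` form is equivalent to the subtract-then-add-back pattern in the assembly; it just skips the subtraction instead of undoing it.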
EDIT: If anybody wants to modify this for other languages, here is a supplemental explanation:
This is the equivalent of HL mod 10:
(L-Y+X*4) mod 10
If the input bits of HL are abcdefghijklmnop₂, then:
L=ijklmnop₂
Y=cdefgh00₂
X=000000ab₂
So to put it into an 8-bit value to operate on:
ijklmnop₂-cdefgh00₂+0000ab00₂
If this is ≥256 (it will be at most 267), then add 6 and keep the lower 8 bits.
If this is <0 (it will be at the lowest -252), then take the lower 8 bits (signed, two's complement form) and add 4. If that overflows to be ≥256, then add another 6, keeping only the lower 8 bits.
So for example, if ijklmnop₂-cdefgh00₂+0000ab00₂=-3 → FD+04 → 101₁₆ → 01+06 → 07.
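As a sketch of how that recipe might look in a higher-level language, here is one possible Python version. The function name `fold_to_byte` is made up, and it works on the signed sum directly rather than on flags, which is an assumption about how you would port it.

```python
def fold_to_byte(hl):
    """Fold a 16-bit value into an 8-bit value with the same residue mod 10."""
    h, l = hl >> 8, hl & 0xFF
    y = (h << 2) & 0xFF          # cdefgh00
    x4 = (h >> 6) << 2           # 0000ab00
    s = l - y + x4               # somewhere in -252..267
    a = s & 0xFF                 # keep the lower 8 bits
    if s > 255:                  # wrapped past 255: add 6
        a += 6
    elif s < 0:                  # went negative: add 4 ...
        a += 4
        if a > 255:              # ... and if that wraps, add another 6
            a = (a & 0xFF) + 6
    return a

# The -3 example: L = 1, Y = 4, X = 0, i.e. HL = 0x0101 = 257
assert fold_to_byte(0x0101) == 7
```

The result always fits in 8 bits, so any 8-bit mod 10 can finish the job from there.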
Now perform mod 10 on the 8-bit result. You can apply more tricks to this if you like. abcdefgh₂=128a+0bcdefgh₂, so if a=1, you can do 0bcdefgh₂-2, for example. Or:
0000ab00₂+00cdefgh₂
Or:
0000abc0₂+000defgh₂
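Those three 8-bit tricks can be checked the same way. This is just an illustrative Python sketch with made-up function names; each one returns a smaller value with the same residue mod 10, not the final digit.

```python
def trick_top_bit(v):
    # abcdefgh = 128a + 0bcdefgh, and 128 = -2 (mod 10); result may be -2 or -1
    return (v & 0x7F) - 2 * (v >> 7)

def trick_top_two(v):
    # 0000ab00 + 00cdefgh, since 64 = 4 (mod 10); result is at most 75
    return ((v >> 6) << 2) + (v & 0x3F)

def trick_top_three(v):
    # 0000abc0 + 000defgh, since 32 = 2 (mod 10); result is at most 45
    return ((v >> 5) << 1) + (v & 0x1F)

for f in (trick_top_bit, trick_top_two, trick_top_three):
    assert all(f(v) % 10 == v % 10 for v in range(256))
```

Note that the first trick can dip below zero when a=1 and the low 7 bits are 0 or 1, so a port of it needs the same kind of borrow fix-up as the main routine.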