Topic: 24 bit multiplication

ACagliano
Re: 24 bit multiplication
« Reply #30 on: December 11, 2011, 01:41:47 pm »
OK. I am now particularly interested in 2-byte multiplication and 4-byte square roots. How would those be done?

jacobly
« Reply #31 on: December 11, 2011, 02:06:55 pm »
Code: [Select]
// Multiply a times b
temp = 0
repeat for each bit in a
    temp <<= 1
    if (high bit of a set)
        temp += b
    a <<= 1
return temp
If a and b are 2 bytes, temp is 4 bytes and you loop 16 times.
Spoiler For code:
stolen from Axe
p_MulFull:
   ; Input in hl and de, result in cahl = hl * de
   ld   c,h
   ld   a,l
   ld   hl,0   ;11
   ld   b,16   ;7
__MulFullNext:
   add   hl,hl   ;11
   rla      ;4
   rl   c   ;8
   jr   nc,__MulFullSkip   ;12/7
   add   hl,de   ;11
   adc   a,0   ;7
   jr   nc,__MulFullSkip
   inc   c
__MulFullSkip:
   djnz   __MulFullNext
   ret
__MulFullEnd:
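The pseudocode is easy to check on a desktop before porting it. This Python sketch is a direct transcription of it with a separate accumulator (the function name `mul_shift_add` is made up for illustration); it is the same shift-and-add idea, not the Axe routine itself.

```python
def mul_shift_add(a, b, bits=16):
    """Shift-and-add multiply: scan a from its top bit down, as in the pseudocode."""
    mask = (1 << bits) - 1
    temp = 0
    for _ in range(bits):
        temp <<= 1                      # temp <<= 1
        if (a >> (bits - 1)) & 1:       # high bit of a set?
            temp += b                   # temp += b
        a = (a << 1) & mask             # a <<= 1, keeping a at `bits` bits
    return temp                         # fits in 2*bits bits

print(hex(mul_shift_add(0xFFFF, 0xFFFF)))  # 0xfffe0001
```

In p_MulFull the accumulator and the multiplier share one 32-bit register (c:a:h:l): as the product grows up from the bottom, the multiplier bits are shifted out of the top of c, so no separate copy of a is needed.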

Code: [Select]
// Sqrt a
temp = high byte of a
a <<= 8
b = 0
repeat for every 2 bits in a
    test = (b << 8) + 0x40
    b <<= 1
    if (temp >= test)
        temp -= test
        set low bit of b
    temp = (temp << 2) + high 2 bits of a
    a <<= 2
return b
If a is 4 bytes, then b is 2 bytes, temp needs 3 bytes (it is hl:a in the routine below, since test can be as large as b << 8), and you loop 16 times.
Spoiler For code:
Stole my own routine from Axe (and modified it).
p_Sqrt88:
   ; input in hlde, result in de
   ld   b,16
   ld   a,h
   ld   c,l
   push   de ; ld ixh,d
   pop   ix ; ld ixl,e
   ld   de,0
   ld   h,d
   ld   l,e
__Sqrt88Loop:
   sub   $40
   sbc   hl,de
   jr   nc,__Sqrt88Skip
   add   a,$40
   adc   hl,de
__Sqrt88Skip:
   ccf
   rl   e
   rl   d
   add   ix,ix
   rl   c
   rla
   adc   hl,hl
   add   ix,ix
   rl   c
   rla
   adc    hl,hl
   djnz   __Sqrt88Loop
   ret
__Sqrt88End:
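Why 0x40? As I read p_Sqrt88, the 24-bit window hl:a holds the usual digit-by-digit remainder times 64, plus 6 input bits that haven't been used yet. Comparing it with (root << 8) + 0x40 = 64 * (4*root + 1) is therefore the same as the textbook test "remainder >= 4*root + 1". This Python sketch follows that register layout (temp = hl:a, the unread input bits in c:ix, the root building up in de); the function name is hypothetical. temp is shifted left 2 places each pass, as the add ix,ix / rl c / rla / adc hl,hl pairs do, and the assert checks that hl:a never overflows at a comparison.

```python
def sqrt32(n):
    """Digit-by-digit integer square root of a 32-bit n, 2 input bits per step."""
    assert 0 <= n < 1 << 32
    temp = n >> 24                      # high byte of n (hl:a starts as 0:0:h)
    rest = (n << 8) & 0xFFFFFFFF        # the remaining 24 bits, left-aligned
    root = 0
    for _ in range(16):
        assert temp < 1 << 24           # hl:a never overflows at the compare
        test = (root << 8) + 0x40       # 64 * (4*root + 1)
        root <<= 1
        if temp >= test:
            temp -= test
            root |= 1                   # the ccf / rl e / rl d step shifts in a 1
        temp = (temp << 2) | (rest >> 30)   # shift in the next 2 input bits
        rest = (rest << 2) & 0xFFFFFFFF
    return root

print(sqrt32(0xFFFFFFFF))   # 65535
```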

ACagliano
« Reply #32 on: December 11, 2011, 02:23:11 pm »
Kool. Thanks. But, shouldn't the first one have two inputs?

Xeda112358
« Reply #33 on: December 11, 2011, 02:30:16 pm »
So with two-byte multiplication, you can take advantage of the fact that add hl,hl is the same as shifting hl left. It even gives you the carry! So in this case:
Code: [Select]
     ld hl,0
     ld a,16
MultLoop:
     add hl,hl      ;shifts hl left
     rl e \ rl d    ;shifts de left and if hl overflowed, it overflows into de
     jr nc,$+6      ;if the bit shifted out of DE is 0, skip the next 3 instructions
       add hl,bc    ;add bc (the multiplicand) to the low word of the product
       jr nc,$+3    ;if that overflowed, carry into de
         inc de
     dec a
     jr nz,MultLoop
     ret
That will multiply DE times BC and return the result in DEHL. I will see if I can port a square root routine for 32-bit...

EDIT: changed inc e to inc de
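Here DE does double duty. It starts as the multiplier, and as it shifts left its low bits fill up with the high word of the product. The register-level Python model below (the function name is hypothetical) mirrors the loop instruction by instruction, including the `inc de` carry, so it can be compared against ordinary multiplication.

```python
def mul16_dehl(de, bc):
    """Model of the loop above: DE * BC -> DEHL, with DE doubling as multiplier."""
    hl = 0
    for _ in range(16):
        hl, cy = (hl << 1) & 0xFFFF, hl >> 15          # add hl,hl
        de, cy = ((de << 1) | cy) & 0xFFFF, de >> 15   # rl e \ rl d
        if cy:                                         # jr nc,$+6
            s = hl + bc                                # add hl,bc
            hl = s & 0xFFFF
            if s >> 16:                                # jr nc,$+3
                de = (de + 1) & 0xFFFF                 # inc de
    return (de << 16) | hl

print(hex(mul16_dehl(0xFFFF, 0xFFFF)))   # 0xfffe0001
```

The `inc de` can never spill into the multiplier bits still waiting in the top of DE: after k passes the partial product is below 2^(16+k), which is exactly the room the shifted-out multiplier bits have left.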

jacobly
« Reply #34 on: December 11, 2011, 02:48:20 pm »
Quote from: ACagliano (Reply #32)
Kool. Thanks. But, shouldn't the first one have two inputs?

Of course, hl and de, isn't that what I said ;)

FloppusMaximus
« Reply #35 on: December 11, 2011, 04:41:31 pm »
My first multiplication routine takes 2746 - 4570 cycles, the second takes 1680 - 2880 cycles.
Oh boy, optimization time :D

The best I have so far is somewhere around 1800 cycles average (I'm too lazy to work out the exact probabilities at the moment, and not counting memory delays) using a squaring table and undocumented IX instructions.  Input is BDE and CHL, output is BCDEAL.  This routine works by expanding the formula 2xy = x²+y²-|x-y|², summed over each of the 9 pairs of bytes in the input.

(I'm not saying this is practical - unless you really have thousands of 24-bit multiplications to perform, you don't need this kind of speed.  This is just for fun.)
Code: [Select]
SUBFIRST .macro src1, src2, hdest, ldest
exx
ld a, src1
sub src2
jr nc, $ + 4
neg
exx
ld l, a
ld a, ldest
sub (hl)
ld ldest, a
inc h
ld a, hdest
sbc a, (hl)
ld hdest, a
  .endm

SUBNEXT .macro src1, src2, hdest, ldest
dec h
ex af, af'
exx
ld a, src1
sub src2
jr nc, $ + 4
neg
exx
ld l, a
ex af, af'
ld a, ldest
sbc a, (hl)
ld ldest, a
inc h
ld a, hdest
sbc a, (hl)
ld hdest, a
  .endm

BDE_times_CHL_sqrdiff_v3:
ld a, d
exx
ld h, high(sqrtab)
ld l, a
ld e, (hl)
inc h
ld d, (hl) ; DE = d²
exx
ld a, b
exx
ld l, a
ld b, (hl)
dec h
ld c, (hl) ; BC = b²
exx
ld a, e
exx
ld l, a
ld a, (hl)
inc h
ld h, (hl)
ld l, a ; HL = e²
call BC_DE_HL_times_10101
push bc
push hl
  push de
   exx
   ld a, h
   exx
   ld h, high(sqrtab)
   ld l, a
   ld e, (hl)
   inc h
   ld d, (hl) ; DE = h²
   exx
   ld a, c
   exx
   ld l, a
   ld b, (hl)
   dec h
   ld c, (hl) ; BC = c²
   exx
   ld a, l
   exx
   ld l, a
   ld a, (hl)
   inc h
   ld h, (hl)
   ld l, a ; HL = l²
   call BC_DE_HL_times_10101
   pop ix
  add ix, de
  pop de
adc hl, de
ex de, hl
pop hl
adc hl, bc
ld b, h
ld c, l ; BCDEIX = total
push af

ld h, high(sqrtab)
SUBFIRST e, l, ixh, ixl
SUBNEXT  d, h, d, e
SUBNEXT  b, c, b, c
jp nc, BDE_times_CHL_sqrdiff_v3_nc1
pop af
ccf
push af
BDE_times_CHL_sqrdiff_v3_nc1:

inc b

dec h
SUBFIRST e, h, e, ixh
SUBNEXT  d, c, c, d
jr nc, BDE_times_CHL_sqrdiff_v3_nc2
dec b
jp nz, BDE_times_CHL_sqrdiff_v3_nc2
pop af
ccf
push af
BDE_times_CHL_sqrdiff_v3_nc2:

dec h
SUBFIRST d, l, e, ixh
SUBNEXT  b, h, c, d
jr nc, BDE_times_CHL_sqrdiff_v3_nc3
dec b
jp nz, BDE_times_CHL_sqrdiff_v3_nc3
pop af
ccf
push af
BDE_times_CHL_sqrdiff_v3_nc3:

inc c

dec h
SUBFIRST b, l, d, e
jr nc, BDE_times_CHL_sqrdiff_v3_nc4
dec c
jp nz, BDE_times_CHL_sqrdiff_v3_nc4
dec b
jp nz, BDE_times_CHL_sqrdiff_v3_nc4
pop af
ccf
push af
BDE_times_CHL_sqrdiff_v3_nc4:

dec h
SUBFIRST e, c, d, e
pop hl
jr nc, BDE_times_CHL_sqrdiff_v3_nc5
dec c
jp nz, BDE_times_CHL_sqrdiff_v3_nc5
dec b
jp nz, BDE_times_CHL_sqrdiff_v3_nc5
inc l
BDE_times_CHL_sqrdiff_v3_nc5:

dec b
dec c

rr l
rr b
rr c
rr d
rr e
ld a, ixl
ld l, a
ld a, ixh
rra
rr l
ret


BC_DE_HL_times_10101:
push bc
ld a, h
ex af, af'
sub a
ld c, a
ld b, l
add hl, bc
adc a, a
ld b, e
add hl, bc
adc a, c ; AHL = [ L+H+E L ]
pop bc
push hl
push bc
  ld c, a
  ld b, 0
  ex af, af'
  ld h, a
  add hl, bc ; no way this can carry (initial HL is a square)
  ld c, a
  ld b, e
  sub a
  add hl, bc
  adc a, a ; AHL(SP+2) = [ H+E L+H L+H+E L ]
  add hl, de
  adc a, 0 ; AHL(SP+2) = [ H+E+D L+H+E L+H+E L ]
  pop bc
add hl, bc
adc a, 0 ; AHL(SP) = [ H+E+D+B L+H+E+C L+H+E L ]
ld e, d
ld d, c
add hl, de
adc a, b
jr nc, BC_DE_HL_times_10101_nc1
inc b ; BAHL(SP) = [ B B H+E+D+C+B L+H+E+D+C L+H+E L ]
BC_DE_HL_times_10101_nc1:
add a, e
jr nc, BC_DE_HL_times_10101_nc2
inc b ; BAHL(SP) = [ B D+B H+E+D+C+B L+H+E+D+C L+H+E L ]
BC_DE_HL_times_10101_nc2:
pop de
add a, c
ld c, a
ret nc
inc b ; BCHLDE = [ B D+C+B H+E+D+C+B L+H+E+D+C L+H+E L ]
ret
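Why is BC_DE_HL_times_10101 called twice? Write x = Σ x_i·256^i and y = Σ y_j·256^j. Summed over all 9 byte pairs, the x_i² terms collapse to (e² + d²·256 + b²·65536)·0x10101, which is what the column comments in BC_DE_HL_times_10101 add up to, and likewise for the CHL squares. The routine then subtracts the nine |x_i - y_j|² terms with SUBFIRST/SUBNEXT (low bytes of the square come from page high(sqrtab), high bytes from the next page, hence the inc h / dec h). The rr chain at the end shifts the total right by one bit. This Python sketch checks that bookkeeping on whole integers. It models the arithmetic only, not the register allocation, and the helper names are made up.

```python
SQTAB = [i * i for i in range(256)]   # stands in for sqrtab (values 0..65025)

def bytes3(v):
    """Split a 24-bit value into [low, mid, high] bytes."""
    return [(v >> (8 * i)) & 0xFF for i in range(3)]

def times_10101(sq):
    """(sq[0] + sq[1]*256 + sq[2]*65536) * 0x10101, per the routine's column comments."""
    return sum(s << (8 * i) for i, s in enumerate(sq)) * 0x10101

def mul24_sqrdiff(x, y):
    xb, yb = bytes3(x), bytes3(y)
    # x_i^2 and y_j^2 terms summed over all 9 byte pairs
    total = times_10101([SQTAB[v] for v in xb]) + times_10101([SQTAB[v] for v in yb])
    # subtract |x_i - y_j|^2 at byte position i + j
    for i in range(3):
        for j in range(3):
            total -= SQTAB[abs(xb[i] - yb[j])] << (8 * (i + j))
    return total >> 1   # total == 2*x*y, so this is x*y

print(hex(mul24_sqrdiff(0xFFFFFF, 0xFFFFFF)))   # 0xfffffe000001
```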

To get back to the topic somewhat, ACagliano, it sounds like you're more interested in squaring than in general multiplication.  Squaring can be considerably faster, especially if you use a lookup table (e.g., my best 16-bit squaring routine is around 170 cycles, versus around 800 for general multiplication.)
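FloppusMaximus doesn't post his 170-cycle squaring routine, so the following is not it. It is a hypothetical sketch of one way a table of byte squares can square a 16-bit value with no multiply loop, using the same identity 2xy = x² + y² - |x-y|² on the two bytes of x = 256h + l.

```python
SQTAB = [i * i for i in range(256)]   # hypothetical 256-entry table of byte squares

def square16(x):
    """Square a 16-bit x = 256*h + l with three table lookups and no multiply loop."""
    h, l = x >> 8, x & 0xFF
    two_hl = SQTAB[h] + SQTAB[l] - SQTAB[abs(h - l)]   # 2*h*l by the identity
    # x^2 = h^2 * 65536 + 2*h*l * 256 + l^2
    return (SQTAB[h] << 16) + (two_hl << 8) + SQTAB[l]

print(square16(0xFFFF))   # 4294836225
```

On the Z80 the remaining work is a few 16-bit adds at byte offsets, which is why squaring can beat a general 16x16 multiply by so much.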

cerzus69
« Reply #36 on: December 12, 2011, 10:43:43 am »
I do have a 24-bit floating-point multiplication routine ;D

saved 2 bytes, 1149 cycles saved by using iy too (and in a way compatible with TIOS, imagine that)
Code: [Select]
; hldebc = hlc * bde
ld (iy+asm_Flag1),b
xor a
ld ix,0
ld b,24
Loop:
add ix,ix
rla
rl c
adc hl,hl
jr nc,Next
add ix,de
adc a,(iy+asm_Flag1)
rl c
jr nc,Next
inc hl
Next:
djnz Loop
ld e,a
ld d,c
push ix ; ld c,ixl
pop bc ; ld b,ixh

Jacobly, are you sure this works? I've been going through the code, and it seems to me that the second 'rl c' should instead be 'add the carry flag to c'. Two 'rl c' instructions per loop seem wrong to me. Could you explain, please? I've also tried it in Wabbitemu, taking the different input registers into account, and it still doesn't give the right result.
« Last Edit: December 12, 2011, 10:51:09 am by cerzus69 »

ACagliano
« Reply #37 on: December 12, 2011, 02:35:39 pm »
Yeah, all I need is 16-bit subtraction (which 'sub' supports, I think), 16-bit squaring, 32-bit addition, then 32-bit square rooting (or will I need to go up to 40-bit?).
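Whether 32 bits is enough depends on the input range. Assume the plan is sqrt(dx² + dy²), with dx and dy coming from the 16-bit subtraction. That is my reading of the list above, not something stated in it. Under that assumption, a quick bound check (hypothetical helper name) looks like this:

```python
def sum_of_squares_bits(max_abs):
    """Bits needed to hold dx*dx + dy*dy when |dx|, |dy| <= max_abs."""
    return (2 * max_abs * max_abs).bit_length()

print(sum_of_squares_bits(0xFFFF))  # 33: unsigned 16-bit differences
print(sum_of_squares_bits(0x8000))  # 32: differences that fit a signed 16-bit value
```

So full-range unsigned differences can overflow a 32-bit sum by one bit (33 bits, not 40), while signed 16-bit differences stay within 32 bits.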

Xeda112358
« Reply #38 on: December 12, 2011, 02:48:12 pm »
16-bit subtraction
Code: [Select]
or a     ;to make sure the c flag is reset. Not always necessary if you know the c flag will be reset
sbc hl,bc  ;you can do sbc hl,de also.
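The Z80 has no plain 16-bit "sub hl,bc", only sbc, which also subtracts the carry flag. That is why the snippet clears carry first with 'or a'. A small Python model of the flag behaviour (the helper name is hypothetical):

```python
def sbc16(hl, bc, carry):
    """Model of Z80 'sbc hl,bc': returns (new hl, carry flag = borrow)."""
    diff = hl - bc - carry
    return diff & 0xFFFF, int(diff < 0)

print(sbc16(5, 3, 0))   # (2, 0)
print(sbc16(5, 3, 1))   # (1, 0): a stale carry makes the result one too small
print(sbc16(3, 5, 0))   # (65534, 1): borrow, result wraps
```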
32-bit addition (you mean two 32-bit inputs?)
Code: [Select]
;Inputs:
;     HLBC is one of the 32-bit inputs
;     DE points to the other 32-bit input in RAM
;Outputs:
;     HLBC is the 32-bit result
;     DE is incremented 3 times
;     A=H
;     c flag is set if there is an overflow
     ld a,(de) \ inc de
     add a,c \ ld c,a
     ld a,(de) \ inc de
     adc a,b \ ld b,a
     ld a,(de) \ inc de
     adc a,l \ ld l,a
     ld a,(de)
     adc a,h \ ld h,a
     ret
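The routine adds one byte at a time starting from the least significant byte (C, then B, L, H), so the 32-bit value that DE points to has to be stored little-endian, low byte first. A Python model of the same carry chain (hypothetical helper name):

```python
def add32_hlbc(hlbc, mem):
    """Add the 4 little-endian bytes in mem to HLBC; returns (HLBC, carry flag)."""
    # Register bytes in the order the routine touches them: C, B, L, H
    regs = [(hlbc >> (8 * i)) & 0xFF for i in range(4)]
    carry = 0
    out = 0
    for i, (r, m) in enumerate(zip(regs, mem)):
        s = r + m + carry          # 'add a,c' for the first byte, 'adc' for the rest
        out |= (s & 0xFF) << (8 * i)
        carry = s >> 8
    return out, carry

print(add32_hlbc(0xFFFFFFFF, bytes([1, 0, 0, 0])))  # (0, 1): overflow sets carry
```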
Squaring and square rooting... I will think on it D:

Also, I am working on a mini math library that will include RAM based math (so all the values will be in RAM). It seems like a few of these commands will need to rely on some memory. If they do, I suggest using the OP registers (11 bytes of RAM each).

jacobly
« Reply #39 on: December 12, 2011, 07:07:45 pm »
That's strange. My test program must not have been working right, because when I went back and changed it a bit, it suddenly started telling me that the second routine doesn't work. :/
Anyway, my new test program seems to agree with this change. :)
Code: [Select]
; hldebc = hlc * bde
ld (iy+asm_Flag1),b
xor a
ld ix,0
ld b,24
Loop:
add ix,ix
rla
rl c
adc hl,hl
jr nc,Next
add ix,de
adc a,(iy+asm_Flag1)
jr nc,Next
inc c
jr nz,Next
inc hl
Next:
djnz Loop
ld e,a
ld d,c
push ix ; ld c,ixl
pop bc ; ld b,ixh
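To see concretely why the first version failed and this one works, here is a register-level Python model of both loops. The function name and the `buggy` flag are mine, not from the thread. Each pass shifts the 48-bit value hl:c:a:ix left. When the bit falling out of h is 1, bde is added to a:ix and any carry is rippled up through c and then hl. In the first version, the second 'rl c' shifts c an extra time, which moves the multiplier bits still waiting there.

```python
MASK8, MASK16 = 0xFF, 0xFFFF

def mul24(x, y, buggy=False):
    """Model of hldebc = hlc * bde; buggy=True models the double 'rl c' version."""
    hl, c = x >> 8, x & MASK8          # multiplier in h:l:c
    b, de = y >> 16, y & MASK16        # b is what gets stored at (iy+asm_Flag1)
    a, ix = 0, 0
    for _ in range(24):
        # add ix,ix / rla / rl c / adc hl,hl: shift hl:c:a:ix left by one
        ix, cy = (ix << 1) & MASK16, ix >> 15
        a, cy = ((a << 1) | cy) & MASK8, a >> 7
        c, cy = ((c << 1) | cy) & MASK8, c >> 7
        hl, cy = ((hl << 1) | cy) & MASK16, hl >> 15
        if not cy:                                   # jr nc,Next
            continue
        # add ix,de / adc a,(iy+asm_Flag1): add bde to the low 24 bits a:ix
        s = ix + de
        ix, cy = s & MASK16, s >> 16
        s = a + b + cy
        a, cy = s & MASK8, s >> 8
        if buggy:
            c, cy = ((c << 1) | cy) & MASK8, c >> 7  # the extra rl c
            if cy:
                hl = (hl + 1) & MASK16               # inc hl
        elif cy:
            c = (c + 1) & MASK8                      # inc c
            if c == 0:                               # jr nz,Next
                hl = (hl + 1) & MASK16               # inc hl
    # ld e,a / ld d,c / push ix / pop bc: result in h:l:d:e:b:c
    return (hl << 32) | (c << 24) | (a << 16) | ix

print(hex(mul24(0x800001, 1)), hex(mul24(0x800001, 1, buggy=True)))  # 0x800001 0x800002
```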

cerzus69
« Reply #40 on: December 13, 2011, 11:06:38 am »
Cool, thanks a lot, indeed it works now! :D