Author Topic: 24 bit multiplication (Read 19051 times)

ACagliano · « **Reply #15 on:** December 08, 2011, 03:11:20 pm »

Quote from: Xeda112358 on December 08, 2011, 11:36:34 am

Well I am almost positive that using my algorithm would require the use of RAM, even if you used all the shadow registers, too.

Can you help me, conceptually with doing math higher than 2 bytes? Maybe I'll write it myself.

Xeda112358 · « **Reply #16 on:** December 08, 2011, 03:54:25 pm »

Okay, here is some pseudo code for an algorithm that will always work and is fast (each cycle gives the next digit):

Code: [Select]

     0→A
     0→D          ;This is where the result will go
     0→C          ;This is a carry
     For(B,1,12)  ;12 is half the number of input bits
     D+D→D        ;We want to shift D left
     D+D+1→C      ;The difference of consecutive squares?
     A<<E         ;rotate E left, then rotate the carry into A
     A<<E         ;In z80, "rlc e \ rla"
     If A>=C      ;If A is greater than or equal to C
       D+1→D
       A-C→A
     End
;You will need an input       (24 bits in the above example)
;You will need an output      (12 bits in the above example)
;You will need a counter      (4 bits in the above example)
;You will need an accumulator (24 bits in the above example)
;You will need a carry        (13 bits in the above example)
;
;You will need at least       (77 bits in the above example)

So for a simple 8-bit implementation in Z80 code:

Code: [Select]

;===============================================================
sqrtE:
;===============================================================
;Input:
;     E is the value to find the square root of
;Outputs:
;     A is E-D^2
;     B is 0
;     D is the result
;     E is not changed
;     HL is not changed
;Destroys:
;     C=2D+1 if D is even, 2D-1 if D is odd

        xor a               ;1      4         4
        ld d,a              ;1      4         4
        ld c,a              ;1      4         4
        ld b,4              ;2      7         7
sqrtELoop:
        rlc d               ;2      8        32
        ld c,d              ;1      4        16
        scf                 ;1      4        16
        rl c                ;2      8        32

        rlc e               ;2      8        32
        rla                 ;1      4        16
        rlc e               ;2      8        32
        rla                 ;1      4        16

        cp c                ;1      4        16
        jr c,$+4            ;4    12|15      48+3x
          inc d             ;--    --        --
          sub c             ;--    --        --
        djnz sqrtELoop      ;2    13|8       47
        ret                 ;1     10        10
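For anyone who would rather trace the routine on a PC first, here is a rough Python model of the loop above. It is only a sketch: the function name `sqrt_e` is made up for illustration, and the variables `a`, `c`, `d`, `e` simply mirror the Z80 registers.

```python
def sqrt_e(e):
    """Model of the 8-bit routine: returns (root, remainder) for 0 <= e <= 255."""
    a = 0                           # remainder accumulator (register A)
    d = 0                           # root, built one bit per pass (register D)
    for _ in range(4):              # 4 passes, one per pair of input bits
        d <<= 1                     # rlc d
        c = (d << 1) | 1            # ld c,d / scf / rl c  ->  C = 2*D + 1
        a = (a << 2) | (e >> 6)     # two "rlc e / rla" pairs pull in the top 2 bits of E
        e = (e << 2) & 0xFF
        if a >= c:                  # "cp c / jr c" skips the next two steps when A < C
            d += 1                  # inc d
            a -= c                  # sub c
    return d, a


if __name__ == "__main__":
    print(sqrt_e(144))              # root and remainder for 144
```

The model drops the detail that `rlc e` rotates E back to its original value after eight rotations; it only tracks which bits get fed into A.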

If you want to refine the accuracy to round to the nearest integer, you can add this code to the end of the Z80 code:

Code: [Select]

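        cp d                ;compare the remainder A with the root D
        jr c,$+5            ;A<D: keep D
        jr z,$+3            ;A=D: E=D^2+D is still below (D+0.5)^2, keep D
          inc d             ;A>D: E is closer to (D+1)^2, round up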

It takes advantage of the linear differences of a quadratic (meaning (a(x+1)²+b(x+1)+c)-(ax²+bx+c) is a linear equation).
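The rounding rule can be checked with a few lines of Python. This is a sketch under my own naming (`rounded_sqrt` is hypothetical), using the fact that (D+0.5)² = D² + D + 0.25, so an integer input is closer to D+1 only when the remainder E−D² is strictly greater than D. The same kind of difference, (D+1)² − D² = 2D+1, is where the 2D+1 test value in the loop comes from.

```python
import math


def rounded_sqrt(n):
    """Square root of n rounded to the nearest integer, using only integer math."""
    d = math.isqrt(n)               # floor of the square root
    r = n - d * d                   # remainder, always in the range 0..2*d
    # n lies above (d + 0.5)^2 = d^2 + d + 0.25 exactly when r > d
    return d + 1 if r > d else d
```

For example, 6 has root 2 and remainder 2, so it stays at 2, while 7 has remainder 3 and rounds up to 3.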

Anywho, I am being told I need to finish this up, so I hope this helps until I can find time later >.>

ACagliano · « **Reply #17 on:** December 08, 2011, 03:57:06 pm »

Ok, I'm confused. I'll leave this to the more advanced programmers for now.

thepenguin77 · « **Reply #18 on:** December 08, 2011, 07:40:32 pm »

Edit2:
Dang, I need to read closer. Oh well, at least mine is way bigger than jacobly's.

These things actually aren't that difficult to write if you understand the principle by which they work.

Code: [Select]


multBCDbyEHL:
	ld	a, b
	ld	b, c
	ld	c, d
	ld	d, a
multDBCbyEHL:
	
	push	de
	push	hl
	
	ld	ix, $8000
	ld	(ix), l
	xor	a
	ld	h, a
	ld	l, a

	call	do8Bits

	ld	(ix+1), a
	pop	af
	ld	(ix), a		
	push	de
	ld	a, (ix+1)

	call	do8Bits

	ld	(ix+1), a
	pop	af		;least sig number
	ex	de, hl
	ex	(sp), hl	;D and 2nd least sig in for new number
	ld	(ix), l
	pop	hl		;D and 2nd least sig
	push	af		;least sig number
	push	hl		;2nd least sig number
	ex	de, hl
	ld	a, (ix+1)

	call	do8Bits

	ld	b, a
	ld	c, h
	ld	d, l
	pop	hl
	ld	a, l
	pop	hl
	ld	h, a
	ret

	
	


;####
;input: DBC = 1 number
;	AHL = running number
;	(ix) = to multiply by
;output: AHLE = output
;	E is done
;	DBC = 1 number


do8Bits:
	ld	(ix+1), 8
loop:
	srl	(ix)
	jr	nc, skip

	add	hl, bc
	adc	a, d
skip:
	rra	
	rr	h
	rr	l
	rr	e
	dec	(ix+1)
	jr	nz, loop
	ret

I had to use 2 bytes of memory, so sorry about that. If you don't like it, you could just replace (ix) with ixl and (ix+1) with ixh and it would be ram independent, but then it wouldn't run on the Nspire.

Also, to my extreme delight, this thing worked on the first try

(The result is in BCDEHL)

Edit:
Added to wikiti so people will forever be able to multiply large numbers.
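The principle the routine relies on can be modelled in Python. This is a sketch of the general idea (look at the multiplier from the bottom bit up, add, then shift the running total right so finished bits fall out the low end), not a line-by-line transcription of the assembly; `mul24`, `acc` and `low` are names made up for the example.

```python
def mul24(x, y):
    """Return the 48-bit product of two 24-bit unsigned integers."""
    acc = 0                         # running high part: 24 bits, briefly 25 after an add
    low = 0                         # bits shifted out of acc, collected as the low half
    for i in range(24):
        if (y >> i) & 1:            # examine multiplier bits from the bottom up
            acc += x                # may carry into bit 24
        low |= (acc & 1) << i       # the bit about to leave acc is product bit i
        acc >>= 1                   # the right shift brings the carry back into range
    return (acc << 24) | low
```

The point of shifting right instead of left is that the running total never needs more than 24 bits plus a carry, which is why it can live in A, H and L.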

jacobly · « **Reply #19 on:** December 08, 2011, 08:35:58 pm »

My first multiplication routine takes 2746 - 4570 cycles, the second takes 1680 - 2880 cycles.
Edit: And yes, they are 24*24 bit integer multiplication routines (floating point routines are much more complicated).

As for square root, I should be able to do it ~~using only af bc de hl ix (sp), or af bc de hl ix iy, just like the multiplication. Just give me a couple hours to port it to 48-bit.~~

Edit:
uses af bc de hl ix iy (for two bytes of memory)
84 bytes, 9186-9258 cycles

Code: [Select]

	; ahl = sqrt(hldebc)
	push	bc ; ld c,ixl
	pop	ix ; ld b,ixh
	push	de
	ld	c,l
	ld	a,h
	ld	hl,0
	ld	b,h
	ld	e,l
	ld	d,h
	ld	(iy+asm_Flag1),d
	ld	(iy+asm_Flag2),24
Loop:
	cp	$40
	push	af
	sbc	hl,de
	ld	a,b
	sbc	a,(iy+asm_Flag1)
	jr	c,Restore
	ld	b,a
	pop	af
	sub	$40
	scf
	jr	Skip
Restore:
	pop	af
	adc	hl,de
	or	a
Skip:
	rl	e
	rl	d
	rl	(iy+asm_Flag1)
	add	ix,ix
	ex	(sp),hl
	adc	hl,hl
	ex	(sp),hl
	rl	c
	rla
	adc	hl,hl
	rl	b
	add	ix,ix
	ex	(sp),hl
	adc	hl,hl
	ex	(sp),hl
	rl	c
	rla
	adc	hl,hl
	rl	b
	dec	(iy+asm_Flag2)
	jr	nz,Loop
	pop	hl
	ld	a,(iy+asm_Flag1)
	ret

uses af bc b' de de' hl hl' ix
74 bytes, 5985-6777 cycles

Code: [Select]

	; dea = sqrt(hldebc)
	di
	push	bc ; ld c,ixl
	pop	ix ; ld b,ixh
	ld	bc,$40
	ld	a,l
	ld	l,h
	ld	h,b
	exx
	ld	de,0
	ld	l,e
	ld	h,d
	ld	b,24
	or	a
Loop:
	exx
	sbc	hl,bc
	exx
	sbc	hl,de
	jr	nc,Skip
	exx
	add	hl,bc
	exx
	adc	hl,de
Skip:
	exx
	ccf
	rl	b
	exx
	rl	e
	rl	d
	exx
	add	ix,ix
	rl	e
	rl	d
	rla
	adc	hl,hl
	exx
	adc	hl,hl
	exx
	add	ix,ix
	rl	e
	rl	d
	rla
	adc	hl,hl
	exx
	adc	hl,hl
	djnz	Loop
	exx
	ld	a,b
	exx
	ei
	ret

Xeda112358 · « **Reply #20 on:** December 09, 2011, 07:39:22 pm »

Okay, so this isn't too helpful, but it is an idea: I came up with a very simple algorithm that gets very accurate very fast (12 iterations for 16-bit numbers). On the Z80 it isn't that fast since it requires multiplication and division, but the algorithm looks like this:

Code: [Select]

B/2→A             ;start with anything, really
For(C,1,12
(A+B/A)/2→A
End

To implement this in assembly code, you will want to multiply the input by 4 (shifting left twice) and then divide the output by 2. Then, if you want added precision, increment the result by 1 if that final shift leaves a carry. Again, in z80, this algorithm isn't really worth it since it isn't as fast as other methods, but it can be easier to grasp.

Plus, if you move on to coding for devices with the ability to multiply and divide, you will have a really tiny algorithm that uses a counter, input, and output as opposed to a bajillion registers.
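On a machine with multiply and divide, the iteration might look something like the Python below. This is a sketch with two assumptions that are mine rather than from the pseudo code: it uses integer division, and it stops as soon as the guess stops shrinking instead of running a fixed 12 passes. The name `newton_isqrt` is hypothetical.

```python
def newton_isqrt(b):
    """Integer square root of b by repeating (A + B/A)/2 -> A."""
    if b < 2:
        return b                    # 0 and 1 are their own square roots
    a = b // 2                      # starting guess, as in the pseudo code
    while True:
        nxt = (a + b // a) // 2     # (A+B/A)/2 -> A
        if nxt >= a:                # the guess only shrinks until it hits the floor of the root
            return a
        a = nxt
```

Starting from B/2 keeps the guess at or above the true root for every input of 2 or more, which is what makes the simple "stopped shrinking" test safe.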

jacobly · « **Reply #21 on:** December 10, 2011, 02:45:24 am »

Actually, my routine only uses about 3 virtual registers. The only reason it uses so many z80 registers is because the virtual registers are so big. In fact, the first routine uses the same number of bits as yours would (don't forget that A and B in your routine would each be 48 bits wide). However, your routine does have the advantage of using similarly sized registers (and fewer iterations), so it probably would be more useful on other processors.

cerzus69 · « **Reply #22 on:** December 10, 2011, 08:26:06 am »

Hey, jacobly, I tested both of your multiplication routines and it doesn't seem like they're doing the same thing... I've tried both of them in my program but they have different results.

ACagliano · « **Reply #23 on:** December 10, 2011, 09:24:34 am »

Ok, I'm actually going to be sticking with just two bytes for position and scrapping the sector. That means I'll need to do 2-byte subtraction (no problem), square a 2-byte (rendering a max 4-byte), then add three 4-bytes (do you only need a 4-byte output or should you go up to 5), then square root that. That should be easier.

√( (x2-x1)^2 + (y2-y1)^2 + (z2-z1)^2)

√( (16bit-16bit)^2 + (16bit-16bit)^2 + (16bit-16bit)^2)

Can someone explain to me the theory behind multiplying/dividing/sqrt'ing numbers? Is there a standard theorem, regardless of the bit-size of the number?
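On the 4-byte versus 5-byte question, the sizes can be worked out in Python. This sketch assumes unsigned 16-bit coordinates, so each difference is at most 65535 in magnitude; `distance` and `WORST_SUM` are names made up for the example.

```python
import math

# Largest possible sum of three squared 16-bit differences
WORST_SUM = 3 * 0xFFFF ** 2


def distance(p, q):
    """Integer distance between two 3D points (floor of the true value)."""
    total = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.isqrt(total)


if __name__ == "__main__":
    print(WORST_SUM.bit_length())               # bits needed for the sum
    print(math.isqrt(WORST_SUM).bit_length())   # bits needed for the root
```

Under that assumption each square fits in 4 bytes, but the sum of three can need 34 bits, so the sum wants a 5-byte accumulator, and the square root can need 17 bits, so it wants 3 bytes.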

Xeda112358 · « **Reply #24 on:** December 10, 2011, 12:25:16 pm »

Okay, so multiplying and dividing you probably actually know-- you just don't know it. So you remember in grade school learning how to multiply large numbers:

Code: [Select]

  432
  x27
-----

Well that is actually the best method and fastest. What you do is multiply 2*432. Then you shift the digits left and throw a zero at the end. Then you multiply 432*7 and add the two values together. Voila, multiplication.

Now in binary, math is almost always easier. Because you are using 0 and 1, multiplication is easy:

Code: [Select]

11000101
11010111

So you take the first (leftmost) digit of the second number and you get 1. Multiply 1 times the first number and you get 11000101; that goes in the accumulator (the accumulator in this case is the running total, not necessarily the a register). Now shift the accumulator left and check the next bit of the bottom number. If that is 1, add the first number to the accumulator.
So here is what it looks like in assembly code:

Code: [Select]

;D*E, result in A (only the low 8 bits of the product fit)
     xor a       ;This is the accumulator
     ld b,8      ;this is the counter
MultLoop:
     add a,a     ;shift the accumulator left (multiply it by 2)
     rlc e       ;This puts the next bit in E in the carry
     jr nc,$+3   ;If the bit in E was not 1, you add 0 (so don't add!)
       add a,d   ;1*D=D, so add D. Binary makes math easy.
     djnz MultLoop
     ret
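A Python model of that loop makes the register widths visible. This is a sketch: `mul8_low` mirrors the routine as written, where A is only 8 bits wide, and `mul8_full` is a hypothetical variant with a wider accumulator, roughly what you would get by keeping the running total in a 16-bit register pair instead.

```python
def mul8_low(d, e):
    """Model of the D*E loop: A is 8 bits wide, so only the low byte survives."""
    a = 0
    for _ in range(8):
        a = (a << 1) & 0xFF             # add a,a
        carry = e >> 7                  # rlc e moves the top bit of E into the carry
        e = ((e << 1) | carry) & 0xFF
        if carry:
            a = (a + d) & 0xFF          # add a,d
    return a


def mul8_full(d, e):
    """Same loop with a 16-bit accumulator, so the whole product is kept."""
    acc = 0
    for i in range(7, -1, -1):          # bits of E from the top down
        acc <<= 1
        if (e >> i) & 1:
            acc += d
    return acc
```

For small inputs both agree; once the product passes 255, `mul8_low` wraps around while `mul8_full` keeps going.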

When it comes to division in z80, you are pretty much just doing long division. Finding the square root, however, is something you probably didn't do in school since we have calculators and it was taken out of the school curriculum (I sound old). Anywho, I learned how to do it in decimal from a textbook from the 1950s and then I extended it to binary since it is much, much easier in binary. It is difficult to explain in a post (you kind of need to see it to understand it), but here is an excellent site:
http://www.homeschoolmath.net/teaching/square-root-algorithm.php

The second method is what you will want to use as it gives each consecutive digit every cycle. If you can understand that well, you will probably see why it is so much easier in binary and how to create a z80 algorithm.

jacobly · « **Reply #25 on:** December 10, 2011, 06:13:47 pm »

Quote from: cerzus69 on December 10, 2011, 08:26:06 am

Hey, jacobly, I tested both of your multiplication routines and it doesn't seem like they're doing the same thing... I've tried both of them in my program but they have different results.

They both have different inputs.

The first is hla * cde, and the second is hlc * bde.
You probably want to add some code to the beginning of each so that the input works better for what you are doing.
For example:
ld a,c \ ld c,b
at the beginning of the first routine causes them to have the same input (hlc and bde).

Edit: Here's some pseudo code that might help.

Code: [Select]

// Multiply a times b
temp = 0
repeat for each bit in a
	temp <<= 1
	if (high bit of a set)
		temp += b
	a <<= 1
return temp

// Divide a by b
temp = 0
repeat for each bit in a
	temp <<= 1
	temp += high bit of a
	a <<= 1
	if (temp >= b)
		temp -= b
		set low bit of a
return a

// Sqrt a
temp = 0
b = 0
repeat for every 2 bits in a
	temp <<= 2
	temp += high 2 bits of a
	a <<= 2
	test = (b << 2) + 1
	b <<= 1
	if (temp >= test)
		temp -= test
		set low bit of b
return b

// Sqrt a, sometimes better with multiple-of-a-byte registers
temp = high byte of a
a <<= 8
b = 0
repeat for every 2 bits in the original a
	test = (b << 8) + 0x40
	b <<= 1
	if (temp >= test)
		temp -= test
		set low bit of b
	temp <<= 2
	temp += high 2 bits of a
	a <<= 2
return b

The tricky part is figuring out how many bits are in each variable and allocating the z80 registers accordingly.
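To make the bit counting concrete, here is how the divide and the byte-aligned square root could look in Python with the widths spelled out. It is a sketch: the `bits` parameter and the function names are assumptions for the example, the square root shifts `temp` left by 2 before bringing in each new pair of bits, and `divide` expects a nonzero divisor.

```python
def divide(a, b, bits=16):
    """Restoring division of a by b (b != 0): returns (quotient, remainder)."""
    top = 1 << (bits - 1)
    mask = (1 << bits) - 1
    temp = 0
    for _ in range(bits):
        temp = (temp << 1) | (1 if a & top else 0)  # bring down the next bit of a
        a = (a << 1) & mask
        if temp >= b:
            temp -= b
            a |= 1                                  # quotient bit goes in the freed slot
    return a, temp


def sqrt_aligned(a, bits=16):
    """Byte-aligned square root: compares against (b << 8) + 0x40."""
    mask = (1 << bits) - 1
    temp = a >> (bits - 8)                          # high byte of a
    a = (a << 8) & mask
    b = 0
    for _ in range(bits // 2):                      # one pass per 2 bits of the input
        test = (b << 8) + 0x40
        b <<= 1
        if temp >= test:
            temp -= test
            b |= 1
        temp = (temp << 2) | (a >> (bits - 2))      # shift in the next 2 bits
        a = (a << 2) & mask
    return b
```

In `divide` the quotient reuses the bits that `a` vacates, which is the same trick that lets a Z80 routine return the quotient in the register the dividend came in.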

ACagliano · « **Reply #26 on:** December 11, 2011, 11:31:56 am »

Ok, so let me just get something straight...

00100110
x11011001
-------------
1. 00100110
2. 0,01001100
3. 00,10011000
4. 001,00110001
5. 0010,01100011
6. 00100,11000110
7. 001001,10001101
8. 0010011,00011011

Thus, the answer is %00010011 %00011011

Now, let's check: 38 x 217 = 4891 XX it's wrong??

Xeda112358 · « **Reply #27 on:** December 11, 2011, 11:55:42 am »

Hmm, here is the algorithm in decimal:

Code: [Select]

 333
x471
----
1) 4*333
   +Acc  = 1332
   Acc*10= 13320
2) 7*333
   +Acc  = 15651
   Acc*10= 156510
3) 1*333
   +Acc  = 156843

Here is the algorithm in binary:

Code: [Select]

 00100110
x11011001
---------------
1) 1x00100110
     +Acc = 00100110
     Acc*2= 01001100
2) 1x00100110
     +Acc = 01001100+00100110=01110010
     Acc*2= 11100100
3) 0x00100110
     +Acc = 11100100
     Acc*2= 111001000
4) 1x00100110
     +Acc = 111101110
     Acc*2= 1111011100
5) 1x00100110
     +Acc = 10000000010
     Acc*2= 100000000100
6) 0x00100110
     +Acc = 100000000100
     Acc*2= 1000000001000
7) 0x00100110
     +Acc = 1000000001000
     Acc*2= 10000000010000
8) 1x00100110
     +Acc = 10000000110110
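The same steps can be generated with a short Python loop, which is handy for checking a hand calculation. This is only a sketch; `multiply_trace` is a made-up name and it records the accumulator after each "+Acc" line.

```python
def multiply_trace(x, y, bits=8):
    """Return the accumulator after each step of the top-bit-first method."""
    acc = 0
    steps = []
    for i in range(bits - 1, -1, -1):   # bits of y from the top down
        if (y >> i) & 1:
            acc += x                    # "+Acc" line: add 1 x the top number
        steps.append(acc)
        if i:                           # no doubling after the last bit
            acc *= 2                    # "Acc*2" line
    return steps


if __name__ == "__main__":
    for n, acc in enumerate(multiply_trace(0b00100110, 0b11011001), 1):
        print(f"{n}) +Acc = {acc:b}")
```

The final entry is 10000000110110 in binary, which is 8246, and 38 x 217 = 8246.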

ACagliano · « **Reply #28 on:** December 11, 2011, 12:47:33 pm »

Ok, I think I get it.

You move through the second register. If the current bit is 1, you add 1 x a to a, then rla a. If the current bit is 0, just rla a.

Xeda112358 · « **Reply #29 on:** December 11, 2011, 01:07:43 pm »

Yes

Typically you do rla, then check the next bit and if it is set, add the other register

The rla at the beginning works because a starts at 0, and shifting 0 left still gives 0 (0*2=0).
