Code: [Select]
// Multiply a times b
temp = 0
repeat for each bit in a
    temp <<= 1
    if (high bit of a set)
        temp += b
    a <<= 1
return temp

If a and b are 2 bytes, temp is 4 bytes, and you loop 16 times.

Spoiler for code: stolen from Axe
p_MulFull:
; Inputs in hl and de, result in cahl
    ld c,h
    ld a,l
    ld hl,0 ;11
    ld b,16 ;7
__MulFullNext:
    add hl,hl ;11
    rla ;4
    rl c ;8
    jr nc,__MulFullSkip ;12/7
    add hl,de ;11
    adc a,0 ;7
    jr nc,__MulFullSkip
    inc c
__MulFullSkip:
    djnz __MulFullNext
    ret
__MulFullEnd:

Code: [Select]
// Sqrt a
temp = high byte of a
a <<= 8
b = 0
repeat for every 2 bits in a
    test = b << 8 + 0x40
    b <<= 1
    if (temp >= test)
        temp -= test
        set low bit of b
    temp += high 2 bits of a
    a <<= 2
return b

If a is 4 bytes, then b and temp are 2 bytes, and you loop 16 times.

Spoiler for code: stole my own routine from Axe (and modified it)
p_Sqrt88:
; input in hlde, result in de
    ld b,16
    ld a,h
    ld c,l
    push de ; ld ixh,d
    pop ix ; ld ixl,e
    ld de,0
    ld h,d
    ld l,e
__Sqrt88Loop:
    sub $40
    sbc hl,de
    jr nc,__Sqrt88Skip
    add a,$40
    adc hl,de
__Sqrt88Skip:
    ccf
    rl e
    rl d
    add ix,ix
    rl c
    rla
    adc hl,hl
    add ix,ix
    rl c
    rla
    adc hl,hl
    djnz __Sqrt88Loop
    ret
__Sqrt88End:
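For anyone who wants to check the two pseudocode loops on a PC before touching assembly, here is a minimal Python sketch. The function names are made up for illustration. The multiply follows the pseudocode step by step; the square root is the usual two-bits-per-pass method written with an unshifted comparison, so it is not a line-by-line translation of the pseudocode or of p_Sqrt88.

```python
def mul16_shift_add(a, b):
    """Shift-and-add multiply of two 16-bit values; 32-bit result."""
    temp = 0
    for _ in range(16):                  # one pass per bit of a
        temp = (temp << 1) & 0xFFFFFFFF
        if a & 0x8000:                   # high bit of a set
            temp = (temp + b) & 0xFFFFFFFF
        a = (a << 1) & 0xFFFF
    return temp


def isqrt32_two_bits(n):
    """Floor square root of a 32-bit value, consuming two input bits per pass."""
    rem = 0
    root = 0
    for _ in range(16):
        rem = (rem << 2) | ((n >> 30) & 3)   # bring down the next two bits
        n = (n << 2) & 0xFFFFFFFF
        trial = (root << 2) | 1              # (2*root + 1)^2 - (2*root)^2
        root <<= 1
        if rem >= trial:
            rem -= trial
            root |= 1                        # set low bit of the result
    return root
```

Both loops run 16 times for the sizes mentioned above: 16 bits of multiplier, and 16 pairs of bits of a 4-byte square root input.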
    ld hl,0
    ld a,16
MultLoop:
    add hl,hl       ;shifts hl left
    rl e \ rl d     ;shifts de left and if hl overflowed, it overflows into de
    jr nc,$+6       ;if the bit in DE is 0, skip this chunk
    add hl,bc       ;add bc to hl (think of this as the first number)
    jr nc,$+3       ;overflow into de
    inc de
    dec a
    jr nz,MultLoop
    ret
Quote from: jacobly on December 11, 2011, 02:06:55 pm
[multiplication and square-root routines, quoted in full from the post above]

Kool. Thanks. But shouldn't the first one have two inputs?
My first multiplication routine takes 2746 to 4570 cycles; the second takes 1680 to 2880 cycles.
SUBFIRST .macro src1, src2, hdest, ldest
    exx
    ld a, src1
    sub src2
    jr nc, $ + 4
    neg
    exx
    ld l, a
    ld a, ldest
    sub (hl)
    ld ldest, a
    inc h
    ld a, hdest
    sbc a, (hl)
    ld hdest, a
.endm

SUBNEXT .macro src1, src2, hdest, ldest
    dec h
    ex af, af'
    exx
    ld a, src1
    sub src2
    jr nc, $ + 4
    neg
    exx
    ld l, a
    ex af, af'
    ld a, ldest
    sbc a, (hl)
    ld ldest, a
    inc h
    ld a, hdest
    sbc a, (hl)
    ld hdest, a
.endm

BDE_times_CHL_sqrdiff_v3:
    ld a, d
    exx
    ld h, high(sqrtab)
    ld l, a
    ld e, (hl)
    inc h
    ld d, (hl) ; DE = d²
    exx
    ld a, b
    exx
    ld l, a
    ld b, (hl)
    dec h
    ld c, (hl) ; BC = b²
    exx
    ld a, e
    exx
    ld l, a
    ld a, (hl)
    inc h
    ld h, (hl)
    ld l, a ; HL = e²
    call BC_DE_HL_times_10101
    push bc
    push hl
    push de
    exx
    ld a, h
    exx
    ld h, high(sqrtab)
    ld l, a
    ld e, (hl)
    inc h
    ld d, (hl) ; DE = h²
    exx
    ld a, c
    exx
    ld l, a
    ld b, (hl)
    dec h
    ld c, (hl) ; BC = c²
    exx
    ld a, l
    exx
    ld l, a
    ld a, (hl)
    inc h
    ld h, (hl)
    ld l, a ; HL = l²
    call BC_DE_HL_times_10101
    pop ix
    add ix, de
    pop de
    adc hl, de
    ex de, hl
    pop hl
    adc hl, bc
    ld b, h
    ld c, l ; BCDEIX = total
    push af
    ld h, high(sqrtab)
    SUBFIRST e, l, ixh, ixl
    SUBNEXT d, h, d, e
    SUBNEXT b, c, b, c
    jp nc, BDE_times_CHL_sqrdiff_v3_nc1
    pop af
    ccf
    push af
BDE_times_CHL_sqrdiff_v3_nc1:
    inc b
    dec h
    SUBFIRST e, h, e, ixh
    SUBNEXT d, c, c, d
    jr nc, BDE_times_CHL_sqrdiff_v3_nc2
    dec b
    jp nz, BDE_times_CHL_sqrdiff_v3_nc2
    pop af
    ccf
    push af
BDE_times_CHL_sqrdiff_v3_nc2:
    dec h
    SUBFIRST d, l, e, ixh
    SUBNEXT b, h, c, d
    jr nc, BDE_times_CHL_sqrdiff_v3_nc3
    dec b
    jp nz, BDE_times_CHL_sqrdiff_v3_nc3
    pop af
    ccf
    push af
BDE_times_CHL_sqrdiff_v3_nc3:
    inc c
    dec h
    SUBFIRST b, l, d, e
    jr nc, BDE_times_CHL_sqrdiff_v3_nc4
    dec c
    jp nz, BDE_times_CHL_sqrdiff_v3_nc4
    dec b
    jp nz, BDE_times_CHL_sqrdiff_v3_nc4
    pop af
    ccf
    push af
BDE_times_CHL_sqrdiff_v3_nc4:
    dec h
    SUBFIRST e, c, d, e
    pop hl
    jr nc, BDE_times_CHL_sqrdiff_v3_nc5
    dec c
    jp nz, BDE_times_CHL_sqrdiff_v3_nc5
    dec b
    jp nz, BDE_times_CHL_sqrdiff_v3_nc5
    inc l
BDE_times_CHL_sqrdiff_v3_nc5:
    dec b
    dec c
    rr l
    rr b
    rr c
    rr d
    rr e
    ld a, ixl
    ld l, a
    ld a, ixh
    rra
    rr l
    ret

BC_DE_HL_times_10101:
    push bc
    ld a, h
    ex af, af'
    sub a
    ld c, a
    ld b, l
    add hl, bc
    adc a, a
    ld b, e
    add hl, bc
    adc a, c ; AHL = [ L+H+E L ]
    pop bc
    push hl
    push bc
    ld c, a
    ld b, 0
    ex af, af'
    ld h, a
    add hl, bc ; no way this can carry (initial HL is a square)
    ld c, a
    ld b, e
    sub a
    add hl, bc
    adc a, a ; AHL(SP+2) = [ H+E L+H L+H+E L ]
    add hl, de
    adc a, 0 ; AHL(SP+2) = [ H+E+D L+H+E L+H+E L ]
    pop bc
    add hl, bc
    adc a, 0 ; AHL(SP) = [ H+E+D+B L+H+E+C L+H+E L ]
    ld e, d
    ld d, c
    add hl, de
    adc a, b
    jr nc, BC_DE_HL_times_10101_nc1
    inc b ; BAHL(SP) = [ B B H+E+D+C+B L+H+E+D+C L+H+E L ]
BC_DE_HL_times_10101_nc1:
    add a, e
    jr nc, BC_DE_HL_times_10101_nc2
    inc b ; BAHL(SP) = [ B D+B H+E+D+C+B L+H+E+D+C L+H+E L ]
BC_DE_HL_times_10101_nc2:
    pop de
    add a, c
    ld c, a
    ret nc
    inc b ; BCHLDE = [ B D+C+B H+E+D+C+B L+H+E+D+C L+H+E L ]
    ret
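As far as I can tell from the labels and the table lookups, this routine relies on the identity x*y = (x² + y² - (x - y)²) / 2 applied to each pair of bytes, with the squares read from sqrtab. Summed over all nine byte pairs, the x² and y² terms collapse into a single multiplication by 0x10101, which would explain BC_DE_HL_times_10101. The Python sketch below only checks that arithmetic; the names and the table layout are assumptions, and it makes no attempt to mirror the register usage.

```python
# Table of byte squares, standing in for sqrtab (its real layout is assumed).
SQUARES = [i * i for i in range(256)]


def split_bytes(value):
    """Return the three bytes of a 24-bit value, least significant first."""
    return [(value >> (8 * i)) & 0xFF for i in range(3)]


def mul24_sqrdiff(x, y):
    """24x24 -> 48-bit multiply using only a table of squares."""
    xb, yb = split_bytes(x), split_bytes(y)
    # Sum of (x_i^2 + y_i^2) * 256^i, then a single multiply by 0x10101.
    total = sum((SQUARES[xb[i]] + SQUARES[yb[i]]) << (8 * i) for i in range(3))
    total *= 0x10101
    # Subtract the squared byte differences at their place values.
    for i in range(3):
        for j in range(3):
            total -= SQUARES[abs(xb[i] - yb[j])] << (8 * (i + j))
    return total >> 1   # what remains is 2*x*y
```

The absolute value corresponds to the `neg` in the SUBFIRST and SUBNEXT macros, and the final shift corresponds to the run of `rr` instructions at the end of the routine.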
I do have a 24-bit floating-point multiplication routine. Saved 2 bytes, 1149 cycles; saved by using iy too (and in a way compatible with TIOS, imagine that).

Code: [Select]
; hldebc = hlc * bde
    ld (iy+asm_Flag1),b
    xor a
    ld ix,0
    ld b,24
Loop:
    add ix,ix
    rla
    rl c
    adc hl,hl
    jr nc,Next
    add ix,de
    adc a,(iy+asm_Flag1)
    rl c
    jr nc,Next
    inc hl
Next:
    djnz Loop
    ld e,a
    ld d,c
    push ix ; ld c,ixl
    pop bc ; ld b,ixh
    or a        ;to make sure the c flag is reset. Not always necessary if you know the c flag will be reset
    sbc hl,bc   ;you can do sbc hl,de also.
;Inputs:
;   HLBC is one of the 32-bit inputs
;   DE points to the other 32-bit input in RAM
;Outputs:
;   HLBC is the 32-bit result
;   DE is incremented 3 times
;   A=H
;   c flag is set if there is an overflow
    ld a,(de) \ inc de
    add a,c \ ld c,a
    ld a,(de) \ inc de
    adc a,b \ ld b,a
    ld a,(de) \ inc de
    adc a,l \ ld l,a
    ld a,(de)
    adc a,h \ ld h,a
    ret
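The 32-bit addition above is one `add` followed by three `adc`, working from the lowest byte up. A small Python sketch of the same carry chain, assuming the value in RAM is stored least significant byte first (which is what the order c, b, l, h implies); the function name is hypothetical.

```python
def add32_bytewise(hlbc, mem):
    """Add the 4 little-endian bytes in mem to hlbc; return (result, carry)."""
    carry = 0
    result = 0
    for i in range(4):                       # c, b, l, h in that order
        total = ((hlbc >> (8 * i)) & 0xFF) + mem[i] + carry
        result |= (total & 0xFF) << (8 * i)
        carry = total >> 8                   # feeds the next adc
    return result, carry
```

The returned carry plays the role of the c flag after the last `adc a,h`.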
Quote from: jacobly on December 07, 2011, 11:05:18 pm
I do have a 24-bit floating-point multiplication routine. Saved 2 bytes, 1149 cycles; saved by using iy too (and in a way compatible with TIOS, imagine that).

Code: [Select]
; hldebc = hlc * bde
    ld (iy+asm_Flag1),b
    xor a
    ld ix,0
    ld b,24
Loop:
    add ix,ix
    rla
    rl c
    adc hl,hl
    jr nc,Next
    add ix,de
    adc a,(iy+asm_Flag1)
    rl c
    jr nc,Next
    inc hl
Next:
    djnz Loop
    ld e,a
    ld d,c
    push ix ; ld c,ixl
    pop bc ; ld b,ixh

Jacobly, are you sure this works? I've been going through the code, and it seems to me that the second 'rl c' should instead be 'add carry flag to c'. Two 'rl c' instructions per loop seems wrong to me. Could you explain, please? I've also tried it in wabbitemu, taking the different inputs into account, and it is still not doing the right thing.
That's strange. My test program must not have been working right, because when I went back and changed it a bit, it suddenly started telling me that the second routine doesn't work. Anyway, my new test program seems to agree with this change.

Code: [Select]
; hldebc = hlc * bde
    ld (iy+asm_Flag1),b
    xor a
    ld ix,0
    ld b,24
Loop:
    add ix,ix
    rla
    rl c
    adc hl,hl
    jr nc,Next
    add ix,de
    adc a,(iy+asm_Flag1)
    jr nc,Next
    inc c
    jr nz,Next
    inc hl
Next:
    djnz Loop
    ld e,a
    ld d,c
    push ix ; ld c,ixl
    pop bc ; ld b,ixh
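To compare the two versions without an emulator, the loop can be modeled in Python with plain integers standing in for the registers. This is only a sketch: the helper names are invented, `saved_b` stands in for the byte stored at (iy+asm_Flag1), and both inputs are assumed to fit in 24 bits.

```python
def shl(value, carry_in, bits):
    """Shift left through carry; returns (new_value, carry_out)."""
    value = (value << 1) | carry_in
    return value & ((1 << bits) - 1), value >> bits


def add(x, y, carry_in, bits):
    """Add with carry in; returns (sum, carry_out)."""
    total = x + y + carry_in
    return total & ((1 << bits) - 1), total >> bits


def mul24_model(hlc, bde, fixed=True):
    """Model of 'hldebc = hlc * bde'; fixed=False keeps the second rl c."""
    hl, c = hlc >> 8, hlc & 0xFF
    saved_b, de = bde >> 16, bde & 0xFFFF
    a = ix = 0
    for _ in range(24):
        ix, cy = shl(ix, 0, 16)              # add ix,ix
        a, cy = shl(a, cy, 8)                # rla
        c, cy = shl(c, cy, 8)                # rl c
        hl, cy = shl(hl, cy, 16)             # adc hl,hl
        if not cy:                           # jr nc,Next
            continue
        ix, cy = add(ix, de, 0, 16)          # add ix,de
        a, cy = add(a, saved_b, cy, 8)       # adc a,(iy+asm_Flag1)
        if fixed:
            if cy:                           # jr nc,Next
                c, _ = add(c, 1, 0, 8)       # inc c
                if c == 0:                   # jr nz,Next
                    hl = (hl + 1) & 0xFFFF   # inc hl
        else:
            c, cy = shl(c, cy, 8)            # second rl c (original post)
            if cy:
                hl = (hl + 1) & 0xFFFF       # inc hl
    return (hl << 32) | (c << 24) | (a << 16) | ix
```

In this model, `mul24_model(0x800001, 1)` returns the true product, while `mul24_model(0x800001, 1, fixed=False)` does not: the second `rl c` shifts the multiplier bits still waiting in c instead of just adding the carry, which matches the objection raised above.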