Author Topic: [Resolved] 16-bit Sqrt trouble  (Read 2364 times)

0 Members and 1 Guest are viewing this topic.

Offline Xeda112358

  • they/them
  • Moderator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4704
  • Rating: +719/-6
  • Calc-u-lator, do doo doo do do do.
    • View Profile
[Resolved] 16-bit Sqrt trouble
« on: October 22, 2013, 12:16:29 pm »
I was very excited to have a fairly fast 16-bit fast square root routine (300 cycles faster than my previous best), but after testing the routine, it appears to have problems for some large inputs. The first issue starts after the 14-bit range, so I imagine it is overflow. Looking at my routine, I foresaw the overflow issue in the last iteration so I moved it outside of the loop (while optimising it). I will be trying to fix this, but I thought there might be a chance somebody else spies the problem first.
Code: [Select]
sqrtHL:
;input is HL
;output is A
;734 t-states worst case
;39 bytes
    ld bc,700h
    ld d,c
    ld e,c
sqrt16loop:
add hl,hl \ rl e
add hl,hl \ rl e
rl c
ld a,c
rla
sub e
jr nc,$+5
inc c
cpl
ld e,a
djnz sqrt16loop

ld l,c
   ld a,l
add hl,hl \ rl e \ rl d
add hl,hl \ rl e \ rl d
sbc hl,de
rla
ret

Offline Xeda112358

  • they/them
  • Moderator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4704
  • Rating: +719/-6
  • Calc-u-lator, do doo doo do do do.
    • View Profile
Re: 16-bit Sqrt trouble
« Reply #1 on: October 22, 2013, 03:12:45 pm »
I think I figured it out. After looking back through my pseudo-code for the algorithm, I noticed that for n bits, I needed a variable with n bits, n+1 bits, and n+2 bits. I fixed it by taking the last two iterations as special cases. It bloats the code a little more, but it works:
Code: [Select]
SqrtHL5:
;input: HL
;Output: A
;63 bytes
;749 t-states worst case
ld bc,600h
ld d,c
ld e,c
sqrt16loop:
add hl,hl \ rl e
add hl,hl \ rl e
rl c
ld a,c
rla
sub e
jr nc,$+5
inc c
cpl
ld e,a
djnz sqrt16loop

sla c \ ld a,c \ add a,a
rl h \ rl e
rl h \ rl e
jr nc,$+6
sub e \ jp $+6
sub e
jr nc,$+5
inc c
cpl
ld e,a

ld l,c
    ld a,l
add hl,hl \ rl e \ rl d
add hl,hl \ rl e \ rl d
sbc hl,de
rla
ret
Also, <750 t-states as promised :D (assuming I computed it correctly).

Offline Xeda112358

  • they/them
  • Moderator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4704
  • Rating: +719/-6
  • Calc-u-lator, do doo doo do do do.
    • View Profile
Re: [Resolved] 16-bit Sqrt trouble
« Reply #2 on: October 24, 2013, 10:38:02 am »
For 73 bytes (10 bytes more), you can save 110 t-states on the worst case. It involves some unrolling:
Code: [Select]
SqrtHL:
;input: HL
;Output: A
;73 bytes
;639 t-states worst case
xor a
ld b,4
ld e,l
ld l,h
ld h,a
sqrt16loop:
add hl,hl
add hl,hl
add a,a
ld c,a
sub h
jr nc,$+6
cpl
ld h,a
inc c
inc c
ld a,c
djnz sqrt16loop
ld l,e
ld b,2
sqrt16loop2:
add hl,hl
add hl,hl
add a,a
ld c,a
sub h
jr nc,$+6
cpl
ld h,a
inc c
inc c
ld a,c
djnz sqrt16loop2

add a,a \ ld c,a
add hl,hl
add hl,hl
jr nc,$+6
sub h \ jp $+6
sub h
jr nc,$+6
inc c \ inc c
cpl
ld h,a


;b=0
;c is the result
;l has two more bits to rotate into h
ld a,l
ld l,h
ld h,b
add a,a \ adc hl,hl
add a,a \ adc hl,hl
ld a,c
sla c \ rl b
sbc hl,bc
ret c
inc a
ret
But really, if you are going to unroll that far, you should just unroll the whole thing and go up to 108 bytes and down to 543 t-states worst case:
Code: [Select]
SqrtHL:
;input: HL
;Output: A
;108 bytes
;543 t-states worst case
;Average is about 509 t-states
xor a
ld b,a

ld e,l
ld l,h
ld h,a

add hl,hl
add hl,hl
cp h
jr nc,$+5
dec h
ld a,4

add hl,hl
add hl,hl
ld c,a
sub h
jr nc,$+6
cpl
ld h,a
inc c
inc c

ld a,c
add hl,hl
add hl,hl
add a,a
ld c,a
sub h
jr nc,$+6
cpl
ld h,a
inc c
inc c

ld a,c
add hl,hl
add hl,hl
add a,a
ld c,a
sub h
jr nc,$+6
cpl
ld h,a
inc c
inc c

ld a,c
ld l,e

add hl,hl
add hl,hl
add a,a
ld c,a
sub h
jr nc,$+6
cpl
ld h,a
inc c
inc c

ld a,c
add hl,hl
add hl,hl
add a,a
ld c,a
sub h
jr nc,$+6
cpl
ld h,a
inc c
inc c

ld a,c
add a,a \ ld c,a
add hl,hl
add hl,hl
jr nc,$+6
sub h \ jp $+6
sub h
jr nc,$+6
inc c \ inc c
cpl
ld h,a


ld a,l
ld l,h
add a,a
ld h,a
adc hl,hl
adc hl,hl
ld a,c
sll c \ rl b
sbc hl,bc
ret c
inc a
ret