Author Topic: [x86] Hiding the Else in a NOP (Read 3524 times)

harold · « **on:** August 14, 2012, 01:46:39 pm »

I'm trying to implement 0x8000000000000000 >> nlz(x) efficiently (nlz being the number of leading zeros).

What I came up with might be a bit unorthodox:

  mov r11d, 1
  bsr rcx, rax
  jz _iszero
  shl r11, cl
  .db 0x0F, 0x1F, 0x80 ; nop dword [rax+sdword], with the sdword being the 4 bytes of the next shl
_iszero:
  shl r11, 63  ; 49 C1 E3 3F so 4 bytes

Because BSR is annoying and returns nothing useful when the argument is zero (it just sets ZF and leaves the destination undefined), I have to handle that case with a branch. But this gets rid of the branch I'd otherwise use to skip the second shl.
Another way to do this is shl-ing by 63 in all cases (or it could be a 64-bit mov) and then shr-ing back in the nonzero case. That means xor-ing the result of bsr with 63 though. Not a disaster, but more instructions.
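
For reference, here is a rough C model of what the two approaches are supposed to compute. It is only a sketch for checking that they agree, not the asm itself. The function names are made up for illustration, and it assumes GCC or Clang for `__builtin_clzll`.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest set bit, like BSR. Only valid for x != 0,
   because __builtin_clzll(0) is undefined (much like BSR's result). */
static unsigned bsr64(uint64_t x) {
    return 63u - (unsigned)__builtin_clzll(x);
}

/* First approach: 1 << bsr(x), with x == 0 handled as 1 << 63. */
static uint64_t top_bit_shl(uint64_t x) {
    if (x == 0) return UINT64_C(1) << 63;
    return UINT64_C(1) << bsr64(x);
}

/* Second approach: start from 1 << 63 and shift right by bsr(x) ^ 63
   in the nonzero case. For values 0..63, v ^ 63 equals 63 - v. */
static uint64_t top_bit_shr(uint64_t x) {
    uint64_t r = UINT64_C(1) << 63;
    if (x != 0) r >>= (bsr64(x) ^ 63u);
    return r;
}

/* Hypothetical helper: asserts that both approaches give the same result. */
static void check_variants_agree(uint64_t x) {
    assert(top_bit_shl(x) == top_bit_shr(x));
}
```

Calling `check_variants_agree` on 0, on each single-bit value, and on a few values with extra low bits set should be enough to cover every shift count.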

Is there any reason not to do it this way? (besides "maintainability", I'm the only person who's ever going to read it anyway and I certainly know what this means)
Any unexpected slowdowns on some micro-architectures? Are trace caches OK with this?
Is the other way I described better?