TLDR; Don’t ever use __lzcnt without a corresponding __cpuid check.
I recently ran into a problem with a port of some g++ code to MSVC (2013). It was doing some bit-twiddling and needed an operator to count the leading zeros. It turns out that MSVC provides an intrinsic just for this purpose, __lzcnt.
Everything seemed to work, but a bug was reported and we traced it to this statement. The funny thing was, a simple test case (printing the leading zeros for a few different integers) gave different results on different machines and, for the value of 0, generated different answers each time.
We eventually worked out the root cause. The ‘lzcnt’ instruction is only provided by certain CPUs, and __lzcnt is just directly turned into this instruction regardless of whether it’s available or not. The funny (not so funny) thing is that instead of getting an ‘illegal instruction’ result when you run it, Intel (in their infinite wisdom) decided to reuse or repurpose existing opcodes so that CPUs without ‘lzcnt’ instead did a ‘bsr’ (bit scan reverse). This was why (a) the results were different/wrong, and (b) why a value of 0 gave gibberish (the docs for ‘bsr’ say the results are undefined in that case).
…What happened is that Intel used the invalid sequence rep bsr to encode the new lzcnt instruction. Using a rep prefix on bsr (and many other instructions) was not a defined behavior, but all previous Intel CPUs just ignore redundant rep prefixes (indeed, they are allowed in some places where they have no effect, e.g., to make longer nop instructions).
So if you happen to execute lzcnt on a CPU that doesn’t support it, it will execute as bsr. Of course, this fallback is not exactly intentional, and it gives the wrong result…
Careful reading of the __lzcnt docs does say this in the Remarks: “If you run code that uses this intrinsic on hardware that does not support the lzcnt instruction, the results are unpredictable.”. I think this could be made a bit more obvious – hence this blog post for future googlers.