{"id":2554,"date":"2017-06-06T16:06:58","date_gmt":"2017-06-06T15:06:58","guid":{"rendered":"https:\/\/nextmovesoftware.com\/blog\/?p=2554"},"modified":"2017-06-06T16:06:58","modified_gmt":"2017-06-06T15:06:58","slug":"the-perils-of-using-__lzcnt-with-msvc","status":"publish","type":"post","link":"https:\/\/nextmovesoftware.com\/blog\/2017\/06\/06\/the-perils-of-using-__lzcnt-with-msvc\/","title":{"rendered":"The perils of using __lzcnt with MSVC"},"content":{"rendered":"<p>TLDR; Don&#8217;t ever use __lzcnt without a corresponding __cpuid check.<\/p>\n<p>I recently ran into a problem with a port of some g++ code to MSVC (2013). It was doing some bit-twiddling and needed an operator to count the leading zeros. It turns out that MSVC provides an intrinsic just for this purpose, __lzcnt.<\/p>\n<p>Everything seemed to work, but a bug was reported and we traced it to this statement. The funny thing was, a simple test case (printing the leading zeros for a few different integers) gave different results on different machines and, for the value of 0, generated different answers each time.<\/p>\n<p>We eventually worked out the root cause. The &#8216;lzcnt&#8217; instruction is only provided by certain CPUs, and __lzcnt is just directly turned into this instruction regardless of whether it&#8217;s available or not. The funny (not so funny) thing is that instead of getting an &#8216;illegal instruction&#8217; result when you run it, Intel (in their infinite wisdom) decided to reuse or repurpose existing opcodes so that CPUs without &#8216;lzcnt&#8217; instead did a &#8216;bsr&#8217; (bit scan reverse). 
This explains both symptoms: (a) the results were different\/wrong because &#8216;bsr&#8217; returns the index of the highest set bit rather than the number of leading zeros, and (b) a value of 0 gave gibberish because the docs for &#8216;bsr&#8217; say the result is undefined in that case.<\/p>\n<p>For background, see this <a href=\"https:\/\/stackoverflow.com\/a\/43443701\">StackOverflow answer<\/a> from <a href=\"https:\/\/stackoverflow.com\/users\/149138\/beeonrope\">BeeOnRope<\/a>:<\/p>\n<blockquote><p>\n&#8230;What happened is that Intel used the invalid sequence rep bsr to encode the new lzcnt instruction. Using a rep prefix on bsr (and many other instructions) was not a defined behavior, but all previous Intel CPUs just ignore redundant rep prefixes (indeed, they are allowed in some places where they have no effect, e.g., to make longer nop instructions).<\/p>\n<p>So if you happen to execute lzcnt on a CPU that doesn&#8217;t support it, it will execute as bsr. Of course, this fallback is not exactly intentional, and it gives the wrong result&#8230;\n<\/p><\/blockquote>\n<p>A careful reading of the <a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/bb384809(v=vs.120).aspx\">__lzcnt docs<\/a> does turn up this warning in the Remarks: &#8220;If you run code that uses this intrinsic on hardware that does not support the lzcnt instruction, the results are unpredictable.&#8221; I think this could be made a bit more obvious &#8211; hence this blog post for future googlers.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>TLDR; Don&#8217;t ever use __lzcnt without a corresponding __cpuid check. I recently ran into a problem with a port of some g++ code to MSVC (2013). It was doing some bit-twiddling and needed an operator to count the leading zeros. It turns out that MSVC provides an intrinsic just for this purpose, __lzcnt. 
Everything seemed &hellip; <a href=\"https:\/\/nextmovesoftware.com\/blog\/2017\/06\/06\/the-perils-of-using-__lzcnt-with-msvc\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">The perils of using __lzcnt with MSVC<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/2554"}],"collection":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/comments?post=2554"}],"version-history":[{"count":5,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/2554\/revisions"}],"predecessor-version":[{"id":2559,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/posts\/2554\/revisions\/2559"}],"wp:attachment":[{"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/media?parent=2554"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/categories?post=2554"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nextmovesoftware.com\/blog\/wp-json\/wp\/v2\/tags?post=2554"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}