Source: Hacker News
Article note: That is a really cool bit of forensics.
AMD K6-2s did much more sophisticated uop decomposition and scheduling for complicated instructions (specifically LOOP) and were worse at handling simple instruction sequences generated by programmers or compilers. Contemporary Intel parts were the opposite; Intel's errata suggested using simpler instructions, like building your loop on JCXZ instead of using LOOP.
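So code that did limited-precision timer comparisons on either end of a fixed-length LOOP failed with a divide by 0 on AMD parts at lower clocks than on Intel parts: the loop finished within a single timer tick, the measured elapsed time came out as zero, and the code then divided by it.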
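To make the failure mode concrete, here is a rough sketch of that kind of calibration in Python. It is not the actual code from any affected program; the function names, the 1 kHz timer, the iteration count, and the cycles-per-iteration figures are all made-up illustrative assumptions, chosen only to show how a faster LOOP pushes the elapsed tick count to zero at a lower clock speed.

```python
# Hypothetical model of a timing-loop calibration. None of these numbers
# are measured; they only illustrate the truncation-to-zero problem.

def elapsed_ticks(iterations, cycles_per_iter, clock_hz, tick_hz=1000):
    """Whole timer ticks seen across the loop; a coarse timer truncates."""
    total_cycles = iterations * cycles_per_iter
    return (total_cycles * tick_hz) // clock_hz


def calibrate(iterations, cycles_per_iter, clock_hz, tick_hz=1000):
    """Loop iterations per timer tick; divides by the elapsed tick count."""
    ticks = elapsed_ticks(iterations, cycles_per_iter, clock_hz, tick_hz)
    return iterations // ticks  # raises ZeroDivisionError when ticks == 0


if __name__ == "__main__":
    ITERATIONS = 200_000      # assumed fixed loop length
    CLOCK_HZ = 350_000_000    # assumed 350 MHz part

    # Assumed slow LOOP (6 cycles per iteration): the timer still sees ticks.
    print(calibrate(ITERATIONS, 6, CLOCK_HZ))

    # Assumed fast LOOP (1 cycle per iteration): zero ticks elapse.
    try:
        calibrate(ITERATIONS, 1, CLOCK_HZ)
    except ZeroDivisionError:
        print("divide by zero: loop finished inside one timer tick")
```

With these assumed figures, the same loop at the same clock either calibrates fine or faults, depending only on how many cycles each LOOP iteration costs.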