Source: Hacker News
Article note: Just paraphrasing my comment in the HN discussion on my own medium:
The 432 was the first of Intel's many expensive lessons about the problems with extremely complicated ISAs dependent on even more sophisticated compilers making good static decisions for performance. Then they did it again with the i860. Then they did it again with Itanium.
Some reasonably substantiated opinions:
1. Highly sophisticated large-scale static analysis keeps getting beaten by relatively stupid tricks built into overgrown instruction decoders, working on relatively narrow windows of instructions.
2. The primary reason for (1) is that performance is now almost completely dominated by memory behavior, and making good static predictions about the dynamic behavior of fancy memory systems in the face of multitasking, DRAM refresh cycles, multiple independent devices competing for the memory bus, layers of caches, timing variations, etc. is essentially impossible.
3. You can give up on a bunch of your dynamic tricks and build much simpler, more predictable systems that can be statically optimized effectively. You could probably find a good local maximum in that style. The dynamic tricks are, however, unreasonably effective for performance, and have the advantage that they let you have good performance with the same binaries on multiple different implementations of an ISA. That's not insurmountable (e.g. the AOT compilation for ART objects on Android), but the ecosystem isn't fully set up to support that kind of thing.
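To make points 1 and 2 a bit more concrete, here is a toy model in Python. It is only a sketch under made-up assumptions, not a description of any real core: a single-issue machine, independent load/use pairs, and invented hit and miss latencies. The function names (`static_order`, `run_in_order`, `run_windowed`) are hypothetical. The "compiler" hoists each load a fixed distance ahead of its use, based on a guessed latency. The in-order machine follows that schedule exactly and stalls whenever a load misses. The windowed machine runs the same schedule but may issue any ready instruction among the next few.

```python
import random

def static_order(n, distance):
    """Compiler-style schedule: issue each load `distance` pairs ahead of its use."""
    order = []
    for i in range(n + distance):
        if i < n:
            order.append(("load", i))
        if i >= distance:
            order.append(("use", i - distance))
    return order

def run_in_order(order, latency):
    """Issue strictly in program order; a use stalls until its load has returned."""
    ready_at = {}
    cycle = 0
    for kind, i in order:
        if kind == "load":
            ready_at[i] = cycle + latency[i]
        else:
            cycle = max(cycle, ready_at[i])  # stall on a late load
        cycle += 1
    return cycle

def run_windowed(order, latency, window):
    """Each cycle, issue the oldest ready instruction among the next `window`."""
    pending = list(order)
    ready_at = {}
    cycle = 0
    while pending:
        for idx, (kind, i) in enumerate(pending[:window]):
            if kind == "load":
                ready_at[i] = cycle + latency[i]
                del pending[idx]
                break
            if ready_at.get(i, float("inf")) <= cycle:
                del pending[idx]
                break
        cycle += 1  # an idle cycle if nothing in the window was ready
    return cycle

if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    # Assumed numbers: 3-cycle hits, 40-cycle misses, roughly 10% miss rate.
    latency = [40 if rng.random() < 0.1 else 3 for _ in range(n)]
    order = static_order(n, distance=2)  # tuned for hits only
    print("in-order:", run_in_order(order, latency))
    for w in (4, 16, 64):
        print("window", w, ":", run_windowed(order, latency, w))
```

The windowed machine never needs to know which loads will miss. It just finds other work when one does, which is exactly the information the static schedule would have needed in advance and could not have had.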