Source: Hacker News
Article note: Interesting. Intel and AMD have each pumped at least a year of funding into this thing, and both backed out after their initial commitment, but there is now a vaguely complete open-source CUDA runtime for other targets.
The HN thread seems to have a hard time understanding that:
(1) Nvidia has been co-evolving CUDA and its chips, so executing CUDA code on non-Nvidia-architecture parts will always be at a disadvantage.
(2) Nvidia embedded a bunch of black-box runtime behavior into CUDA that is necessary for the execution semantics but not exposed in the APIs, because it was convenient and/or to throw off competing implementations.
(3) Nvidia has shown a great deal of willingness to spend resources on preventing competing implementations (like when they bought PGI from ST).
(4) Nvidia has been treadmilling the CUDA APIs for years, to iteratively improve user experience and/or to throw off competing implementations. No reason to think they'll stop.
(5) The way _most_ of the user base works, CUDA looks more like a high-level assembly than the language they actually write in, so there isn't that much CUDA code in the world. For the big-money example, most of the AI boom has been built on higher-level abstractions (...and on reusing other people's code), so you only need to write, e.g., a torch backend for a different target, rather than having every user rewrite a bunch of their own code. Rolling with that example, there are _already_ Intel/oneAPI and AMD/ROCm backends in torch, which Intel and AMD should prefer over perpetually playing catch-up with CUDA.
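As a concrete illustration of that last point, here is a minimal sketch of what "same user code, different backend" looks like in torch (assumptions: a PyTorch build for the backend in question is installed; ROCm builds expose AMD GPUs under the "cuda" device name, and Intel GPUs show up as the "xpu" device in recent releases):

```python
import torch

# Pick whichever accelerator backend this PyTorch build supports.
# Note: AMD/ROCm builds of PyTorch report their GPUs through the
# "cuda" device name, so the first branch covers Nvidia and AMD.
if torch.cuda.is_available():
    device = torch.device("cuda")        # Nvidia CUDA or AMD ROCm
elif hasattr(torch, "xpu") and torch.xpu.is_available():
    device = torch.device("xpu")         # Intel oneAPI / XPU
else:
    device = torch.device("cpu")         # portable fallback

# The user-level code is identical regardless of which backend won.
model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)
y = model(x)
print(y.shape, device)
```

The point being that the backend selection above is roughly the entire per-vendor surface most users ever see; the kernels behind it are the vendor's problem, not the user's.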
Comments