Post-SC, I sat down to do some deeper reading on the HSA (Heterogeneous System Architecture) stuff. This is the plan for the future from AMD, ARM, and many friends, and it is pretty fucking exciting (in an obscure technical sort of way).
The best starting point I found is this year-old whitepaper [PDF warning]. The terminology is slightly odd; the important bits are: LCU = Latency Compute Unit = conventional MIMD CPU core; TCU = Throughput Compute Unit = accelerator, typically SIMD-engine-ish like a GPU; HSAIL = HSA Intermediate Language = an IR that can be compiled at install/run time to each accelerator’s ISA.
Hardware-side implementation details are nowhere to be found, but the software side details a lot of seriously exciting model-affecting things. The general model, with things broken into grid, work group, work item, and wavefront, is FAR more sane than most of the parallel schemes (I’m thinking specifically of the awful CUDA nomenclature). Internally, the exciting stuff includes requiring a limited sort of preemption on the accelerators, a relaxed consistency model across memory shared over the whole system (nice thread-like shared memory), an intermediate low-level language/VM for portability, and assurances about barrier capability in the TCU.
The actual objects are basically FAT ELFs with a complete copy of the program for the LCU, plus the HSAIL representation for the parts that can be shipped to TCUs. I’m pleased that there seems to be a clever run-time that does a bunch of platform enumeration and controls where parts run in a rule-automated-but-overridable way.
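To make the grid / work group / work item / wavefront hierarchy a bit more concrete, here is a rough Python sketch of how a flat work-item ID in a one-dimensional grid could be broken down into those levels. None of this is lifted from the whitepaper: the sizes (256 work items per work group, 64 per wavefront) and the names `WorkItemPosition` and `locate` are my own assumptions for illustration, and real grids can have more than one dimension.

```python
from dataclasses import dataclass

# Assumed sizes, for illustration only. Real values depend on the
# hardware and on how the kernel is launched.
WORK_GROUP_SIZE = 256   # work items per work group (assumption)
WAVEFRONT_SIZE = 64     # work items per wavefront (assumption)


@dataclass(frozen=True)
class WorkItemPosition:
    """Hypothetical record of where one work item sits in the hierarchy."""
    work_group: int   # index of the work group within the grid
    local_id: int     # index of the work item within its work group
    wavefront: int    # index of the wavefront within the work group
    lane: int         # index of the work item within its wavefront


def locate(global_id: int, grid_size: int) -> WorkItemPosition:
    """Split a flat 1-D global work-item ID into hierarchy coordinates."""
    if not 0 <= global_id < grid_size:
        raise ValueError(f"global_id {global_id} is outside grid of size {grid_size}")
    # The grid is carved into fixed-size work groups...
    work_group, local_id = divmod(global_id, WORK_GROUP_SIZE)
    # ...and each work group is carved into fixed-size wavefronts.
    wavefront, lane = divmod(local_id, WAVEFRONT_SIZE)
    return WorkItemPosition(work_group, local_id, wavefront, lane)


if __name__ == "__main__":
    print(locate(300, grid_size=1024))
```

The point is just that each level is a fixed-size slice of the one above it, so every work item's position falls out of two integer divisions.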
Some folks at SC told me they’d try to get me a more implementation-focused whitepaper from AMD on the hardware side, but they weren’t sure if or when the details would be cleared for distribution. On the software side, the details are in a published draft of the ISA/Model/Compiler Writer’s Guide, which I browsed around in a bit and found very enlightening. The reference tool-chain seems to be mostly built on LLVM and OpenCL.
I have some other SC-related thoughts to share, but I want to get them a little bit more refined (and decide which are for public consumption) before I post.