I learned some really interesting things at SC this year, and now that I’ve had a day to process, I want to share. Many of these observations come from first- or second-hand conversations, or justifiable interpretations of press releases, so I don’t promise they are correct, but they are plausible, explanatory, and interesting. I apologize for the 1,000-word wall of text, but there is a lot of good stuff.
- This is the big one: I’m pretty sure I understand the current long-term architecture plan being pursued by Intel, AMD, and Nvidia. This plan signals the end of the current style of monolithic symmetric processor cores.
They are all apparently pursuing designs with a small N of large integer units, coupled to M >> N SIMD engines.
- Nvidia’s “Project Denver” is a successor/big sibling to the Tegra design, and appears to be the beginning of a line with 2-8 64-bit (probably) ARM cores tightly integrated with a big honking GPU-like SIMD structure for FP. The stale press release about this stuff is kind of nauseating to read, but it looks like they’re betting the farm on that design.
- Intel’s HPC efforts are going to be based on a lot of MIC (Many Integrated Core, successor to the Larrabee stuff) parts coupled with a few big cores like the current Xeons. The MIC chips are basically large numbers of super-Atoms: tiny, simple, dumb integer units attached to big SWAR (SIMD Within a Register) units focused on SSE/AVX performance. This is less speculative than most of these observations; they made a pretty good press push (This for example) on this idea.
The ring interconnects and higher per-“thread” hardware complexity are probably not a good idea in the long run (IMHO), but having an integer unit for every big SWAR engine will be a major advantage in terms of programming environment and code generation. I suspect the more cautious approach is because Intel doesn’t want/can’t afford another Itanic, where the tools couldn’t generate good code for the programming model on their intended high-end part.
- AMD’s two current products are stepping stones to a design similar to Nvidia’s: Bulldozer is a design with some ridiculously powerful x86-64 integer units decoupled from a smaller number of shared FPUs. The APU (I haven’t heard the “Fusion” name in a while) designs are CPUs tightly coupled to GPU structures. The successor parts will be a hybrid of the two: a few big, Bulldozer-style integer units, with a large number of wide next-gen GPU SIMD structures coupled to them.
I think this is generally a good design direction, particularly with current directions in computing in mind, but it is going to make the compiler/concurrent programming world exciting for a while.
- AMD appears to be gearing up to abandon a fifth generation of GPGPU products. CTM, CAL, Brook+, and OpenCL on 4000 series cards have all been deprecated while still shipping, and indications are that OpenCL (and general driver) support for the current architecture (4-wide VLIW SIMDs, like in the 5000 and 6000 series) has been relegated to second-class citizen status while they work on a next-generation architecture. The rumor is the next-gen parts will be 4 independent banks of SIMD engines instead of 4-wide VLIW SIMD engines, which should be both nicer to program and generate code for and more similar to Nvidia.
- Nvidia is going to open source their CUDA environment. One of the primary objections to CUDA in a lot of circles is reluctance to use a proprietary single-vendor programming environment (people who have been in super/scientific computing for long have all been burnt on that in the past), and the Integer+SIMD model is going to require that not be an issue. This is assembled from information from several places, including PGI, Nvidia, and various scientific compute facilities, much of it second hand or further, but it would make sense.
- I still don’t exactly know what went down at Infiscale, but the impression that the Perceus community was abandoned by the company, the developers fled, and it was a bad scene seems to be correct. No one I know who was there seems to be talking, but they’re all on their way to other interesting things, especially Greg Kurtzer’s Warewulf3 project at LBL.
- The dedicated high performance compute nodes in Amazon’s EC2 cloud are actually connected as a few large partitionable clusters; users just can’t (nominally, don’t need to) see and instrument the topology like they could with a normal cluster. This is from interpreting press releases, because the people manning Amazon’s booth really didn’t want to chat (and, in fact, were kind of dicks when we tried). This explains how they’ve been getting performance out of a loosely coupled cloud, which is to say they aren’t; they just have a huge cluster attached to their cloud that shares the interface.
- The current hard drive production problems have given SSDs the opportunity they need to become first-class citizens. Talking to OEMs, the wholesale cost per capacity on HDDs almost tripled, and the supply lines aren’t all that stable, so everyone is scrambling to make things work with mostly SSDs. I saw a lot of interesting new form factors for SSDs, and several flavors of flash- or battery-backed “nonvolatile” DRAM floating about as well, so the nature of storing data sets is changing.
- I saw motherboards with 32 DIMM slots (mostly AMD Interlagos-based) on the floor. I saw 32GB DIMMs on the floor. I saw some shared-memory systems with multiple terabytes of RAM in them. The standard for high-memory machines has roughly quadrupled in the last year or two.
- The number of women (not booth babes, real technical people, especially younger ones) and educators on the show floor this year was way higher than in the past. This is very good for the field.
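To make the VLIW point from the AMD GPGPU item a bit more concrete, here is a toy Python sketch of why a 4-wide VLIW design is harder to generate code for. It is only an illustration under simplifying assumptions: it models nothing about AMD's real compiler or hardware, and `pack_bundles`, `slot_utilization`, and the op names are made up for this example. The idea is that a VLIW bundle can only be filled with operations that do not depend on each other, so a dependent chain of work leaves most slots empty.

```python
from typing import Dict, List

VLIW_WIDTH = 4  # slots per bundle, matching the 4-wide VLIW mentioned in the post


def pack_bundles(deps: Dict[str, List[str]], width: int = VLIW_WIDTH) -> List[List[str]]:
    """Greedily pack ops into bundles (hypothetical toy scheduler).

    `deps` maps each op name to the ops whose results it needs. An op may
    issue only once all of its inputs were produced by an earlier bundle.
    """
    done = set()
    remaining = list(deps)  # dict order stands in for program order
    bundles: List[List[str]] = []
    while remaining:
        ready = [op for op in remaining if all(d in done for d in deps[op])]
        if not ready:
            raise ValueError("dependency cycle or unknown input")
        bundle = ready[:width]  # fill at most `width` slots this cycle
        bundles.append(bundle)
        done.update(bundle)
        remaining = [op for op in remaining if op not in bundle]
    return bundles


def slot_utilization(bundles: List[List[str]], width: int = VLIW_WIDTH) -> float:
    """Fraction of issue slots that actually carried an op."""
    used = sum(len(b) for b in bundles)
    return used / (len(bundles) * width)


# A dependent chain: each op needs the previous result.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
# Four ops with no dependencies between them.
independent = {"a": [], "b": [], "c": [], "d": []}

chain_bundles = pack_bundles(chain)
indep_bundles = pack_bundles(independent)

print(slot_utilization(chain_bundles))  # 0.25: one op per bundle, three slots idle
print(slot_utilization(indep_bundles))  # 1.0: all four ops fit in a single bundle
```

In this toy model the compiler has to find four independent operations inside one thread to keep the hardware busy. With independent SIMD lanes, as I understand the general approach, each lane runs its own thread's instruction stream, so a dependent chain inside one thread does not strand the neighboring slots.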
I think that covers most of the really good stuff coming off the floor this year, although I am still processing and may come up with some other insights when I’ve had more sleep and discussion.
Also, Pictures! WOO! (Still sorting and uploading the last batch at time of posting).