SC16 Notes

Posting pictures and thoughts from SC16, because someone else might learn from them.

I pushed my crappy cellphone pictures from the show into a Google Photos album; some of them even have comments attached about why they’re interesting. I’ll probably add a few pictures of the swag haul when I get around to sorting through it.

Unlike previous years, we actually coordinated with UK’s Center for Computational Sciences folks: they added some displays, spent some time in our booth, and are (supposedly) taking the lead on UK’s SC presence next year. They got to show off a poster and demo of their campus DMZ trick to the NSF program director who funded that work, and the new director got the SC experience of people and sights, so I think they’re convinced of the value of doing so.

The valuable things I learned are enumerated below:
1. A few years ago, there were concerns that system power density was becoming a limiting factor: the machines people wanted to build needed so much cooling equipment that their physical footprint exceeded the allowable cable lengths for the preferred interconnects. Over the last few years, several players decided they could cheat the problem by switching to water cooling. This year, we’re seeing an awful lot of systems in the 0.25–0.33 MW/rack range, and predictions that the latest 3M phase-change secret sauce will get them up to ~0.5 MW/rack once the power density of the parts gets there. The trick on that is fun: Novec 7100, which is apparently methoxy-nonafluorobutane – C4F9OCH3 – boils at about 61°C, so the phase change very quickly pulls heat away from hotspots, and the vapor can then be re-condensed in a big fucking water-loop radiator at the top of the rack. The rest of the scale issue is being handled by obscenely expensive electro-optical cables and such. Less clever, more throwing cash at the problem.
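
To get a feel for those numbers, here’s a trivial back-of-envelope on how much Novec has to boil off to carry ~0.5 MW out of a rack. The ~112 kJ/kg latent heat is my recollection of the 3M datasheet figure, so treat it as approximate:

```cpp
#include <cstdio>

int main() {
  // Rough numbers, not datasheet-grade: target rack heat load and the
  // approximate latent heat of vaporization of Novec 7100 (~112 kJ/kg).
  const double rack_heat_w = 0.5e6;     // ~0.5 MW per rack
  const double h_vap_j_per_kg = 112e3;  // assumed latent heat, J/kg

  // Mass of fluid that must boil off each second to carry that heat away,
  // ignoring the sensible heat it picks up on the way to 61 C.
  const double boil_off_kg_per_s = rack_heat_w / h_vap_j_per_kg;
  std::printf("~%.1f kg of Novec boiling off per second per rack\n",
              boil_off_kg_per_s);
  return 0;
}
```

That works out to a few kilograms of fluid flashing to vapor every second per rack, which is why the top-of-rack condenser loop is not a small radiator.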

2. For next-generation large systems there are, to borrow a phrase from the person who initially gave me the heads-up on this, “two swim-lanes”: POWER8 + Nvidia, connected inside the nodes with NVLink to get adequate memory bandwidth, and Intel Knights Landing manycore boxes. Either way, almost all the big stuff is wired together with InfiniBand.

3. The annual “Hank was right” award (which, following tradition, comes with neither award nor recognition) was issued by Mellanox, who are now into “In-Network Computing,” which looks an awful lot like the Aggregate Function Network work from the mid-90s. Basically, they’re doing MPI (etc.) sync operations in the network adapters and switches to avoid both interrupts and latency.
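
To make that concrete, the kind of operation being pulled into the fabric is an ordinary MPI collective like the nonblocking allreduce below; with switch-side offload (Mellanox’s SHARP, if I have the name right) the reduction tree runs in the network while the hosts keep computing. The code itself is stock MPI-3, nothing vendor-specific:

```cpp
// Plain MPI-3 nonblocking allreduce: the sync/combine work this issues is
// exactly what in-network computing moves off the host CPUs.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank = 0, size = 1;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  std::vector<double> local(1024, rank + 1.0);
  std::vector<double> global(1024, 0.0);

  // Post the reduction and go do other work; with an offload-capable
  // switch the combine happens in the fabric instead of interrupting
  // every node along the way.
  MPI_Request req;
  MPI_Iallreduce(local.data(), global.data(), 1024, MPI_DOUBLE, MPI_SUM,
                 MPI_COMM_WORLD, &req);

  // ... overlap computation here ...

  MPI_Wait(&req, MPI_STATUS_IGNORE);
  if (rank == 0)
    std::printf("global[0] = %.1f (expected %.1f)\n",
                global[0], size * (size + 1) / 2.0);

  MPI_Finalize();
  return 0;
}
```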

4. To me, the most fascinating thing on the floor was Emu Technology, which is essentially the lone radically-different architecture being developed right now. The whole thing is basically a refinement of in-memory computing, built on the premise that it’s usually cheaper to move the thread state to the data being worked on than to move that data to the core (there’s a toy sketch of the access pattern below). It almost looks like a late-career folly by a bunch of badass old architecture folk, but they’re actually getting things done, in no small part because their machine is specifically suited to performing vast numbers of small operations on even vaster bodies of sparse, unstructured data, and because they are obviously mostly bankrolled by “Our Benefactors” – anonymous well-heeled entities interested in radical ways to do exactly that, easily deduced but better left unnamed.

I want to be thinking about some of their problems; they have some really interesting toolchain work ahead, especially with regard to visualizing >16K threads wandering around in memory in human-accessible ways, to look for hotspots and handle other debugging tasks. Closing the memory-layout feedback loop on these machines is the hard mode of a problem that’s already hard but interesting on normal systems. This was even better because I later got to talk with Burton Smith for about half an hour about our respective musings on what they’re up to – mostly questions about their memory model – and talking to Burton for half an hour is more intellectually stimulating than about half the college courses I’ve taken.
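
For anyone who hasn’t stared at this class of workload, the access pattern that motivates the move-the-thread-to-the-data bet looks roughly like the toy pointer-chase below: every hop lands somewhere essentially random, so a conventional machine drags a whole cache line (or a remote transfer) to the core for a few bytes of useful work, while a migratory-thread machine would ship the small thread context to the memory instead. This is just an ordinary C++ illustration of the pattern, not anything resembling their actual toolchain:

```cpp
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

int main() {
  // A big array of random successors: a stand-in for sparse, unstructured
  // data such as a graph with no exploitable locality.
  const std::uint32_t n = 1u << 24;  // ~16M nodes
  std::vector<std::uint32_t> next(n);
  std::vector<std::uint32_t> value(n);

  std::mt19937 rng(42);
  std::uniform_int_distribution<std::uint32_t> pick(0, n - 1);
  for (std::uint32_t i = 0; i < n; ++i) {
    next[i] = pick(rng);
    value[i] = i & 0xff;
  }

  // Chase pointers and accumulate a tiny amount of work per node. The
  // traversal order is data-dependent, so prefetchers and caches get no
  // traction; on a distributed machine each hop is likely a remote access.
  std::uint64_t sum = 0;
  std::uint32_t cur = 0;
  for (int step = 0; step < 1000000; ++step) {
    sum += value[cur];
    cur = next[cur];
  }
  std::printf("sum = %llu, ended at node %u\n",
              static_cast<unsigned long long>(sum), cur);
  return 0;
}
```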

5. Task-centric, graph-oriented programming models are getting their due. There’s an awful lot of OCCA and Kokkos in the national labs, because they suit both swim-lanes and, being based on fugly C++ template magic, are relatively vendor-neutral and straightforward for adapting existing codes. Folks are also taking TensorFlow seriously.
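
For flavor, a minimal Kokkos sketch (my own toy example, not anything from the labs’ codes): the parallel dispatch and the data layout are parameterized by the backend chosen at build time, so the same source can target a Knights Landing node or an Nvidia GPU:

```cpp
#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int N = 1 << 20;

    // Views allocate in whatever memory space the backend uses
    // (host DRAM, KNL's MCDRAM, or GPU memory).
    Kokkos::View<double*> x("x", N);
    Kokkos::View<double*> y("y", N);

    // Fill the vectors in parallel on the chosen execution space.
    Kokkos::parallel_for("init", N, KOKKOS_LAMBDA(const int i) {
      x(i) = 1.0;
      y(i) = 2.0;
    });

    // y = a*x + y, the classic axpy kernel.
    const double a = 0.5;
    Kokkos::parallel_for("axpy", N, KOKKOS_LAMBDA(const int i) {
      y(i) = a * x(i) + y(i);
    });

    // Reduce to a scalar so there is something to print.
    double sum = 0.0;
    Kokkos::parallel_reduce("sum", N,
        KOKKOS_LAMBDA(const int i, double& acc) { acc += y(i); }, sum);
    std::printf("sum = %f\n", sum);
  }
  Kokkos::finalize();
  return 0;
}
```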

6. My expectations for Nvidia’s ARM64 ambitions are now essentially nil. I’m pretty sure I’m not supposed to share how or why I think this, but shit sounds doomed-via-mismanagement on that front.

7. Achronix has their own FPGAs. Part of the gimmick is that they have like 6 hardware DDR and Ethernet controllers wrapped in their fabric, but that’s not all that unusual. The interesting bit is that they might be willing to share/document enough of their cell structure that some of the permanently-back-burner compiling-HLLs-to-hardware tricks could happen.

The usual SC features also apply. An overview of the state of the field. The best serverporn money can buy. Adult trick-or-treating in the form of the various vendor swag. And we got to see all the regulars: Greg Kurtzer [Warewulf, CentOS, currently on Singularity], Burton Smith [Tera/Cray, MSR, etc., currently fascinated by languages for quantum computing], Don Becker [Beowulf, Ethernet drivers, Scyld, Nvidia, currently building fun automotive electronics], Doug Eadline [“the media” of the HPC world], former group members like Tim and Randy, etc. Those visits are half the joy of continuing to go to SC.
