Tag Archives: Supercomputing

SC13 Retrospective

Posting up my notes from SC13 is another thing I didn’t get to during the end of the semester. Remedying now.

The main sequence of takeaways from conversations on the floor is as follows:

  1. The era of single-core performance gains is already over.
  2. Furthermore, the era of usable single-die performance for MIMD machines is coming to an end.
  3. Therefore, big machines are going to be getting physically bigger… to the point where connection lengths are a problem (everything is Infiniband, and Infiniband doesn’t tolerate long runs well).
  4. There is a LOT of cooling effort going into making the necessary density happen – large central fan systems, immersion cooling, closed-circuit water gear, etc.

The other really exciting thing is that it seems AMD is going to make it, and then some. Their lean period ended when the payoff from the XBone/PS4 parts came in, and they have a VERY good plan for the next >2 years. It fits the premise above about single-core/single-die MIMD performance ending, and points in the HSA direction – these are the crazy parts with shared MMUs, so a CPU and GPU can share memory without skew penalties and such. ARM and its partners are also generally pointed that way, and have been for some time. Apparently AMD isn’t getting out of the x86 game, but it does look like they are getting out of the fat-core game.
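To make the shared-memory idea a little more concrete, here is a minimal, hypothetical sketch of that programming model using OpenCL 2.0's shared virtual memory (SVM) API, which is the closest vendor-neutral analogue to what HSA promises. This is not AMD's actual HSA runtime; the kernel, buffer size, and names are made up for illustration, and error handling is mostly omitted.

```c
/* Hypothetical sketch: one allocation visible to both CPU and GPU via
 * OpenCL 2.0 shared virtual memory (SVM).  No clEnqueueWriteBuffer /
 * clEnqueueReadBuffer copies; host and device touch the same pointer. */
#include <stdio.h>
#include <CL/cl.h>

static const char *src =
    "__kernel void scale(__global float *data) {\n"
    "    data[get_global_id(0)] *= 2.0f;\n"
    "}\n";

int main(void) {
    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueueWithProperties(ctx, dev, NULL, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "scale", NULL);

    size_t n = 1024;
    float *data = (float *)clSVMAlloc(ctx, CL_MEM_READ_WRITE, n * sizeof(float), 0);

    /* CPU writes the buffer (map/unmap marks the coarse-grained hand-off)... */
    clEnqueueSVMMap(q, CL_TRUE, CL_MAP_WRITE, data, n * sizeof(float), 0, NULL, NULL);
    for (size_t i = 0; i < n; i++) data[i] = (float)i;
    clEnqueueSVMUnmap(q, data, 0, NULL, NULL);

    /* ...the GPU kernel reads and writes the very same pointer... */
    clSetKernelArgSVMPointer(k, 0, data);
    clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);

    /* ...and the CPU reads the result, still with no explicit copy. */
    clEnqueueSVMMap(q, CL_TRUE, CL_MAP_READ, data, n * sizeof(float), 0, NULL, NULL);
    printf("data[10] = %g\n", data[10]);
    clEnqueueSVMUnmap(q, data, 0, NULL, NULL);

    clSVMFree(ctx, data);
    return 0;
}
```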


SC13

I will be at SC’13 November 16-21 with the aggregate.org/University of Kentucky research exhibit again this year, in booth 629. Media and impressions should appear somewhere in my ‘net presence during and after the conference; it is always a good show.
Edit: Pushing photos from the show floor into this album.


SC12 Impressions

I’ve got my pictures from the event up in an album in Google’s cloud.

Here are the big, cool things I learned on the floor or at the various evening events:

  • Xeon Phi. Xeon Phi everywhere. Intel may have backed off on Larrabee, but the MIC descendants are proliferating quickly, and appear to actually be in use. They really are interesting parts – roughly a 60-core Linux x86 SMP box attached to a host system over PCI Express via a network-like interface. Somewhere between a tiny desk-side cluster and a GPU, but with a programming model you can actually use (there is a small sketch of that model after this list).
  • The population was much less male dominated than is typical for computing events. This is always a good thing.
  • The average age of attendees also seemed to be down by the better part of a decade.
  • The national labs losing their booths to the government-wide travel restrictions (apparently some folks went junketing in Vegas and it was that bad) changed the feel of the floor. Fewer, but longer and deeper conversations. More open layout, because many of the usual big constructed booths belong to the national labs. Users from the national labs hanging out at vendors’ booths. Not altogether a bad thing, but it was quite different.
  • ARM64 (aka aarch64, aka ARMv8). It is happening. It is odd (64KB pages, etc.). Whole companies are being bet on it. We’re talking many billions of dollars, biggest-bet-since-Itanium kind of big. The priority seems to be avoiding the Itanium mistakes: making sure the designs arrive promptly and making sure software support is ready. Dell is talking quietly, Calxeda is gunning for it, Nvidia was (quietly) showing plans, AMD is being pointed to as a likely leader, and ARM is sitting in their little 10×10 booth along one wall of the exhibit floor looking very pleased with themselves.
  • AMD is dying. The ill-timed (and vigorously denied) rumor that they had hired J.P. Morgan to plan the sale of part or all of the company made it look even worse, but they had almost no presence on the floor, and have scheduled a tiny booth for next year.
  • We talked to a number of networking vendors making interesting things (free-air optical switching, multi-port Ethernet NICs, etc.). Infiniband is so good and so cheap (in a relative sense) for cluster applications right now that everyone else is hunting for an edge. This is a good thing for researchers.
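Since the Xeon Phi programming-model point above is easier to show than to describe, here is a minimal sketch. The interesting part is what is not in it: this is ordinary OpenMP C with nothing card-specific, and (as I understand the pitch, so treat the tooling details as assumptions) the same source can be cross-compiled to run natively on the card's embedded Linux (Intel's -mmic flag) or offloaded from the host, rather than being rewritten in a GPU dialect.

```c
/* Plain OpenMP reduction -- nothing Xeon Phi-specific in the source.
 * The appeal of the MIC parts is that ordinary threaded code like this
 * can run on the card (natively or offloaded) without being rewritten
 * for a GPU-style programming model. */
#include <stdio.h>
#include <omp.h>

int main(void) {
    const long n = 1L << 24;
    double sum = 0.0;

    /* Spread the loop across however many cores/threads the part offers. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += 1.0 / (double)(i + 1);

    printf("threads=%d  partial harmonic sum=%f\n",
           omp_get_max_threads(), sum);
    return 0;
}
```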

There will be at least one more SC12 post later, when my cube of schwag arrives. The T-shirt harvest was great this year…


SC12

I will be at SC12 November 10-16, with the Aggregate.org/University of Kentucky exhibit in booth 631.

I will be posting pictures and impressions through at least one of my online presence mechanisms. I fully expect it to be weird this year with a bunch of the national labs pulled out due to travel restrictions, but it should be interesting.


SC’11 Lessons

I learned some really interesting things at SC this year, and now that I’ve had a day to process, I want to share. Many of these observations come from first- or second-hand conversations, or justifiable interpretations of press releases, so I don’t promise they are correct, but they are plausible, explanatory, and interesting. I apologize for the 1,000-word wall of text, but there is a lot of good stuff.

  • This is the big one: I’m pretty sure I understand the current long term architecture plan being pursued by Intel, AMD, and Nvidia. This plan signals the end of the current style of monolithic symmetric processor cores.
    They are all apparently pursuing designs with a small N of large integer units, coupled to M >> N SIMD engines.

    • Nvidia’s “Project Denver” is a successor/big sibling to the Tegra design, and appears to be the beginning of a line with 2-8 (probably 64-bit) ARM cores tightly integrated with a big honking GPU-like SIMD structure for FP. The stale press release about this stuff is kind of nauseating to read, but it looks like they’re betting the farm on that design.
    • Intel’s HPC efforts are going to be based on a lot of MIC (Many Integrated Cores, successor to the Larrabee stuff) parts coupled with a few big cores like the current Xeons. The MIC chips are basically large numbers of super-Atoms: tiny, simple, dumb integer units attached to big SWAR (SIMD Within a Register) units focused on SSE/AVX performance (there is a tiny SWAR example after this list). This is less speculative than most of these observations; they made a pretty good press push (This, for example) on the idea.
      The ring interconnects and higher per-“thread” hardware complexity are probably not a good idea in the long run (IMHO), but having an integer unit for every big SWAR engine will be a major advantage in terms of programming environment and code generation. I suspect the more cautious approach is because Intel doesn’t want/can’t afford another Itanic, where the tools couldn’t generate good code for the programming model on their intended high-end part.
    • AMD’s two current products are stepping stones to a design similar to Nvidia’s – Bulldozer is a design with some ridiculously powerful x86-64 integer units decoupled from a smaller number of shared FPUs. The APU (I haven’t heard the “Fusion” name in a while) designs are CPUs tightly coupled to GPU structures. The successor parts will be a hybrid of the two – a few big, Bulldozer-style integer units, with a large number of wide next-gen GPU SIMD structures coupled to them.

    I think this is generally a good design direction, particularly with current directions in computing in mind, but it is going to make the compiler/concurrent programming world exciting for a while.

  • AMD appears to be gearing up to abandon a fifth generation of GPGPU products. CTM, CAL, Brook+, and OpenCL on 4000-series cards have all been deprecated while still shipping, and indications are that OpenCL (and general driver) support for the current architecture (4-wide VLIW SIMDs, like in the 5- and 6-series) has been relegated to second-class citizen status while they work on a next-generation architecture. The rumor is the next-gen parts will be 4 independent banks of SIMD engines instead of 4-wide VLIW SIMD engines, which should be both nicer to program and generate code for, and more similar to Nvidia.
  • Nvidia is going to open-source their CUDA environment. One of the primary objections to CUDA in a lot of circles is reluctance to use a proprietary single-vendor programming environment (people who have been in super/scientific computing for any length of time have all been burnt by that in the past), and the Integer+SIMD model is going to require that not be an issue. This is assembled from information from several places, including PGI, Nvidia, and various scientific compute facilities, much of it second-hand or further, but it would make sense.
  • I still don’t exactly know what went down at Infiscale, but the impression that the Perceus community was abandoned by the company, the developers fled, and it was a bad scene seems to be correct. No one I know that was there seems to be talking, but they’re all on their way to other interesting things, especially Greg Kurtzer’s Warewulf3 project at LBL.
  • The dedicated high performance compute nodes in Amazon’s EC2 cloud are actually connected as a few large partitionable clusters, users just can’t (nominally, don’t need to) see and instrument the topology like they could with a normal cluster. This is from interpreting press releases, because the people manning Amazon’s booth really didn’t want to chat (and, in fact, were kind of dicks when we tried). This explains how they’ve been getting performance out of a loosely coupled cloud — which is to say they aren’t, they just have a huge cluster attached to their cloud that shares the interface.
  • The current hard drive production problems have given SSDs the opportunity they need to become first-class citizens. Talking to OEMs, the wholesale cost per capacity on HDDs almost tripled, and the supply lines aren’t all that stable, so everyone is scrambling to make things work with mostly SSDs. I saw a lot of interesting new form factors for SSDs, and several flavors of flash- or battery-backed “nonvolatile” DRAM floating about as well, so the nature of storing data-sets is changing.
  • I saw motherboards with 32 DIMM slots (mostly AMD Interlagos based) on the floor. I saw 32GB DIMMs on the floor. I saw some shared-memory systems with multiple Terabytes of RAM in them. The standard for high memory machines has roughly quadrupled in the last year or two.
  • The number of women (not booth babes, real technical people, especially younger ones) and educators on the show floor this year was way higher than in the past. This is very good for the field.
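As promised above, here is a tiny, hypothetical illustration of what “SIMD Within a Register” means in practice, using the AVX intrinsics available in current compilers (the array contents and names are made up for the example). The MIC-style designs push the same idea to much wider registers and many more lanes.

```c
/* SWAR in miniature: one AVX instruction adds eight packed floats.
 * Compile with something like: gcc -mavx swar_demo.c */
#include <stdio.h>
#include <immintrin.h>

int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[8];

    __m256 va = _mm256_loadu_ps(a);     /* load 8 floats into one 256-bit register */
    __m256 vb = _mm256_loadu_ps(b);
    __m256 vc = _mm256_add_ps(va, vb);  /* one instruction, eight additions */
    _mm256_storeu_ps(c, vc);

    for (int i = 0; i < 8; i++)
        printf("%g ", c[i]);            /* prints eight 9s */
    printf("\n");
    return 0;
}
```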

I think that covers most of the really good stuff coming off the floor this year, although I am still processing and may come up with some other insights when I’ve had more sleep and discussion.
Also, Pictures! WOO! (Still sorting and uploading the last batch at time of posting).
