I have a likely master's project topic: LARs, or, more specifically, a compiler which targets the LAR model.
To attempt to explain to people who aren’t computer engineers:
In “normal” modern computer designs, memory is broken into a number of levels.
At the highest level, there is a relatively huge main memory (this is RAM in a modern system). In the early days of computers, main memory was at least as fast as the processor. Processor speeds have been improving at a much higher rate than memory speeds, so in a modern computer main memory is about 4 orders of magnitude slower than the processor.
Next is a system of caches, which are smaller and faster to access than main memory, but larger and slower than registers. A cache attempts to hold things from main memory which are likely to be needed soon, in order to help hide how slow memory is. Unfortunately, the algorithms used to decide what goes in the cache are, necessarily, very, very stupid. So while caches are helpful to performance overall, it is because they help a lot about 10% of the time, and only hurt a little bit the other 90% of the time. Caches are divided into cache lines, which contain several items of data, information about where the data originated, whether the data has been changed, and some demarcation of the relative priority of that line.
At the smallest and fastest level, there is a small collection of fast one-item storage areas called registers, in which active data is placed. Registers are generally accessible in a single CPU clock cycle.
In theory, by rethinking the design of the register, one can eliminate caches while still successfully hiding memory latency, and collect a variety of interesting fringe benefits along the way. This is what LARs, and their predecessor CRegs, attempt to do. Fundamentally, a LAR looks something like a register: the processor addresses it directly, it is very fast, and it is relatively small. A LAR also looks like a cache line, in that it holds several data elements, a field to mark if it has been changed (dirty bit), a source (where the contents came from in main memory), and some metadata to allow intelligent handling of the contents. There are a variety of awesome consequences to this design, including cool tricks with intelligent parallelism, and huge, huge wins in memory bandwidth. My project will be making a compiler which can compile normal code (probably C) against the LARs model in an intelligent way. Two other students are already underway working on a hardware (FPGA) implementation of the concept architecture.
For those few people (the special weirdos) who made it to the end of this post without glazing over, the first link has papers with (some) detail for you to look at. For the rest of you, this has been an episode of “talking to grad students about their research,” which can be safely ignored. It will undoubtedly be followed by more like it, with more frightening technical detail.