I’ve settled the direction for the next step in my master’s project this summer: I will be using the LLVM Compiler Infrastructure as the backbone of my LARs compiler.
The decision to work with an existing compiler rather than writing tools from scratch carries some pretty significant advantages. The biggest is that some of the dark corners of the C specification make writing a complete, useful C front end a very, very daunting task, so in order to not be compiling a “toy” language (or cheating with CIL or something) it is nearly the only choice. (C, or something C-like, is the obvious and preferred choice for input language, but pretty much all languages have similar concerns.) Using an existing compiler also saves writing a whole bunch of ancillary code: in addition to the front end, features for manipulating DAGs and performing optimizations are all there to be used and modified. Unfortunately, using LLVM also binds me to design decisions made by other LLVM developers, and potentially exposes me to upstream weirdness. Thus far I have found no serious cases of either, but I suspect some interesting thorns will appear in my side later in the process, as I more fully understand LLVM’s innards.
LLVM is a compiler infrastructure, rather than merely a compiler, because of its modular design. This modular design is also what makes it most attractive (among existing free, open source compilers) for my purposes, for a huge variety of reasons. The three big ones are:
First, the modular codebase helps with accessibility. In many traditional full-scale compilers, the learning curve is nearly insurmountable. In particular, the dominant free open source compiler suite, GCC, has a learning period measured in months or years before one can make substantial modifications, and requires mathematical concepts like the delta function to accurately express the learning curve.
Second, modularity allows me to drop in, in a relatively straightforward way, a new back end that emits code suitable for the proposed LARs design (suitable, but not complete: it’s going to take one HELL of a fancy assembler to be useful).
Third, the modularity extends unusually far up into the structure of LLVM, which lets me simply turn off, replace, or modify optimizations and features that are inappropriate for an architecture with LARs’ peculiar features.
My start on applying the (fairly thorough) manual for porting LLVM to a new architecture has already shaken out some new ambiguities, concerns, and omissions (some intentional) in the LARs design. This has led to several sessions on one of the more exciting (in my twisted mind) parts of working with compilers and architectures: making and studying high-level decisions that affect both the hardware and software in a system, in potentially complex ways. Onward to more exciting adventures in computing and academia!
Unless otherwise noted, this work is licensed under a Creative Commons Attribution-ShareAlike 3.0 United States License.