While working on my master’s project, I’ve spent the last several months discovering ways in which LLVM is excessive, ill-behaved, oversold, broken or simply lying in its documentation. In fact, that is almost ALL of what I accomplished in the past several months, and it was grinding on me psychologically. Last weekend, I finally admitted to myself that, for a mixture of psychological and technological reasons, I wasn’t going to be able to finish the project using LLVM. I’ve always had a bail-out path set up to do a simpler full-custom compiler if something proved intractable, and on Monday I talked to my advisor about putting that plan into action.
I’m now building a simple C-like language with PCCTS (the old-school 1.33MR33 release, with the emits-pure-C ANTLR and SORCERER, not modern ANTLR), hand-coding C for the remainder, and concentrating on the technically interesting matters. To that end, I’ve accomplished more conceptual work in the last week than in the two preceding months, started laying out my new tools, and feel much better. The best “aha” moment I’ve had in months came when I sat down and worked out how the stack and calling conventions were going to work in my new system; LARs are beautiful and well-suited to real-world problems in a whole host of ways I didn’t understand before.
Note that I don’t think LLVM is entirely bad. In conception, it is exactly how compiler tools should be structured: a cleanly-interfaced compiler infrastructure with a library of frontends, backends, optimizer passes and other common components which snap together via clean interfaces, high-level specifications, and a standardized intermediate representation. The problem is that LLVM has been drastically oversold and overcomplicated. Every neat-looking interface has a mess of poorly-documented C++ that needs to be written to support it. Everything is set up with an emphasis on LLVM’s JIT/VM features, making it unnecessarily clumsy to use for producing pure compilers or for interfacing with external tools. Worst of all, the documentation is deceiving. That is much worse than being incomplete: it appears complete, but is frequently misleading or wrong. To pick one of the more egregious examples, much of the documentation points to the MIPS backend as an example. This should be a good thing, since MIPS is a simple design familiar to almost anyone in computing, but the MIPS backend is broken in apparently fairly fundamental ways. The register descriptions are buggy in ways which cause it to generate incorrect code for any input containing floats, and this appears to be a high-level design problem descended from naively trusting that the register description and allocation mechanisms work as described. LLVM is probably already a good tool for writing production compilers for established designs, but it really isn’t suited to writing research compilers.