Culture / Development / Technology

Rust Creator Graydon Hoare Recounts the History of Compilers

12 May 2019 3:00am, by

On March 26, Graydon Hoare, the original creator of the Rust programming language,  stopped in to speak about compilers to some lucky University of British Columbia students in the school’s introductory class to compiler construction.

With the aspiring compiler designers of tomorrow in mind, Hoare’s talk spanned the history of building compilers for programming languages (He didn’t record the talk, so we have the slides to go by). He told the students he wanted to demystify that space “between class projects and industrial compilers” to “reduce terror, spark curiosity, encourage trying it as a career.”

“If I can compiler, so can you!”

Young Graydon Hoare

His compiler show-and-tell began with clang, the compiler front-end developed for C, C++, Objective-C and Objective-C++. Hoare labeled it “specimen #1,” noting that using it involves two million lines of C++ code, of which 800,000 are for clang and another 1.2 million for the LLVM project — and that it’s maintained by a multi-organization team. “Good diagnostics, fast code… more permissively licensed than GCC.”

Then there’s the swiftc compiler — which involves 530,000 lines of C++ code augmented by 2 million more lines of clang and LLVM code. He later dedicated a slide to the LLVM tools and library, joking that it’s a “one-stop shop for compiler backends.”

About rustc, the Rust compiler, Hoare pointed out that it’s composed of 360,000 lines of Rust code (plus the 1.2 million lines of LLVM). He cites its maintaining organization as “originally mostly Mozilla,” adding humbly that “Yes, I did a lot of the initial bring-up so my name is attached to it forever; glad it worked out!”

Throughout the talk, each specimen was accompanied by a snippet of its source code — except the Turbo Pascal compiler. Because its source code is proprietary, he represented it with an old magazine ad.

The fourth specimen was the ever-popular GCC, which he pointed out is 2.2 million lines “of mostly C, C++. 600k lines Ada.” Dating back to 1987, the language is supported by a large multi-organization team, Hoare noted, adding that it “generates quite fast code.”

Balancing Costs and Benefits

“Compilers get big because the development costs are seen as justified by the benefits, at least to the people paying the bills,” Hoare explained, citing desired goals like better runtime performance and developer productivity (from things like diagnostics tools), as well as exploiting the capabilities of new hardware. The last bullet adds that some compilers are written in “verbose” languages “for all the usual reasons (compatibility, performance, familiarity).”

And the rest of the talk explores how those tradeoffs can be made, and if they should.

“In some contexts, ‘all the optimizations’ is too much,” explained one slide. “If you try to write a compiler performing every optimization, you’ll end up using too much memory or creating a compiler requiring far too much effort to develop and maintain — or that takes too long to compile!”

Hoare reminded the students of Proebsting’s Law, a sarcastic riff by University of Arizona computer science professor Todd A. Proebsting that posits  advances in compilers will double our computing power every 18 years — an eternity compared to the 18 months it takes for chip manufacturers to double the number of transistors on their processors (“Moore’s Law”).

Hoare’s own take? Proebsting’s Law is less true if a language has more abstractions to eliminate — but unfortunately, it’s truer for lower-level languages.

Wandering Through Weirder Landscapes

Hoare also examined the smaller (660,000 lines of code) V8, the just-in-time JavaScript compiler in both Chrome and Node, which he describes as “always adjusting for the sweet spot of runtime performance vs. compile time.”

The Chez Scheme compiler uses 27 different IRs (a compiler’s internal “intermediate representation” structures) but is just 87,000 lines. Hoare adds that it’s mostly a single-developer project — made possible by its relatively small codebase. And the compiler for Poly/ML (an implementation of machine language that supports multicore hardware) is just 44,000 lines. Eventually, his presentation arrived at the 184-line TREE-META metacompiler from a 1967 U.S. Air Force research project at the Stanford Research Institute’s Augmentation Research Lab.

The “wander through a weird landscape” continued, with Glasgow Haskell Compiler, Franz Lisp, Manx Aztec C, and 8cc. There’s CakeML, Roslyn, Pharo/Cog, and the Eclipse Compiler for Java. There’s a slide for the compiler for the “highly-influential” language Mesa (which he notes is one of his favorites) developed at Xerox PARC between 1976 and 1981.

And that led him to a discussion about how compilers interact with interpreters — and a quick history of computers.

It starts with the 1940s-era ENIAC, where “programming” actually involved re-wiring until a team lead by Jean Bartik began storing instructions in memory. 1949 saw the arrival of high-level pseudo codes with software interpreters, and soon Grace Hopper was converting pseudo-code directly into machine language for the UNIVAC with her A-0 System, which was the first compiler.

Hoare also reminded the students of the pioneering work of Frances E. Allen, whose 45-year career at IBM included work on the compiler-optimization team for IBM’s “Harvest” supercomputer, installed at the National Security Agency.

February 1962 image of IBM HARVEST computer

In the early 1970s she co-authored “A catalog of optimizing transformations,” with John Cooke, a paper that aimed to “systematize the potpourri of optimizing transformations that a compiler can make to a program,” describing these optimizations in detail:

  • Inline
  • Unroll (and vectorize)
  • CSE (common subexpression elimination)
  • DCE (dead code elimination)
  • Code Motion
  • Constant Fold
  • Peephole

Hoare added that many compilers do just these eight things and get about 80% of a best-case performance.

A Grand Finale

Hoare touched on metacompilers and discussed the tradeoffs of doing compilation versus interpretation with an appropriate quote from Xavier Leroy, a primary developer on OCaml. “As a cheap implementation device, bytecode interpreters offer 1/4 of the performance of optimizing native-code compilers at 1/20 of the implementation cost.”

He also includes a pithy observation about Truffle/Graal, an open source library for building interpreters. “Write an interpreter with some machinery to help the partial evaluator, get a compiler for free,” he said. Now being maintained by Oracle, Hoare calls it “seriously competitive! Potential future Oracle JVM.”

There are also compilers that only compile some functions, leaving the rest to be handled by the interpreter.

For his grand finale, he showed the audience JonesForth, one developer’s educational implementation of Forth with a 692-instruction virtual machine and 1,490 lines of Forth for its compiler, debugger, and read-eval-print loop. “Forth, like Lisp, is nearly virtual machine code at input,” he told the audience.

Hoare’s appreciation for language design is evident, and he left the students with an inspiring parting message. “There have been a lot of languages,” he said, citing the 8,945 identified by the Online Historical Encyclopaedia of Programming Languages dating all the way back to the 18th century.

“Go study them: past and present! Many compilers possible!” he urged the students. “Pick a future you like!”


A newsletter digest of the week’s most important stories & analyses.