Why Literate Programming Might Help You Write Better Code
It seems obvious, but code isn’t only written for machines: it’s written for people, too. Overlooking this fact can cause problems.
“A friend of my dad’s told me about a time he came across some truly horrible code at work. The code was terrible: poorly written, badly commented,” Joël Franusic, solutions engineer at Okta, told The New Stack via email. The man was “so incensed that he decided to figure out who wrote the code so he could give that person a piece of his mind.”
Annoying, yes. But to cap it off: turned out it was his code.
This situation neatly encapsulates a common but painful experience for developers: making sense of someone else’s work (whether that someone is past you or an actual other person). While there are a few different ways of tackling these issues, there’s one that is particularly powerful, if often overlooked: literate programming.
What Is Literate Programming?
Literate programming is an approach to programming in which code is explained in natural language that sits alongside the source. This is distinct from related practices such as documentation or code comments; there, the code is primary, with commentary and explanation being secondary. In literate programming, however, explanation has equal billing with the code itself.
“Documentation is fundamentally disconnected from the code,” Franusic noted. Often, “documentation is written by someone who doesn’t work on the code. This distance between code and documentation makes it harder to really understand what the code is doing.”
This underlines what makes literate programming particularly valuable: it’s a means of gaining greater transparency or clarity over code.
Literate programming was developed in the early ’80s by Donald Knuth, a computer scientist who is now professor emeritus at Stanford University, so it would be easy to dismiss it as a relic of a much earlier era of computing.
However, its emergence should act as a reminder that the challenges facing those pioneers of modern software engineering aren’t actually that different from today’s: everyone, after all, wants to write clean, clear code that people can understand and interact with at a later stage.
The fact that literate programming maintains an audience in a number of communities — such as statistics — is a clear indicator that it still has value. The only question is, are there other benefits engineers are missing out on?
The Evolution of a New Approach
Before getting to that, it’s worth looking at the world in which literate programming emerged. In the years after it was first introduced by Knuth, it attracted a small but committed community of engineers and computer scientists.
Norman Ramsey, a professor of computer science at Tufts University, told The New Stack how he found literate programming when he served as technical lead for a group of engineers who worked for a U.S. government contractor.
“We were contracted to the Air Force to deliver some verification software, and the mathematics behind what we were delivering was quite sophisticated,” he said. “And even at the time, the standard for writing mathematics down was TeX.” (TeX is a typesetting system developed by Knuth a few years before literate programming.)
The complex mathematics at the heart of the project led Ramsey and his team to literate programming. This is because the very nature of literate programming allows you to “show your working out” in the text alongside the source code. In short, it helps you order your thinking and be clear in your methodology.
However, while WEB — Knuth’s first tool for literate programming — was written in Pascal, and CWEB (again, created by Knuth alongside mathematician Silvio Levy) in C, the project Ramsey and his team were delivering for the Air Force had to be written in Ada.
This led Ramsey to develop Spidery WEB, a tool that allowed engineers to “prettyprint” (i.e., “make it look good when it’s typeset,” as Ramsey said) in languages beyond those that had previously been the focus of the literate programming community.
Ramsey’s work, then, helped to make literate programming more visible and accessible. He’s too modest to say this explicitly, but it’s interesting to hear him contrast his approach to tool building with Knuth’s. It was characteristic, he said, of Knuth that “when he built a tool, he built it to solve all the problems he had,” while by contrast, it was “utterly characteristic of my work to build the tools to allow people to do literate programming.”
After Spidery WEB, Ramsey developed noweb, possibly one of the most popular tools that allow people to do literate programming in a way that is independent of any particular programming language.
“Its primary advantages,” claims the text on the project website, “are simplicity, extensibility, and language-independence — especially noticeable when compared with other literate-programming tools.”
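As a rough illustration of how a language-independent tool of this kind can work, the sketch below pulls runnable code out of a document that interleaves prose with named code chunks, a step usually called "tangling." The chunk markers follow noweb's basic conventions (a line of the form <<name>>= opens a chunk, a line starting with @ closes it, and <<name>> inside a chunk refers to another chunk). The functions parse_chunks and tangle and the sample document are hypothetical and heavily simplified: references are only recognized on a line of their own, there is no cycle detection, and this is not how noweb itself is implemented.

```python
import re

# Simplified noweb-style markers (assumption: only these basics are handled).
CHUNK_DEF = re.compile(r"^<<(.+)>>=\s*$")       # <<name>>= opens a chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")   # <<name>> alone on a line

# A tiny literate document: prose and code chunks in one place.
DOCUMENT = """\
We want a program that greets the reader.
<<*>>=
def main():
    <<greeting>>

main()
@
The greeting itself is a single call to print.
<<greeting>>=
print("Hello, literate world!")
@
"""


def parse_chunks(source):
    """Collect the code lines of every named chunk, skipping the prose."""
    chunks = {}
    current = None
    for line in source.splitlines():
        definition = CHUNK_DEF.match(line)
        if definition:
            current = definition.group(1)
            chunks.setdefault(current, [])  # repeated definitions accumulate
        elif line == "@" or line.startswith("@ "):
            current = None                  # back to prose
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, root="*", indent=""):
    """Expand chunk references recursively, starting from the root chunk."""
    lines = []
    for line in chunks[root]:
        ref = CHUNK_REF.match(line)
        if ref:
            # Keep the indentation of the reference for the expanded code.
            lines.extend(tangle(chunks, ref.group(2), indent + ref.group(1)))
        else:
            lines.append(indent + line if line else line)
    return lines


print("\n".join(tangle(parse_chunks(DOCUMENT))))
```

Nothing in the sketch depends on the chunks containing Python; the same extraction would work for chunks written in Ada or C, which is the sense in which such a tool can be language-independent.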
This story is an interesting snapshot of the way that engineering practices developed in what we might think of as the very early years of modern computer science: a diverse mix of personalities with different motivations and interests working together to uncover new ways of doing things.
We often tend to overlook this aspect of technological innovation today, but we should remember that it is precisely this mix of ingenuity and contextual luck — right place, right time — that drives change.
Literate Programming Today
Ramsey became less involved in literate programming in the ’90s and stopped contributing to the ecosystem of tools and ideas. This was due largely to his work as a researcher; he was focusing on new problems. But he also noted that a “relatively thriving Usenet newsgroup” for literate programming “gradually fell by the wayside” during that decade.
That said, it’s worth noting that a literate programming “purist” might take issue with attaching the label to newer tools such as Jupyter Notebooks, because they lack the level of control over the relationship between code and text needed to truly align with Knuth’s original vision.
However, purity aside, it’s clear that a range of different practices — typically rooted in collaboration and sharing — are benefitting from the ideas the likes of Knuth and Ramsey were working on in the eighties and nineties.
Franusic has his own story that demonstrates the value of literate programming in a relatively new domain: developer evangelism and relations.
While working as a developer evangelist at Twilio, the communication tools company, he found himself writing lots of blog posts that included code. “The particular issue that I found myself having, again and again, was how difficult it was to simultaneously write code and prose describing the code,” he wrote to The New Stack.
“My typical workflow when I write about code is as follows: 1. Write some code; 2. Start writing a blog post about the code; 3. Describe a part of the code using prose; 4. While describing the code with prose, discover ways to improve the code; 5. Fix the code, go to step 3; 6. Repeat until I’m done.”
This process, he noted, was “pretty painful, because it necessarily requires keeping two documents synchronized, while they are both being written. It’s very frustrating to write a blog post and then discover that the code in the blog post and the code that is published on GitHub are out of sync!”
By using literate programming he was able to solve this issue; because code and narrative are in the same place, any syncing issues disappear.
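To make the "one source" idea concrete, here is a minimal sketch in which the blog post itself is the only place the code lives, and the runnable program is extracted from it. It assumes, purely for illustration, that the post is written in Markdown with fenced code blocks; the article does not say which tools Franusic used, and POST, extract_code and the add function are all hypothetical.

```python
FENCE = "`" * 3  # a Markdown code fence, spelled this way so it can sit inside one

# A hypothetical blog post: prose and code kept in a single document.
POST = "\n".join([
    "Our helper adds two numbers.",
    FENCE + "python",
    "def add(a, b):",
    "    return a + b",
    FENCE,
    "It can then be called like this.",
    FENCE + "python",
    "total = add(2, 3)",
    FENCE,
])


def extract_code(post, language="python"):
    """Return the bodies of all fenced blocks in the given language, in order."""
    code_lines = []
    inside = False
    for line in post.splitlines():
        if not inside and line.strip() == FENCE + language:
            inside = True                # opening fence of a matching block
        elif inside and line.strip() == FENCE:
            inside = False               # closing fence
            code_lines.append("")        # blank line between blocks
        elif inside:
            code_lines.append(line)
    return "\n".join(code_lines)


source = extract_code(POST)
namespace = {}
exec(source, namespace)   # run exactly the code the reader sees in the post
print(namespace["total"])
```

Because the published program is generated from the post rather than maintained next to it, editing a code block in the prose is the same act as editing the program, so there is no second copy to fall out of date.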
While this is perhaps a bit of an edge case, it’s interesting to see how literate programming has found new relevance in the modern world of developer content and developer evangelism, something that would have been completely alien to research-oriented computer scientists in the ’80s.
Even so, while Ramsey described literate programming as something “hidden” from students in his courses today, he noted that “an awful lot of the materials that we give them, particularly homework assignments, I prepare as literate programs.” This is because, he said, “I can guarantee that the code they’re seeing is the code that’s actually running.”
The parallels between Franusic’s writing and Ramsey’s teaching are worth noting. They demonstrate that literate programming can offer a useful way of not only organizing one’s ideas when it comes to writing code but also ensuring confidence that something is going to work for someone else.
Literate Statistical Programming
Perhaps the most common place in which you’ll find literate programming today is in statistics and data science.
More commonly described as literate statistical programming, the approach is particularly useful not just because of the complex mathematics that is part and parcel of statistical programming, but also because of the need to share code and clarify how a statistical program delivered its results. This also makes particular sense when you consider that statistical computing is a practice that frequently occurs in research settings.
Jupyter Notebooks, as mentioned above, is a particularly popular tool that might be seen as a recontextualization of literate programming. You might not have realized this, since neither the project’s Wikipedia page nor its homepage makes any mention of literate programming, but acknowledging the secret history of a tool that you might use every day is, in a small way, important.
It helps you recognize the hidden problems and ideas that contributed to its creation, and maybe even helps to build some form of solidarity with others. In turn, this can encourage greater self-reflection about engineering practices and approaches.
Moreover, it underlines the fact that the tools and libraries we use were made by people; they don’t come to us as naturally occurring objects.
Franusic directed The New Stack to one of his favorite Jupyter Notebook repositories, by Peter Norvig, a Stanford fellow and engineering director at Google. “If you haven’t seen these notebooks, I encourage you to look them over for an idea of how compelling a literate program can be.” It also makes you wonder how much Franusic knew about the upcoming release of Google’s new programming language, Carbon.
They really are; each one is a kind of narrative of problem-solving, combining code and storytelling in a way that does more than just “show your work” (as your math teacher probably used to tell you). In the notebooks, Norvig is also able to explain why he does what he does. They are miniature dramas built upon the triumvirate of human, problem and code.
Why Isn’t Literate Programming More Widespread?
Although literate programming seems to have found a home in the world of statistical computing, Ramsey told The New Stack why it isn’t necessarily an approach that should be used everywhere.
“One of the things we learned is that it’s very expensive,” he said. “And so you spend time trying to figure out what sort of situations justify the expense. And one of them is where people are building relatively small, sophisticated kernels that are going to be shared widely.”
Franusic echoed this idea. “Once I’ve written a literate program, I’ve found it very hard to refactor, or make major changes to, that program. So with that in mind, I will only start writing a literate program after I am fairly certain that it’s as ‘done’ as I can make it.”
In other words, it’s simply not appropriate for many engineering contexts today, particularly those that stress speed and scale. Does this mean, then, that it has little to teach anyone writing code today outside of statistical and numerical programming?
Why Literate Programming Matters Today
Despite the clear disadvantages of the approach — its cost, its lack of mutability once a program is written — literate programming can be an incredibly powerful skill for software developers working in many different contexts. As Franusic’s experience writing blog posts at Twilio demonstrates, it can be helpful in allowing you to bring storytelling and code together in one place.
He’s particularly unequivocal about its value: “Learning literate programming will make you a better programmer. It will also help you write better code, which your future self will be very thankful for.”
Ramsey echoed this sentiment in more measured terms: “It really forces you to think before you code.”
The value of this enforced self-reflexivity can’t be overstated. Yes, literate programming should help you to write better code, but it will also encourage you to think about why you’re doing what you’re doing. Why this and not that? Why did I do it that way and not that other way?
While you might not suddenly start using literate programming in day-to-day work, exploring a tool that can give you a fresh perspective on programming can only be a good thing.