Why Erlang? Joe Armstrong’s Legacy of Fault-Tolerant Computing

28 Apr 2019 6:00am, by

The recent death of Joe Armstrong, one of the original designers of the Erlang programming language, has led this week to an outpouring of appreciation for his contributions to computer science.

While Erlang grew out of work begun more than 33 years ago at the Swedish multinational telecom company Ericsson, its impact eventually spread throughout the world. Last June a presentation at the Code BEAM conference noted that every year Cisco ships about two million devices with Erlang, and that 90% of all internet traffic goes through Erlang-controlled nodes.

What gives this language its special place in the hearts of developers? Armstrong’s passing provided a moment for the developer community to reflect on what was special about this particular language, and what was unique about Armstrong’s approach to language design.

Seeking a Better Way

The history is all hiding in plain sight. All 295 pages of Armstrong’s 2003 PhD thesis are online — titled “Making reliable distributed systems in the presence of software errors.” Submitted to the Royal Institute of Technology in Stockholm, it describes research that Armstrong began in 1981 seeking better ways to program telecom applications, which “has resulted in the development of a new programming language (called Erlang), together with a design methodology, and set of libraries for building robust systems (called OTP).”

“I argue how certain of the requirements necessary to build a fault-tolerant system are solved in the language, and others are solved in the standard libraries. Together these form a basis for building fault-tolerant software systems…”

68-year-old Armstrong looked back on those days this January — just three months before his death — in an interview with Erlang Solutions. “At the time, Ericsson built large telephone exchanges that had hundreds of thousands of users, and a key requirement in building these was that they should never go down. In other words, they had to be completely fault-tolerant.”

A solution would require the ability to run large numbers of parallel processes (whose state would not be affected by errors in any of the other processes). Armstrong was originally trying to add concurrent processes to Prolog as a way to program basic telephone services, but this eventually led to an entirely new language. Armstrong’s thesis describes Erlang as “a concurrent process-based language having strong isolation between concurrent processes,” as well as a “pure message passing language” making extensive use of “fail-fast processes.” He’d explored the Smalltalk language but wasn’t happy with its concurrency or failure handling, and between 1985 and 1989 he began developing Erlang.
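
Those three ideas (isolated processes, message passing, and failing fast) can be illustrated outside Erlang. What follows is a minimal sketch in Python, not Erlang code and not anything taken from Armstrong’s thesis. It assumes operating-system processes as a heavyweight stand-in for Erlang’s lightweight ones, lines of text on a pipe as the “messages,” and a hypothetical `supervise` function that restarts the worker after each crash. The names `WORKER`, `run_worker`, and `supervise` are invented for this illustration.

```python
import subprocess
import sys

# Source for the worker process. It shares no memory with its parent:
# the only way in is stdin, the only way out is stdout.
WORKER = """
import sys
for line in sys.stdin:
    n = int(line)                # fail fast: bad input raises and kills the process
    print(100 // n, flush=True)  # dividing by zero kills it too
"""


def run_worker(messages):
    """Start one isolated worker, send it a batch of messages, and wait.

    Returns the replies it produced before exiting, plus its exit code.
    """
    proc = subprocess.run(
        [sys.executable, "-c", WORKER],
        input="\n".join(messages) + "\n",
        capture_output=True,
        text=True,
    )
    return proc.stdout.split(), proc.returncode


def supervise(messages):
    """Hypothetical supervisor: restart the worker whenever it crashes.

    The message that killed the worker is dropped, and the remaining
    messages are handed to a fresh worker.
    """
    results, crashes = [], 0
    pending = list(messages)
    while pending:
        replies, exit_code = run_worker(pending)
        results.extend(replies)
        if exit_code == 0:
            break  # the worker finished the whole batch
        crashes += 1
        # The worker answered len(replies) messages, then died on the next one.
        pending = pending[len(replies) + 1:]
    return results, crashes


if __name__ == "__main__":
    print(supervise(["4", "0", "oops", "5"]))
```

The worker contains no error handling at all. Recovery lives entirely in the supervisor, which never sees the worker’s internal state, only its replies and its exit code.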

Soon Armstrong was joined by others — notably Robert Virding and Mike Williams — and it proved to be a productive collaboration. His thesis also acknowledges a collective body of work performed over the 22 years between 1981 and 2003. “The system as we know it today is the result of the collective effort of a large number of people. Without their talents and the feedback from our users Erlang would not be what it is today.”

Things began happening quickly. By 1990 Erlang was its own language (and not a dialect of Prolog), and soon the language was being used in Ericsson projects. In 1993 the first commercial version of Erlang was released, and Ericsson started a subsidiary to market Erlang to external customers. In 1995, as part of a large-scale switch project, a new supporting group was formed, and the Erlang libraries were consolidated and renamed OTP (for “Open Telecom Platform”) with the hope of providing a stable core of software for all Erlang users. An earlier Ericsson attempt to build a next-generation switch had “collapsed” after an eight-year project (ending in 1995), so the project was then restarted using Erlang, and it was delivered in 1998.

But it was also that year that Ericsson made the decision to open source Erlang’s code. “Most Erlang/OTP users are still within Ericsson,” explains the company’s 1998 announcement. “In order to speed development of Erlang/OTP, ensure a good supply of Erlang/OTP fluent programmers, minimize maintenance and development costs for the language, and keep the OTP applications up to world-class, we need to spread the technology outside of Ericsson.”

Erlang also benefited from some fortunate timing, according to Armstrong’s January interview with Erlang Solutions. “Of course, when multicores came along, what we had done then mapped very well onto parallel programs,” Armstrong remembered. “Up to that point, concurrent programs were actually sequential programs that were interleaved rather quickly in an operating system. When multicores came along, the possibility emerged to execute that program in parallel. So we were immediately able to take advantage of parallel cores. And in fact, that’s probably the reason why Erlang has spread in the last 15 to 20 years — because of the way it scales naturally onto modern multicore computers.”

The thesis ultimately addresses itself to the fundamental need to program resilient systems. “Large systems will probably always be delivered containing a number of errors in the software, nevertheless such systems are expected to behave in a reasonable manner.”

An Ongoing Legacy

This all gives Erlang some unique strengths. “Erlang uses sets of parallel processes — not a single sequential process, as found in most programming languages…” explains the description for Armstrong’s own book, Programming Erlang — which adds that Erlang “will change your view of the world, and of how you program…”

“A multi-user game, web site, cloud application, or networked database can have thousands of users all interacting at the same time. You need a powerful, industrial-strength tool to handle the really hard problems inherent in parallel, concurrent environments.”

Some Erlang enthusiasts even trace what they love in the language back to Armstrong’s own commitment to simplicity. “Joe’s book was approachable, the same way he was,” remembered Fred Hebert, the author of Learn You Some Erlang for Great Good! in a heartfelt essay titled “Goodbye Joe.”

He could explain like no other what the principles of Erlang are, what led to its design, and the core approach to fault tolerance that it embodies. It’s one of the few language-specific books that is not content with getting you to write code, but lets you understand why you should write it that way. The language didn’t just have features because they were cool, it had features because they were needed for fault tolerance.

One of the amazing things Joe mentioned in his texts that was out of the ordinary compared to everything I had read before is that developers would make mistakes and we could not prevent them all. Instead, we had to be able to cope with them. He did not just tell you about a language, he launched you on a trail that taught you how to write entire systems.

But Joe’s advice and writing went further than this. He was always a huge fan of systematic simplicity, self-contained straightforwardness, and stories from the older times of computing that gave a fresh perspective on everything. Rather than thinking software is never finished, he wanted software that was so simple it could actually be completed.

Or, as one programmer posted on Hacker News in 2018, “Erlang, and after that Elixir, made me think differently about code and made me a better programmer.”

Erlang’s careful handling of concurrency has been a long-standing point of pride. In 2008 Robert Virding, one of Erlang’s co-creators, shared “Virding’s First Rule of Programming” in a pithy blog post. “Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang.”

Armstrong himself once joked in an interview with Rackspace in 2013 that “If Java is ‘write once, run anywhere’, then Erlang is ‘write once, run forever’.”

Who knows where his influence ended? Earlier this month web developer Kenny Bergquist wrote that “Every node in a kube cluster can run one or more pods, just the same way that every node in an Erlang cluster can run one or more processes… You can look at Kubernetes as a language-agnostic implementation of the same kind of distributed system that’s found in the Erlang environment.”

And this week programmers around the web were sharing stories about how Joe Armstrong’s spirit had touched their lives. New York City-based software engineer Thomas Gebert remembered early in his own career when he’d sent “some seriously noobey questions” to the man who’d started the Erlang project back in 1985. “Instead of a response like ‘Go read a book and stop bothering me’, he responded back with an incredibly long, well-written email explaining a lot of the minutia of how Erlang avoids a lot of pitfalls and generic concurrency theory,” Gebert remembered in a comment on Hacker News. “He was really good about explaining things in a way simple-enough for me to understand, without coming off as patronizing or rude.”

About a year later I got a job doing Erlang, and I sent him another email telling him this, saying something to the effect of “sorry for bothering you a year ago, but your email was really helpful to me.”

His response was basically “You have nothing to apologize for! I’ve always thought it was important to help people asking questions, especially early in their career.”

And that’s how the values of a great concurrent programmer passed from one generation to the next.

