
Typesafe’s Jonas Bonér: How Reactive Programming Addresses the Scale-out Problem

29 Dec 2014 8:00am, by

As computer systems get bigger, we were once told, software simply scales with them, in an elegant balance of form and function.

No, it doesn’t. Software running on multiple processor cores needs considerable help in determining how to divide itself into concurrent threads. If the developer doesn’t embed instructions for doing this in the program, the processor itself looks for whatever opportunities for subdivision it can find. It is nowhere near an elegant process, and though it eventually yields performance gains, it is not without cost.

The Cost of Coherency

When virtualization enabled jobs to scale across multiple processors, pooled together in a dynamic cloud (if you don’t mind a mixed metaphor from two different parts of the ecology), entirely new systems emerged around a more nebulous concept of nodes that exchange data and work functions with other nodes. A kind of “meta-device” took shape, where entire servers were rendered the scaled-out version of transistors and logic circuits. But just as on an electronic device, for any of these exchanges to yield results without bottlenecks, they needed some degree of synchronicity.

Yet the Internet was designed as an asynchronous system. The first Internet-scale systems started exhibiting unpredictability, weirdness, and random behaviors. One Facebook engineer dubbed the phenomenon metastability, after the strange behaviors witnessed in logic circuits back in the 1960s: behaviors caused by parts transitioning between 0 and 1 and back without quite settling into either state.

Jonas Bonér believes there is an elegant solution to the problem of scaling up and scaling out. It is an architectural solution, and many would agree with Bonér about its aesthetic value. But it requires a fundamentally different comprehension of how programs work — specifically, the role of modularization and the division of functions.

Bonér’s company is Typesafe, and his personal contribution to it is Akka, a middleware layer that adds layers of abstraction between components written in Java, or in Typesafe’s own Scala. He is also the author of a short, but influential, document called The Reactive Manifesto: a call to action for a type of programming that is, for many, the antithesis of the principle of object-orientation.

It’s a way of thinking that flies in the face of a common, and in some quarters, uncontested belief that hardware alone is capable of scaling software, both up and out, to the extent that it’s practical to do so.

“The biggest challenge when it comes to scaling out,” remarks Bonér, in an interview with The New Stack, “is getting rid of contention. Together with the added coherency cost, contention is the big scalability killer.”

When a program tries to coordinate access to shared data or to resources such as sockets or file handles, he goes on, contention can take place on many levels.

“Coordinating access to mutable data requires you to have some sort of queue or mutual exclusion,” he continues, “which introduces wait time and queueing effects. If you can model your problem in a way that there is no coupling when it comes to the data, then you remove one of the biggest hurdles towards scalability.”

The integrity of the data being exchanged at any one point in time is referred to, in short, as the state. The cost incurred by the hardware to maintain that state, Bonér explains, is coherency: for example, when the CPU keeps the L3 cache up to date, or when an interconnect such as QPI is used to exchange state across processors. When the developer is capable of isolating tasks or, more to the point, when a framework such as Akka helps the developer share less state data (preferably none at all) across processes, the coherency cost moves closer to zero.

“This means you can take the software and spread it out among multiple CPUs, cores, sockets, or machines,” Bonér says.
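To make the contrast concrete, here is a minimal Java sketch, not taken from Akka or from Bonér. The class and method names are hypothetical. The first method has every worker queue up on one lock around one mutable total, which is the contention he describes. The second gives each worker its own private counter and combines the finished results only at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; none of these names come from Akka.
class SharedNothingSum {

    // Contended version: all workers take turns on one lock guarding one total.
    static long sumWithSharedState(int workers, int perWorker) {
        final long[] total = {0};
        final Object lock = new Object();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                pending.add(pool.submit(() -> {
                    for (int i = 0; i < perWorker; i++) {
                        synchronized (lock) { total[0]++; } // mutual exclusion on every step
                    }
                }));
            }
            for (Future<?> f : pending) f.get(); // wait for every worker
            return total[0];
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Share-nothing version: each worker mutates only its own local counter
    // and hands back a single finished value when it is done.
    static long sumWithIsolatedState(int workers, int perWorker) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                Callable<Long> task = () -> {
                    long local = 0; // private to this worker, no lock needed
                    for (int i = 0; i < perWorker; i++) local++;
                    return local;
                };
                pending.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : pending) total += f.get(); // combine results once
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumWithSharedState(4, 100_000));   // prints 400000
        System.out.println(sumWithIsolatedState(4, 100_000)); // prints 400000
    }
}
```

Both methods return the same answer. The difference is that the second has no lock for workers to wait on, so in principle its workers could just as well run on separate cores, sockets, or machines.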

More Attention to Assertion

Last January, in an article for the Dice company blog, Espresso Logic CTO Val Huber demonstrated the conceptual difference between a database-driven program using the conventional procedural model, and using the Reactive model championed by Bonér and Typesafe. In piecing together a simple customer purchase order application, the relationships between the basic variables (product_price, qty_ordered, etc.) are spelled out on a “cocktail napkin.” In the procedural model, these relationships are enacted through a series of steps invoking conditional (if/then) logic.

But in the Reactive model, exactly as many instructions are required as there were illustrative relationships on the napkin (five, in Huber’s case), just with different syntax. The underlying framework makes it possible for these instructions to enforce the relationships whenever the values to which they refer change. These changes are the triggers, and the framework enables the reaction.
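A toy Java sketch can show the idea of declaring a relationship once and letting a framework re-enforce it on every change. This is not Espresso Logic’s product or API, and the rule shown (amount equals price times quantity) is an assumed example rather than one of Huber’s five. The `Cell` class and `derive` helper are hypothetical names standing in for the framework. Only the variable names echo the article’s product_price and qty_ordered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical mini-framework, for illustration only.
class ReactiveCells {

    // A value that runs its registered reactions whenever it is set.
    static class Cell {
        private double value;
        private final List<Runnable> reactions = new ArrayList<>();

        Cell(double initial) { this.value = initial; }

        double get() { return value; }

        void set(double newValue) {
            value = newValue;
            for (Runnable r : reactions) r.run(); // the change is the trigger
        }

        void onChange(Runnable reaction) { reactions.add(reaction); }
    }

    // Declares "target = formula(sources)" once; each source change recomputes it.
    static Cell derive(Supplier<Double> formula, Cell... sources) {
        Cell target = new Cell(formula.get());
        for (Cell source : sources) {
            source.onChange(() -> target.set(formula.get()));
        }
        return target;
    }

    public static void main(String[] args) {
        Cell productPrice = new Cell(10.0);
        Cell qtyOrdered = new Cell(3.0);

        // One declaration per relationship, with no if/then steps at the call site.
        Cell amount = derive(() -> productPrice.get() * qtyOrdered.get(),
                             productPrice, qtyOrdered);

        System.out.println(amount.get()); // prints 30.0
        qtyOrdered.set(5.0);              // the framework reacts to the change
        System.out.println(amount.get()); // prints 50.0
    }
}
```

The procedural steps have not disappeared. They live inside `set` and `derive`, which is the point the critics make.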

Arguments persist, however, over whether the Reactive model is merely an extra layer of abstraction that hides a set of underlying procedures essentially the same as the conditional logic. Such arguments are rendered moot, Huber and his compatriots claim, by the vastly increased degree of scalability this model enables. By stating the relationships between variables as “invariant” states (as “the way things are”), the Reactive model avoids what Bonér refers to as “coupling.”

The reason makes sense when you apply it to systems at Internet scale: The more coupling, or interrelationships between variables, there are in a program as it scales out, the greater the number of dependencies required to ensure the integrity of that data — to make sure every component’s view of the data is current and correct. Such dependencies can only be maintained through a series of steps executed in sequence, and it’s through these sequences that bottlenecks emerge.

“Think in terms of isolated units and sharing nothing,” requests Bonér, “meaning that each of these isolated units works only on its own data, and does not share data needlessly. Instead, it works on its own data, and when it has a result, it publishes that out to the world. If it publishes something that is immutable, that can’t be changed, then that’s safe to share without contention. But it does all its work on the mutable data, the in-process data, the in-flight data, in total isolation. Then you can scale these units independently as much as you want.”
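The pattern Bonér describes can be approximated in plain Java without Akka. In the sketch below, every name is hypothetical, and a single-threaded executor stands in for an actor’s mailbox. Each unit keeps its mutable running total to itself and publishes only an immutable `Result` object when it is finished.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an actor; this is not Akka's API.
class IsolatedUnit {

    // Immutable result: all fields final, so it is safe to share freely.
    static final class Result {
        final String unitName;
        final long total;
        Result(String unitName, long total) {
            this.unitName = unitName;
            this.total = total;
        }
    }

    private final String name;
    private final List<Result> outbox; // where finished results are published
    // One thread per unit: messages are processed one at a time, in order.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private long runningTotal = 0;     // in-flight data, touched only by the mailbox thread

    IsolatedUnit(String name, List<Result> outbox) {
        this.name = name;
        this.outbox = outbox;
    }

    // Send a message; the caller never touches runningTotal directly.
    void tell(long amount) {
        mailbox.execute(() -> runningTotal += amount);
    }

    // Publish an immutable snapshot, then wait for the mailbox to drain.
    void finish() {
        mailbox.execute(() -> outbox.add(new Result(name, runningTotal)));
        mailbox.shutdown();
        try {
            if (!mailbox.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("mailbox did not drain");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Runs two independent units and sums what they published.
    static long demo() {
        List<Result> outbox = new CopyOnWriteArrayList<>();
        IsolatedUnit a = new IsolatedUnit("a", outbox);
        IsolatedUnit b = new IsolatedUnit("b", outbox);
        for (int i = 1; i <= 100; i++) {
            a.tell(i);
            b.tell(2L * i);
        }
        a.finish();
        b.finish();
        long sum = 0;
        for (Result r : outbox) sum += r.total;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 15150
    }
}
```

The only structure the two units have in common is the outbox, and it receives nothing but finished, immutable results. Adding a third or a thousandth unit would not add any lock for the existing ones to wait on.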

If, on top of everything else, there is going to be an Internet of Things, then there will need to be a more scalable model for addressing those things than the purely sequential one we have now. A perfectly synchronous model, like you’d have for the circuits in a pocket calculator, won’t work. Whether we adopt an absolutely isolationist model like Reactive or not, the model we do adopt will probably look more like Reactive than our projections of the IoT today.

“In the future, all of these gadgets out there will need to embrace a fully message-based approach,” says Jonas Bonér, “because that’s the only way we’ll ever be able to serve all of them with a reasonable amount of hardware, and be cost-efficient. It might be okay now, but if we have 50 billion devices in just a few years, like they predict, there’s no way we can keep up with that with synchronous protocols.”

