Building Full-Stack Reactive Systems That Scale
For decades we’ve focused on the “components” of a computer-based system: front end, back end, database, servers. We’ve developed, deployed, and optimized each of these components individually. Unfortunately, in some organizations, wonderful and intentional user experience becomes a secondary concern to the components themselves. An innovative user experience will always suffer if the back end is built on top of massive database schemas, complex SQL joins, two-phase commits, and a whole host of other self-imposed limitations, along with antiquated processes that slow down or halt innovation. It’s safe to say that the most innovative user experiences of tomorrow will not be built on top of the CRUD systems of yesterday.
Companies that break the chains of the past most efficiently will win, not only market share through improved user experiences but also the war for talent. Developers have never had more choice about where to spend their time. Full Stack Reactive means that everything from the way we persist data to the way we structure teams needs to adapt to an ever-changing environment.
Take the Spotify Engineering Model with Squads, Tribes, Chapters and Guilds. This is a great example of a company staying reactive on all levels, from technology to people. Or consider the “Inverse Conway Maneuver,” in which a company restructures itself around the ideal architecture of its systems. Companies that stay fluid and flexible already embrace some Reactive principles. But structure and process are only the beginning. We need to work backward from the user experiences that we wish to deliver and rethink the way we implement systems from the ground up in order to deliver those experiences. If you prefer listening to reading an article, I recently did a webinar with Lightbend on Full Stack Reactive Systems that you can listen to here. Full Stack Reactive Systems are based on the original principles of the Reactive Manifesto, published in 2014.
Event Storming, Process Modeling and Domain-Driven Design
What’s fascinating about reactive systems is how close they can become to the actual business itself. Take a business process like check clearing. Without a complete understanding of how check clearing works, and without input from business experts, technologists may try to model it as a 100% consistent and 100% reliable transaction. However, the business process embraces concepts such as eventual consistency (having access to cash before a check clears) and compensation from errors rather than prevention of errors (manual back-office processes helping to correct issues during clearing without throwing exceptions to the user).
By bringing technologists and business experts together through techniques like Event Storming and Domain-Driven Design, we give technology a full seat at the table, both to contribute ideas and to learn about the business itself. Only then can we start to realize the true competitive advantage that technology brings to forward-thinking companies. What’s really exciting is that companies outside of traditional technology hubs like the Bay Area and New York City have access to this competitive advantage as well, and are applying it to a diverse range of industries.
You can take any business, no matter how complex, and tell a story about that business by thinking about every single interesting event that happens within that business. An event is simply “something interesting that has happened in the past.” By thinking in terms of how events flow, rather than the structure or hierarchy of components, we have the ability to craft systems that closely resemble how the business actually works.
Now, how do we add structure to a flow of events so we can actually build a system based on an Event Storming exercise? In essence, there is a gap between a flow of events and a collection of microservices (or other implementation details). Domain-Driven Design helps to bring structure to this flow, moving us closer to a comprehensive blueprint that we can use to create software components. Concepts such as bounded contexts can help us determine where microservice boundaries should be drawn, and aggregate roots can point us toward how to handle state and state transitions within a microservice boundary.
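To make the idea of an aggregate root more concrete, here is a minimal sketch in Python that reuses the check-clearing example from earlier. Every name in it (CheckAggregate, CheckDeposited, return_unpaid and so on) is hypothetical and chosen for illustration; it is one possible shape for such a model under simplified rules, not a prescribed design.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class CheckStatus(Enum):
    DEPOSITED = "deposited"
    CLEARED = "cleared"
    RETURNED = "returned"


# Events are immutable facts named in the past tense (all hypothetical).
@dataclass(frozen=True)
class CheckDeposited:
    check_id: str
    amount_cents: int


@dataclass(frozen=True)
class CheckCleared:
    check_id: str


@dataclass(frozen=True)
class CheckReturned:
    check_id: str
    reason: str


@dataclass
class CheckAggregate:
    """Aggregate root: the single entry point for changing a check's state."""

    check_id: str
    status: Optional[CheckStatus] = None
    events: List[object] = field(default_factory=list)

    def deposit(self, amount_cents: int) -> None:
        if self.status is not None:
            raise ValueError("check was already deposited")
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        self._record(CheckDeposited(self.check_id, amount_cents))

    def clear(self) -> None:
        self._require(CheckStatus.DEPOSITED)
        self._record(CheckCleared(self.check_id))

    def return_unpaid(self, reason: str) -> None:
        # Compensation rather than prevention: a returned check is an
        # ordinary business event, not an exception shown to the user.
        self._require(CheckStatus.DEPOSITED)
        self._record(CheckReturned(self.check_id, reason))

    def _require(self, expected: CheckStatus) -> None:
        if self.status is not expected:
            raise ValueError(f"expected status {expected}, got {self.status}")

    def _record(self, event: object) -> None:
        # Apply the state transition, then remember the event that caused it.
        if isinstance(event, CheckDeposited):
            self.status = CheckStatus.DEPOSITED
        elif isinstance(event, CheckCleared):
            self.status = CheckStatus.CLEARED
        elif isinstance(event, CheckReturned):
            self.status = CheckStatus.RETURNED
        self.events.append(event)
```

The aggregate only guards transitions that the business itself would reject, such as clearing a check that was never deposited. Everything else, including a returned check, is recorded as an event that other parts of the system can react to.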
Reactive Front Ends
We need to change our definition of what the front end is. Imagine what business productivity tools are going to look like when everybody has something like an Oculus Rift in their home office. If you’re thinking about building a back end that interacts with a diverse set of clients, from other systems, to mobile devices, to future interfaces such as VR headsets, are you really going to want an unwieldy, overly complicated relational database and monolithic back-end platform powering those experiences?
Consider a physidigital experience in e-commerce — perhaps I try on a piece of clothing in front of an augmented-reality powered mirror that overlays other color or fit options. How I interact with that mirror helps to train the mirror to make better recommendations. Those events are powering the user experience directly and also the core intelligence of the system — after all, events are also a key raw ingredient in machine learning. That data can be used to enhance the shopping experience, from guiding fresh interactions in-store, to pre-ordering new inventory based on store-level or global patterns in the events. When we rely on relational databases, and even worse, mutating data within the database (throwing away all of the state changelogs), we lose an incredible amount of data that could otherwise be used to infuse our systems with machine intelligence.
The closer our “back-end” systems are to speaking in real human terms, the more value we extract, both in the real-time user experience and in machine intelligence opportunities. While Company A is running an idea through multiple layers of committees and approvals and coordinating dozens of teams to change APIs and database schemas up front, Company B will be experimenting with new features in production and tweaking them multiple times per day, using techniques and practices like A/B/n testing, CI/CD, feature flag-driven development, and predictive analytics.
The pace of change in Company B will be even more rapid than it may seem at first blush. Company B has also embraced event-driven, pub-sub integration. Because events are published and available by default throughout the organization, new teams can be spun up without any up-front coordination costs like agreeing on an API spec. A new team can start to build an entirely new system simply by subscribing to real-time domain events. Think “pop-up shops” or “pop-up restaurants,” except that what pops up is a new team, made possible by certain architectural choices. This isn’t a mainstream practice yet, but within a few years it will seem more obvious as high-performing teams embrace asynchronous pub-sub over synchronous integrations, not only within a single team but across a whole organization.
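The following Python sketch shows why subscribing to published events removes that up-front coordination. It is a deliberately tiny, in-process illustration: InMemoryEventBus, the "orders" topic and the event fields are all hypothetical names, and delivery here is a plain synchronous function call for brevity, whereas a real deployment would put a message broker with asynchronous delivery in this position.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, object]
Handler = Callable[[Event], None]


class InMemoryEventBus:
    """Hypothetical stand-in for a message broker (in-process, synchronous)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # The publisher does not know who is listening, or how many.
        for handler in list(self._subscribers.get(topic, [])):
            handler(event)


bus = InMemoryEventBus()

# An existing team already consumes order events.
fulfilment_inbox: List[Event] = []
bus.subscribe("orders", fulfilment_inbox.append)

# A new "pop-up" team starts later. It only subscribes; the publisher
# and the existing consumer are not changed in any way.
recommended_skus: List[object] = []
bus.subscribe("orders", lambda event: recommended_skus.append(event["sku"]))

bus.publish("orders", {"type": "OrderPlaced", "sku": "shirt-blue-m"})
```

The point of the sketch is the last subscription: adding a consumer is a one-line change on the new team's side, with no API contract to negotiate beyond the shape of the events themselves.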
CQRS and Event Sourcing
How does this all work in practice? Event sourcing is an implementation technique that essentially eschews relational databases as the primary store of state. Events are applied to stateful entities, and current state is simply a summary of all past events. This view is typically precomputed (“projected”) to a read-side query store, such as a database, and kept up to date with each new event. What’s really exciting about CQRS (Command Query Responsibility Segregation) and event sourcing together is how well they map to real-time systems. When we bring these techniques into a microservice-based architecture, we can use CQRS to completely separate the “read side” from the “write side” of an application, and then optimize each independently. This can radically improve the performance and reliability of each channel by eliminating unnecessary work, such as CPU-intensive SQL queries on the read-side channel, which can easily lead to minutes of latency. Instead, we project a view of data from the write side to the read side. The tradeoffs are consistency (the view may not be completely up to date by the time it is queried) and additional complexity (maintaining separate read and write channels).
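As a rough illustration of these ideas, the Python sketch below keeps an append-only journal on the write side and a separate projection on the read side that catches up from the journal at its own pace. The shopping-cart domain and every name in it (ItemAdded, EventJournal, ItemCountProjection, catch_up) are assumptions made for the example, and the in-memory structures stand in for whatever journal and query store a real system would use.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Dict, Iterable, List, Union


@dataclass(frozen=True)
class ItemAdded:
    cart_id: str
    sku: str
    quantity: int


@dataclass(frozen=True)
class ItemRemoved:
    cart_id: str
    sku: str


CartEvent = Union[ItemAdded, ItemRemoved]
CartState = Dict[str, int]  # sku -> quantity


def apply_event(state: CartState, event: CartEvent) -> CartState:
    """Pure state transition: returns a new state, never mutates the old one."""
    new_state = dict(state)
    if isinstance(event, ItemAdded):
        new_state[event.sku] = new_state.get(event.sku, 0) + event.quantity
    elif isinstance(event, ItemRemoved):
        new_state.pop(event.sku, None)
    return new_state


def replay(events: Iterable[CartEvent]) -> CartState:
    """Current state is a fold over all past events."""
    return reduce(apply_event, events, {})


class EventJournal:
    """Write side: an append-only log; nothing is updated in place."""

    def __init__(self) -> None:
        self._events: List[CartEvent] = []

    def append(self, event: CartEvent) -> None:
        self._events.append(event)

    def read_from(self, offset: int) -> List[CartEvent]:
        return self._events[offset:]


class ItemCountProjection:
    """Read side: a precomputed view that may lag behind the journal."""

    def __init__(self) -> None:
        self._offset = 0
        self._carts: Dict[str, CartState] = {}
        self._counts: Dict[str, int] = {}

    def catch_up(self, journal: EventJournal) -> None:
        for event in journal.read_from(self._offset):
            state = apply_event(self._carts.get(event.cart_id, {}), event)
            self._carts[event.cart_id] = state
            self._counts[event.cart_id] = sum(state.values())
            self._offset += 1

    def item_count(self, cart_id: str) -> int:
        # A query is a dictionary lookup; no joins, no recomputation.
        return self._counts.get(cart_id, 0)
```

Between two calls to catch_up the projection answers queries from slightly stale data, which is the consistency tradeoff described above. Because the journal is never overwritten, replay can rebuild the state of a cart from its events at any time, or feed an entirely new projection later.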
This approach is not one size fits all, but for systems that care deeply about both resilience and performance, it is an excellent fit. Rather than make users endure spinners, lag, or error messages when things go wrong during a complex query, why not simply present them with the best projection of data that we have already computed? As long as the data is still relevant to them, it’s effectively real-time and can be presented without any risk to the user experience. In summary, CQRS enables read/write optimizations architecturally, while event sourcing is the core implementation of state and the raw data for projections. In essence, with event sourcing we always know what state an entity is in and we also know how it got there. When we combine these two techniques, we can build an entire system with minimal use of relational database technologies, perhaps limiting them to the storage of projections so we can continue to leverage SQL on the read side.
Kubernetes and Management of Containerized Applications
Technologies like Kubernetes make building a reactive system from the ground up, and creating and managing containerized applications, much more accessible to a variety of teams. We don’t need to break the bank to achieve characteristics like the ability for our platform to scale out on demand and then scale back in when the demand subsides. Striking that balance between cost-effectiveness and a great user experience isn’t as painful as it was a decade ago, when we had to provision our own hardware, physical or virtual. We always had to overprovision resources and hope that usage never exceeded the capacity. Those days are gone.
Reactivity needs to be on all levels. It needs to be at an organizational level. It needs to be all the way down to the server and all the way up to the user experience. Functional reactive programming and other implementation details make up a very small subset of Full Stack Reactive.
Lightbend is a sponsor of The New Stack.
Feature image by Michal Jarmoluk from Pixabay.