
Thanks for the Memories: The Power of Event Sourcing

29 Mar 2021 1:11pm, by

Milen Dyankov
Milen Dyankov is a Developer Advocate at AxonIQ on a mission to help fellow Java developers around the globe design and build clean, modular and future-proof software! After more than 15 years developing, designing and consulting on various solutions for leading European companies, he currently spends most of his time supporting communities and organizations, speaking at conferences all over the world and researching his favorite topics around Java modularity, μservices, distributed systems architecture and software craftsmanship.

There are a lot of books, articles, talks, blogs and videos about Domain-Driven Design (DDD), Command-Query Responsibility Segregation (CQRS) and Event Sourcing (ES). Those three concepts complement each other beautifully, and therefore almost every resource covering one of them will at least mention the other two. This article is no exception, but it will try to approach things somewhat differently. Rather than relying on diagrams and oversimplified code samples, it leans on analogies to help you build some useful mental models and better understand the applicability of the above concepts (particularly ES).

How about a good movie reference for a start? Take the psychological thriller “Memento” in which a man struggles to find his wife’s murderer. The tricky part is, he has a serious medical condition – he is unable to form new memories. The condition is in fact not entirely fiction. Back in 1953, Henry Molaison became unable to form new memories as a result of brain surgery that was supposed to help in controlling his epilepsy.

Think about that for a second. Imagine you know everything that is essential for you to function properly but have no memories whatsoever (including written ones) of the last days, weeks, months or years. Think about how that would impact your ability to perform the tasks you are supposed to perform on a daily basis. Being unable to form new memories, one must rely solely on state.

That means making decisions based on the actual state of self and the current state of the surrounding environment as of now. It’s impossible to tell how the state(s) came to be. The only thing known is what’s observable in that very moment.

Let’s consider a simple example of choosing a seat on a plane. You have the plane’s seat map, with each seat marked as either taken or available. That’s the input data for the task. Unless you deliberately and persistently expanded and updated your state over time (by writing down relevant notes, for example), it does not contain any more information relevant to the context. Nevertheless, the input data is sufficient to execute the “pick a seat” action. The lack of memories does not make the task impossible to accomplish.

What is impossible without memories, however, is to use the results of previous experiences to influence the decision (“don’t take a seat next to the lavatory,” “avoid middle seats,” “prefer window over aisle,” for example). Those quality enrichments of the decision-making process are only available by analyzing the memories.

A workaround to the inability to form memories could be to constantly expand and update the state. However, such an approach has one significant limitation. It requires you to know now what decisions you will need to make in the future (perhaps in different contexts) so you can collect all relevant information now and update the state accordingly. Whatever piece of information you skip now is lost forever.

Should Software Systems Have Memories?

We humans are driven by emotions and stories, like those above, that make us try to imagine how we would feel. But it is not about feelings. It’s about data processing capabilities and limitations, decision-making constraints, and how state and memories (or logs — to use more techie words) complement each other. Leave emotions aside for a moment, focus solely on the nuts and bolts and you will quickly realize how this is relevant to software systems.

It seems a whole lot of applications only need to know the current state to function properly. And that’s how they are designed. Take a simple product catalog system, for example. It does a great job of informing you what products are available now and showing you all the details about them. It is not concerned with how the products got into the catalog or how and when they were modified (unless there are auditing requirements in place). The catalog system may be more complex than merely a proxy to the data. It could have specific business logic (filter products relevant to the user’s gender, location and current weather) or even update itself based on some rules (if products A and B are available, then C is also available). It can have a lot more advanced behavior and still be solely based on its current state. In fact, many far more complex systems are built this way, and so they need to maintain large states (in the form of hundreds of interconnected tables in a relational database, for example).

Systems like the above-mentioned product catalog are incapable of forming new memories. But they (or rather their authors) don’t see that fact as a limitation or a constraint. At least not until someone asks: what products were available last week? No matter how you modify the system, it will not be able to answer that question without memories.
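To make the “last week” question concrete, here is a minimal sketch in plain Java of a catalog that keeps memories. Everything in it (the `CatalogEvent` record, the `CatalogHistory` class, the assumption that events are recorded in chronological order) is made up for illustration and is not taken from any particular framework.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical event: a product entered (added = true) or left (added = false) the catalog.
record CatalogEvent(Instant at, String product, boolean added) {}

class CatalogHistory {
    // Append-only memories. Assumption: events are recorded in chronological order.
    private final List<CatalogEvent> log = new ArrayList<>();

    void record(CatalogEvent event) {
        log.add(event);
    }

    // Replays every event up to and including the given moment
    // to rebuild the set of products that were available back then.
    Set<String> availableAt(Instant moment) {
        Set<String> available = new TreeSet<>();
        for (CatalogEvent event : log) {
            if (event.at().isAfter(moment)) {
                break; // everything from here on happened later
            }
            if (event.added()) {
                available.add(event.product());
            } else {
                available.remove(event.product());
            }
        }
        return available;
    }
}
```

Asking `availableAt(Instant.now())` gives the same answer a state-only catalog would give, while any earlier instant answers the question the state-only catalog cannot.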

The sad irony here is that by the time you realize you need memories, it’s already too late to have them. If that example convinces you your system would benefit from having memories, you are most likely now wondering what the cost and effort of adding them would be. Hold on to that thought.

Not the Memories You Are Looking for

Whether you think about it this way or not, most software systems actually do have some memories. We typically call them logs. They are essentially long series of past events stored in files, databases, dedicated tools, etc. They contain information about things that happened in the system and were significant in one way or another. It’s hard to imagine a modern software system that does not have some sort of logging system. Logging is typically super boring to developers, some extra work for maintainers and often disregarded by stakeholders, but it turns out to be invaluable when things go wrong.

Isn’t that a surprise? Most systems have memories … that are of no use to the systems themselves. Let that sink in. Picture yourself unable to form memories, carrying a recording device and recording notes about significant events. Now realize you can’t play those recordings back. They exist for the sole purpose of letting someone else evaluate your behavior. Again, set aside how you would feel about it and focus on the missed opportunities. Those notes could potentially help you make much better decisions. You’ve already made the effort of collecting them, yet you cannot make use of them. What’s worse, chances are no one else will make use of them until you misbehave.

Of course, not all software systems are designed this way. You may be surprised to know that relational databases, the somewhat iconic representatives of systems that are kind of obsessed with the consistency of the state, actually make very good use of their own memories. Most databases maintain a transaction log that records every past event that resulted in any kind of state modification. Clients never read that transaction log directly while they interact with the database. But it’s invaluable in data replication and data restoration scenarios. It allows a database node to restore from any previous backup and build up the current state by reapplying all the events from the transaction log that happened after the backup was made.
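The restore scenario boils down to “start from the backup, then reapply whatever the log says happened afterwards.” A rough sketch in plain Java, assuming a toy key-value state and made-up `LogEntry`, `Backup` and `Recovery` types (real database logs are far more involved):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical log entry: a sequence number, a key and its new value (null means "deleted").
record LogEntry(long sequence, String key, String value) {}

// Hypothetical backup: a copy of the state plus the sequence number of the last entry it reflects.
record Backup(long lastSequence, Map<String, String> state) {}

class Recovery {
    // Rebuilds the current state from a backup and the transaction log.
    static Map<String, String> restore(Backup backup, List<LogEntry> log) {
        Map<String, String> state = new HashMap<>(backup.state());
        for (LogEntry entry : log) {
            if (entry.sequence() <= backup.lastSequence()) {
                continue; // this change is already part of the backup
            }
            if (entry.value() == null) {
                state.remove(entry.key());
            } else {
                state.put(entry.key(), entry.value());
            }
        }
        return state;
    }
}
```

The same loop works with an empty backup and sequence number 0, in which case the whole state is rebuilt from the log alone.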

State as a Function of Memories

Let’s now stretch that concept to what may initially seem like the other extreme. How about instead of constantly updating the state, we use the system’s memories (the transaction log) to (re-)build it from scratch every time we need it? Obviously, that’s not a very smart move for systems like relational databases. It’s not only that processing a bulk load of transactions and juggling terabytes of data in memory is a terrible idea. It’s more about the fact that relational databases have a well-defined, known up front and somewhat generic functional scope. The consistent state of the data they hold is their main concern, and their memories of how that state came to be can hardly enrich normal operations in any significant way.

But what about other systems? Like the business-focused applications that a lot of us spend valuable time building. We already saw how memories can enhance decision-making and task completion. And that is what those business applications usually do — make decisions and complete tasks. Giving them the ability to form memories makes perfect sense, doesn’t it? But then, do they need to maintain state as well? Perhaps not since they can determine their state from their memories when they need to.

If the idea sounds ridiculous to you, don’t give up on it just yet. It’s a safe bet that most people who are happily applying it these days once considered it ridiculous too. Instead, try to answer this question: “How much money do you have in your wallet now?” If you are like most people, it’s unlikely that you have that figure in your mind. But you may recall that at some memorable point in time you had X, then at another memorable point in time you gained or spent Y, and before you know it you have constructed the state of your wallet from a series of events.

It wasn’t hard at all, was it? Kindly notice how you automatically and subconsciously ignored a pile of memories gathered in the same timeframe but not relevant to the wallet context. Congratulations, you just discovered that your mind does Event Sourcing.
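If you prefer to see that thought process written down, it might look something like this plain Java sketch. The `Memory` record, the `"wallet"` context label and the use of cents are all assumptions made up for the illustration.

```java
import java.util.List;

// Hypothetical memory: the context it belongs to and the amount gained (+) or spent (-), in cents.
record Memory(String context, long amountCents) {}

class WalletRecall {
    // Starts from the last amount you remember for sure and replays
    // only the memories that belong to the wallet context.
    static long balance(long lastKnownCents, List<Memory> memories) {
        long balance = lastKnownCents;
        for (Memory memory : memories) {
            if (!"wallet".equals(memory.context())) {
                continue; // not relevant to the wallet, ignore it
            }
            balance += memory.amountCents();
        }
        return balance;
    }
}
```

Nothing here stores “the current balance” anywhere. It is computed from a known starting point and the relevant events whenever someone asks.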

‘Context Is for Kings’

Try telling your team (without explaining Event Sourcing first) about your plan to constantly (re-)construct the state from past events and you will surely get them worried about your mental health. That’s because they’ll likely think of the state as a whole. Like the state of the entire application. Reconstructing that on demand is no less scary than the idea of loading the entire database into memory.

In reality, unlike databases, most business applications don’t need the entire state to perform a task. In fact, they often need only a tiny portion of it. And often that portion of the state can be (re-)constructed from a handful of events. Much like your brain did with the state of your wallet above. That’s where the concept of an Aggregate (and Domain-Driven Design in general) comes in super handy. By definition, it’s a building block that groups together and encapsulates a collection of domain objects that must be treated as a whole with respect to state modifications in a given context. What this gives you is clear boundaries. You only need to (re-)generate the state of the Aggregate in order to perform the desired operation. The state of the rest of the system is irrelevant in this context.
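Borrowing the plane seat example from earlier, an event-sourced Aggregate could be sketched roughly as follows. This is plain Java written for illustration, not Axon Framework’s API; `SeatReserved`, `FlightEventStore` and `FlightAggregate` are hypothetical names, and the event store is just an in-memory map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical event: a seat was reserved on a given flight.
record SeatReserved(String flightId, String seat) {}

// Hypothetical in-memory event store that keeps one event stream per flight.
class FlightEventStore {
    private final Map<String, List<SeatReserved>> streams = new HashMap<>();

    void append(SeatReserved event) {
        streams.computeIfAbsent(event.flightId(), id -> new ArrayList<>()).add(event);
    }

    List<SeatReserved> eventsFor(String flightId) {
        return streams.getOrDefault(flightId, List.of());
    }
}

class FlightAggregate {
    private final String flightId;
    private final Set<String> takenSeats = new HashSet<>();

    private FlightAggregate(String flightId) {
        this.flightId = flightId;
    }

    // Rebuilds the state of one flight from that flight's events only.
    static FlightAggregate load(String flightId, FlightEventStore store) {
        FlightAggregate flight = new FlightAggregate(flightId);
        for (SeatReserved event : store.eventsFor(flightId)) {
            flight.apply(event);
        }
        return flight;
    }

    private void apply(SeatReserved event) {
        takenSeats.add(event.seat());
    }

    // Decides based on the rebuilt state and returns the resulting event.
    SeatReserved reserve(String seat) {
        if (takenSeats.contains(seat)) {
            throw new IllegalStateException("Seat " + seat + " is already taken");
        }
        SeatReserved event = new SeatReserved(flightId, seat);
        apply(event);
        return event;
    }
}
```

In this sketch the caller is expected to append the returned event to the store. Note that loading one flight never touches the events of any other flight, which is the boundary the Aggregate gives you.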

Did the above paragraph put you in “Wait a minute! I have a pile of use cases where I need the entire state” fight mode? If so, chances are it’s querying the state that you have in mind, not modifying it. Sure, it’s a fundamental requirement of all systems to be able to provide cross-context data as requested. But no, there is no need to (re-)generate the entire state on every query. For most querying purposes a projection (or a subset if you prefer) of the state, containing the data someone may be interested in, is more than enough. This brings us to why the Command-Query Responsibility Segregation architectural pattern often appears in the picture. Explicitly separating commands from queries allows the application’s memories (the event log) to be the only source of truth. The very same events (re-)build the aggregate’s state on demand on the command side and update the relevant projections on the query side.
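Here is a deliberately tiny sketch of that split, again in plain Java with made-up names (`MoneyMoved`, `BalanceProjection`, `Ledger`). It assumes a single in-memory event log and updates the projection synchronously, which real systems often do asynchronously instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event: money moved into (+) or out of (-) an account, in cents.
record MoneyMoved(String accountId, long amountCents) {}

// Query side: a projection that is kept up to date by events and answers queries cheaply.
class BalanceProjection {
    private final Map<String, Long> balances = new HashMap<>();

    void on(MoneyMoved event) {
        balances.merge(event.accountId(), event.amountCents(), Long::sum);
    }

    long balanceOf(String accountId) {
        return balances.getOrDefault(accountId, 0L);
    }
}

// Command side: decisions are made from state rebuilt out of the event log.
class Ledger {
    private final List<MoneyMoved> log = new ArrayList<>(); // the only source of truth
    private final BalanceProjection projection;

    Ledger(BalanceProjection projection) {
        this.projection = projection;
    }

    void move(String accountId, long amountCents) {
        // Rebuild just this account's balance from its own events.
        long current = 0;
        for (MoneyMoved event : log) {
            if (event.accountId().equals(accountId)) {
                current += event.amountCents();
            }
        }
        if (current + amountCents < 0) {
            throw new IllegalStateException("Insufficient funds for " + accountId);
        }
        MoneyMoved event = new MoneyMoved(accountId, amountCents);
        log.add(event);       // remember what happened
        projection.on(event); // let the query side catch up
    }
}
```

The command side never reads the projection and the query side never makes decisions. Both are fed by the same events, so the projection can be thrown away and rebuilt from the log at any time.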

The Price of the Priceless

Combining ES with DDD and CQRS not only gives your application memories, it also helps you make the best use of them. So what does it take to do it? Implementing that from scratch may seem like a trivial task at first. It’s not. Storing and retrieving data seems trivial too, but you wouldn’t attempt to implement your own database, would you? As always, the devil is in the details.

Luckily, you don’t have to. Tools and frameworks already exist to do the heavy lifting for you. For example, deeper exploration of the above concepts in the Java world will inevitably lead you to Axon Framework, a free, open source, noninvasive framework providing all the essential tools for building DDD-, CQRS- and ES-based applications. While it’s perfectly fine to store the memories in a database for a start, sooner or later developers also discover the power of Axon Server, a very efficient combination of event storage and a message bus. Those are just two of the most mature tools out there. Surely there are more. Explore, learn, experiment and see for yourself what they bring to the table. You may be surprised how low the price is for priceless memories that will one day let your application finally answer “How did we get here?” and all the other “non-important” questions of today.

Feature photo by Amy Humphries on Unsplash.
