Event Sourcing Design with Amazon Web Services
Event sourcing is a software architecture concept based on the idea of saving every change to your application’s state as an event, giving you the ability to rebuild that state from scratch by playing the events back. It’s similar to a bank ledger: instead of storing the current value of each account and updating those values, you store each transaction (event), and the value in the account is just the net sum of these events. While event sourcing adds extra complexity to the app architecture and can be overkill for some use cases, it can also add a lot of value.
In this article, I will explore the use cases for event sourcing, discuss some of the considerations around this architecture, and offer an event sourcing design that leverages Amazon Web Services (AWS).
Is Event Sourcing Right for You?
Before we go over the use cases for event sourcing, I believe it’s important to outline the main advantages that this architecture presents. They boil down to three core functions:
- Complete Rebuild: Using the event log to completely rebuild the application state from events.
- Event Replay: Replaying events in the system with updated application logic to correct events that were processed incorrectly.
- Event Reversal: Reversing specific events without having to do a replay from a clean application state.
Essentially, event sourcing reveals a new direction where we no longer need to persist the application state, but rather can derive it. So, instead of simply knowing the current state of your application, you also have all the additional context of how you got there. This comes with many advantages, including the ability to create test environments from the event log, to fix bugs in the code and then replay the events to correct the data, and to trace exactly how a piece of state (an account, in the ledger analogy) got into a compromised state.
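To make the three functions concrete, here is a minimal sketch built on the bank ledger analogy. The `Transaction` shape and the function names are my own assumptions for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """One immutable ledger event (hypothetical shape)."""
    account: str
    amount: int  # in cents; positive = deposit, negative = withdrawal


def apply_event(state: dict, event: Transaction) -> dict:
    """Return a new state with the event applied; the input is not mutated."""
    new_state = dict(state)
    new_state[event.account] = new_state.get(event.account, 0) + event.amount
    return new_state


def rebuild(events, handler=apply_event) -> dict:
    """Complete rebuild: fold every event into an empty state.

    Passing a different handler is an event replay with updated logic.
    """
    state = {}
    for event in events:
        state = handler(state, event)
    return state


def reverse(state: dict, event: Transaction) -> dict:
    """Event reversal: undo one event by applying its inverse."""
    return apply_event(state, Transaction(event.account, -event.amount))


log = [
    Transaction("alice", 10_000),
    Transaction("alice", -2_500),
    Transaction("bob", 4_000),
]
balances = rebuild(log)                          # state derived, not stored
without_withdrawal = reverse(balances, log[1])   # undo a single event
```

The balances are never stored anywhere in this sketch. They are recomputed from the log whenever they are needed, which is the core idea.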
Handling External Systems and Code Changes
As mentioned earlier, event sourcing comes with certain architectural complexities. Depending on how you deploy it, event sourcing can lead to higher disk space and memory usage, and sometimes to longer boot-up times. Two of the trickier elements when building an event-sourced application are setting up interactions with external systems and handling code changes. Both add a layer of complexity and require a specific approach.
External systems that are not designed for event sourcing can behave in unintended ways when receiving duplicate update messages during replays. This happens because the external systems don’t know the difference between real processing and replays. You can handle that by wrapping interactions with external systems in a gateway, which is probably a good idea in any case. The gateway should incorporate logic about the external system and not forward events during replays.
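A gateway of this kind can be sketched in a few lines. The class name, the `replaying` flag, and the `send` callable are assumptions for illustration. In practice `send` would be an HTTP client or SDK call.

```python
class ExternalGateway:
    """Wraps an external system and suppresses outbound calls during replays.

    `send` is any callable that delivers a message to the external system
    (hypothetical stand-in for a real client).
    """

    def __init__(self, send):
        self._send = send
        self.replaying = False

    def notify(self, event: dict) -> bool:
        """Forward the event unless a replay is in progress.

        Returns True if the event was actually forwarded.
        """
        if self.replaying:
            return False
        self._send(event)
        return True


sent = []
gateway = ExternalGateway(send=sent.append)

forwarded_live = gateway.notify({"id": 1, "type": "OrderShipped"})

gateway.replaying = True
forwarded_replay = gateway.notify({"id": 1, "type": "OrderShipped"})
gateway.replaying = False
```

Because the application only ever talks to the gateway, the event-processing code itself does not need to know whether it is running live or as part of a replay.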
The other issue with external systems comes from handling external queries when executing event replays. If, for any event, you query an external system that does not support time-based queries, then the external data will be wrong during a replay: you’ll get the current state of the external data, not the state when the original event was processed. One way to avoid this is to log the response of every external query when the event is first processed, so that during replays your gateway returns the logged value and accurately represents the original interaction.
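As a rough sketch of that idea, the gateway below records each query response during live processing and serves the recorded value during a replay. The names are hypothetical, and the in-memory dictionary stands in for what would need to be durable storage.

```python
class RecordingQueryGateway:
    """Logs external query responses when live; answers from the log on replay.

    `query` is any callable that takes a key and returns the external
    system's current answer (hypothetical stand-in).
    """

    def __init__(self, query):
        self._query = query
        self._log = {}  # (event_id, key) -> response; durable in practice
        self.replaying = False

    def lookup(self, event_id, key):
        if self.replaying:
            # Raises KeyError if this event was never processed live.
            return self._log[(event_id, key)]
        response = self._query(key)
        self._log[(event_id, key)] = response
        return response


rates = {"EUR": 1.10}
gateway = RecordingQueryGateway(query=rates.__getitem__)

live = gateway.lookup(event_id=42, key="EUR")      # hits the external system

rates["EUR"] = 1.25                                # external data moves on
gateway.replaying = True
replayed = gateway.lookup(event_id=42, key="EUR")  # served from the log
```

The replayed lookup returns the rate that was in effect when event 42 was first processed, even though the external system now reports a different value.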
Another thing that you need to be prepared for is code changes. Generally, there are two main types of code changes that can affect reprocessing of events: new features and bug fixes. When you’re executing event replays with new features, you will want to disable external gateways. This will prevent external systems from being updated for features that did not exist at the time the event was generated. The only exception is if the new features involve these gateways, in which case you’ll want to consider adding special handling for the first reprocessing of the old events.
When handling bug fixes, it’s typically best to deploy the code fix and reprocess the events. This is straightforward for events that don’t have any external interactions, and it is one of the main benefits of event sourcing. With external systems, the gateway needs to know whether it can simply send the event as processed by the fixed code or whether a difference needs to be computed before the external system is called. The difference is necessary to reverse the effect of the original buggy event that had previously gone out to the external system. Any time-sensitive logic, such as doing one thing before a certain date and a different thing after it, will need to be included in the domain model of the event. Time-sensitive logic can get messy quickly, so you should try to avoid it when possible.
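One possible way to compute that difference is for the gateway to remember what it last delivered for each event and send only the adjustment when reprocessing produces a new value. This is a sketch under the assumption that the external system accepts additive adjustments. The class, the message fields, and the fee functions are all hypothetical.

```python
class CorrectingGateway:
    """Sends the full value the first time and only the difference afterwards."""

    def __init__(self, send):
        self._send = send
        self._delivered = {}  # event_id -> value last sent; durable in practice

    def publish(self, event_id, value):
        if event_id not in self._delivered:
            self._send({"event_id": event_id, "amount": value})
        else:
            delta = value - self._delivered[event_id]
            if delta != 0:
                # Reverse the effect of the earlier (buggy) value.
                self._send({"event_id": event_id, "adjustment": delta})
        self._delivered[event_id] = value


def buggy_fee(amount_cents):
    return amount_cents * 5 // 100  # bug: charged 5% instead of 3%


def fixed_fee(amount_cents):
    return amount_cents * 3 // 100


outbox = []
gateway = CorrectingGateway(outbox.append)

gateway.publish("evt-1", buggy_fee(10_000))  # original processing
gateway.publish("evt-1", fixed_fee(10_000))  # reprocess after the fix
gateway.publish("evt-1", fixed_fee(10_000))  # second replay: nothing to send
```

After the fix is deployed, the external system receives a single adjustment rather than a second full charge, and further replays send nothing.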
Leveraging AWS for Event Sourcing
While the additional complexity introduced by event sourcing can be somewhat daunting, utilizing AWS services such as Kinesis, API Gateway, and DynamoDB can help streamline development. Here is a quick overview of a sample AWS event sourcing design.
The design is built from API Gateway, two Kinesis streams, and a handful of Lambda functions:
- API Gateway: Receives incoming requests on a web-facing URL and forwards them to an AWS Lambda function, which loads them into the Kinesis event stream.
- Pump Lambda: Receives events from the incoming stream and puts them in the Event Store DB.
- Write Lambda: Receives events from the stream and stores the latest state in the Data Store DB. Any business logic can be applied here before writing to the DB, just like in a regular application.
- Playback Lambda: Can be triggered manually to read all or a subset of events from the events table and send them to the Kinesis Playback Stream. It does not write directly to the data store so that you can later attach additional subscribers to the Playback Stream as your application’s needs evolve.
- Microservice Lambda: Contains your application’s business logic and processes the incoming event.
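As an illustration of the Pump Lambda, here is a minimal sketch of a handler for a Kinesis-triggered invocation. Lambda delivers each record’s payload base64-encoded under `Records[i].kinesis.data`. The table is injected and only needs a `put_item(Item=...)` method, which matches the shape of a boto3 DynamoDB Table resource. The in-memory table, the helper names, and the payload fields are assumptions so that the sketch runs without AWS.

```python
import base64
import json


def decode_kinesis_records(lambda_event: dict) -> list:
    """Extract JSON payloads from a Kinesis-triggered Lambda event."""
    payloads = []
    for record in lambda_event.get("Records", []):
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw))
    return payloads


def make_pump_handler(events_table):
    """Build the Pump Lambda handler around an injected table object."""
    def handler(lambda_event, context):
        payloads = decode_kinesis_records(lambda_event)
        for payload in payloads:
            events_table.put_item(Item=payload)
        return {"stored": len(payloads)}
    return handler


class InMemoryTable:
    """Stand-in for the Event Store DB (hypothetical test double)."""

    def __init__(self):
        self.items = []

    def put_item(self, Item):
        self.items.append(Item)


def kinesis_record(payload: dict) -> dict:
    """Build a record shaped like the ones Lambda receives from Kinesis."""
    data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data}}


table = InMemoryTable()
pump = make_pump_handler(table)
result = pump(
    {"Records": [kinesis_record({"event_id": "evt-1", "type": "Deposit", "amount": 100})]},
    None,
)
```

The Write Lambda could follow the same pattern, applying business logic to each decoded payload before storing the latest state in the data store.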
To store the events, we’ll use DynamoDB. We’ll have two tables per service: an events table and a data store. The events table (just like the name suggests) will store every single event that is submitted to the system. This will act as an immutable system of record for the application. The data store, on the other hand, will store the latest state for quick access by the application. It is not the system of record and can be wiped and rebuilt from the events table when necessary.
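The relationship between the two tables can be sketched in memory as follows. The key layout (an entity id plus a sequence number) and the merge-style update are assumptions for illustration. With DynamoDB, a similar append-only guard could be expressed as a conditional write, for example a `ConditionExpression` using `attribute_not_exists`.

```python
class EventsTable:
    """Append-only system of record: stored events are never overwritten."""

    def __init__(self):
        self._events = {}  # (entity_id, sequence) -> payload

    def append(self, entity_id, sequence, payload):
        key = (entity_id, sequence)
        if key in self._events:
            raise ValueError(f"event {key} already exists")
        self._events[key] = payload

    def all_events(self):
        """Yield (entity_id, payload) ordered by entity and sequence."""
        for key in sorted(self._events):
            yield key[0], self._events[key]


def rebuild_data_store(events_table) -> dict:
    """Derive the latest state per entity from scratch."""
    data_store = {}
    for entity_id, payload in events_table.all_events():
        data_store.setdefault(entity_id, {}).update(payload)
    return data_store


events = EventsTable()
events.append("user-1", 1, {"name": "Ada"})
events.append("user-1", 2, {"email": "ada@example.com"})
events.append("user-1", 3, {"name": "Ada L."})

data_store = rebuild_data_store(events)  # can be wiped and rebuilt at any time

try:
    events.append("user-1", 3, {"name": "Overwrite"})
    overwrite_rejected = False
except ValueError:
    overwrite_rejected = True
```

Deleting `data_store` loses nothing here, since calling `rebuild_data_store` again produces the same result from the events table.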
Hopefully, this can help put things in perspective and show that event sourcing does not need to be as intimidating as it may sound. While there are some inherent complexities that come with this architecture, it can be extremely valuable in the use cases where the traceability of changes is paramount.