Amtrak Rolls Past Containers into a Serverless Infrastructure
Serverless sufficiently lowers the barrier to entry so that companies that are currently on bare metal or virtual machines can leapfrog over container-based architectures altogether and straight into functions-as-a-service and event-driven models, asserted Gary Arora, who is a cloud strategy and technology consultant at Deloitte.
Speaking at the recent Serverlessconf conference in New York, Arora shared a case study in which he led Amtrak, the U.S. passenger railroad service, through an adoption of serverless from legacy systems. He said that he worked with a team of at most 15 Amtrak staff to upgrade major data systems to enable real-time dashboards, with the entire project, from initial discussions to production rollout, completed within six months.
Amtrak’s challenges will be familiar to anyone who has stood at Penn Station until two minutes before their train is due to leave, still waiting for their platform to be announced. And the confusion is not just limited to New York. Other Amtrak rail hubs have similarly archaic systems which mean passengers are updated only at the last minute as to where they need to be to board their service.
Amtrak’s legacy databases could take anywhere from a few seconds to an hour to respond, and the analytics and BI dashboards governing the network were refreshed daily, meaning the data in the reports was always one day stale. This limited Amtrak’s ability to optimize train carriages based on passenger load, to initiate marketing campaigns to sell vacant seats, and to coordinate platform arrivals and departures.
As is common with monoliths in the enterprise (and in startups that grew quickly on a single code base), existing systems were weighed down with technical debt, and new features were often assessed solely in terms of their cost to implement rather than the customer value they could generate, which meant that most new feature proposals were rejected.
And with their legacy databases being built up over time, Amtrak had also long since lost a single source of truth in its data. Data redundancy, lack of consistency and non-standardized data storage were all common problems.
Adoption of a Serverless Tech Stack
When looking at the opportunities to move to a serverless approach to drive data ingestion and reporting into a real-time dashboard of current train operations, the first task was to choose a database design and database.
Amtrak’s team assessed relational databases (with which existing developers were more familiar) and NoSQL, which, while requiring new internal expertise, was considered more appropriate because its flexible data schema made it more future-proof. But instead of following the typical MEAN stack model for NoSQL, Amtrak chose Amazon Web Services’ DynamoDB NoSQL service over MongoDB and Couchbase.
Arora said that with these two decisions made, it was possible to map the desired architecture for a serverless data flow.
The first task for the team was to normalize the data. “All the data comes into DynamoDB, goes through Kinesis and is stored in Redshift,” said Arora. This helped create a single source of data truth going forward and was one of the largest work components involved in the serverless migration. “Data migration is a beast with lots of refactoring,” he said.
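The DynamoDB-to-Kinesis hop in the flow Arora describes could look something like the following minimal Python sketch. The talk does not say how records move between the services, so this assumes a Lambda function triggered by a DynamoDB stream. The attribute names in the usage note (train_id, seats_sold, load_factor) are hypothetical, and the forward callable stands in for whatever writes to Kinesis in a real deployment.

```python
import json
from decimal import Decimal
from typing import Any, Callable, Dict


def unmarshal(attr: Dict[str, Any]) -> Any:
    """Convert one DynamoDB-typed attribute, e.g. {"N": "212"}, to a plain value."""
    (type_tag, value), = attr.items()
    if type_tag == "S":
        return value
    if type_tag == "N":
        # DynamoDB transmits numbers as strings; keep whole numbers as ints.
        number = Decimal(value)
        return int(number) if number == number.to_integral_value() else float(number)
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {key: unmarshal(item) for key, item in value.items()}
    if type_tag == "L":
        return [unmarshal(item) for item in value]
    raise ValueError(f"unsupported DynamoDB type tag: {type_tag}")


def handler(event: Dict[str, Any], context: Any = None,
            forward: Callable[[str], Any] = print) -> Dict[str, int]:
    """Flatten inserted or modified items and hand each one to `forward`.

    `forward` is an assumption of this sketch: in a deployment it would wrap
    a call that puts the JSON payload onto a Kinesis stream.
    """
    forwarded = 0
    for record in event.get("Records", []):
        # Deletions carry no new image, so only inserts and updates are sent on.
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue
        image = record["dynamodb"]["NewImage"]
        item = {key: unmarshal(value) for key, value in image.items()}
        forward(json.dumps(item, sort_keys=True))
        forwarded += 1
    return {"forwarded": forwarded}
```

Passing a list's append method as forward makes the function easy to exercise locally, for example with a single INSERT record whose NewImage holds train_id, seats_sold and load_factor attributes.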
With the move towards using a variety of AWS components alongside Lambda, AWS’ serverless service, Amtrak was able to create real-time global booking dashboards. “They are now able to do predictive and dynamic pricing models, can manage carriage demand and ensure supply optimization,” said Arora.
The value delivered from the move to serverless included:
- Ability to process up to one million transactions a day (at a peak rate of 2,000 transactions per minute), and feed into near-real-time reporting.
- Reduced maintenance costs with no load balancing or server maintenance, and the ability to decommission databases as all data is now ingested via DynamoDB (although Arora is quick to caveat: “There is still DevOps work involved”).
- Improved data accuracy with a single source of truth, a future-proofed data schema and ease of data entry via JSON RESTful services.
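To put the throughput figures from the first bullet in perspective, the short Python calculation below compares the quoted peak rate with the average rate implied by one million transactions a day. It assumes, purely for comparison, that the daily volume is spread evenly over 24 hours, which the talk does not claim.

```python
DAILY_TRANSACTIONS = 1_000_000  # "up to one million transactions a day"
PEAK_PER_MINUTE = 2_000         # quoted peak rate
MINUTES_PER_DAY = 24 * 60

# Average rate if the daily volume were spread evenly (an assumption).
average_per_minute = DAILY_TRANSACTIONS / MINUTES_PER_DAY
peak_per_second = PEAK_PER_MINUTE / 60
peak_to_average = PEAK_PER_MINUTE / average_per_minute
# Minutes of sustained peak load needed to reach the daily volume.
minutes_at_peak = DAILY_TRANSACTIONS / PEAK_PER_MINUTE

print(f"average: {average_per_minute:.0f} transactions per minute")
print(f"peak:    {peak_per_second:.1f} transactions per second")
print(f"peak is {peak_to_average:.2f}x the even-spread average")
print(f"daily volume equals {minutes_at_peak:.0f} minutes at peak rate")
```

Under that assumption the peak is a little under three times the average, a spread that a pay-per-invocation model absorbs without capacity being provisioned for the peak.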
Despite the benefits, Arora was also quick to point to the limitations of serverless which must be understood in order to ensure expectations from any migration are not over-inflated:
- If function execution is unsuccessful when called, there is an exponential back-off algorithm in place for retries, so that the function will try again in two seconds, and if that still doesn’t work, then it will try again in four seconds, then eight and so on.
- Importantly for an IoT-backend-type use case such as this one, identity and access management for Lambda functions is separate from existing IAM security policies, so while it may be convenient to use “god mode” (unrestricted permissions) in a proof of concept, it is vital that IAM roles are clearly defined in any production use case to ensure the security of the architecture.
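The retry schedule described in the first limitation above (two seconds, then four, then eight) can be sketched in Python as follows. This illustrates the general exponential back-off technique and is not AWS’ actual implementation. The function names, the cap of five retries and the injectable sleep parameter are all assumptions made for the example.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(first_delay: float = 2.0, factor: float = 2.0,
                   max_retries: int = 5) -> Iterator[float]:
    """Yield the wait before each retry: 2, 4, 8, ... seconds by default."""
    delay = first_delay
    for _ in range(max_retries):
        yield delay
        delay *= factor


def call_with_backoff(fn: Callable[[], T], max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn`, retrying after exponentially growing pauses on failure."""
    delays = backoff_delays(max_retries=max_retries)
    while True:
        try:
            return fn()
        except Exception:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the last error
            sleep(delay)
```

Injecting sleep keeps the sketch testable without real waiting. The doubling delays also show why the behaviour counts as a limitation: a function that keeps failing can take a long time to report the failure.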
For enterprise leaders looking to implement similar solutions, Arora said the most significant effort required is encouraging a change of mindset among business stakeholders, who are often reluctant and wary of the capital investment that may be required. Arora suggested starting small and building “heaps of prototypes” to demonstrate the value that can be created from the migration.
Wherever possible, a serverless architecture should reuse and integrate with existing tools. In Amtrak’s case, this meant leveraging the transport provider’s microservices engine and its existing BI tools (Tableau) to build the dashboards.
While serverless is not suited to long-running, complex computational tasks within the enterprise, or to workloads that need significant RAM during compute, Arora urged enterprises to consider it especially for web applications, mobile and IoT backend projects, and real-time analytics and data processing.
Feature image: Amtrak.