How MongoDB’s Atlas Helped Amadeus Reengineer a Crucial App
Amadeus, a travel technology company that is available in more than 190 countries and employs roughly 16,000 people, has a huge footprint in its industry. In 2019, the last “normal” year for the travel industry before the Covid-19 pandemic, the organization logged over 646 million bookings on its distribution platform.
But the company’s leaders know that standing still won’t help Amadeus keep pace with its competitors or keep its customers happy long term. It needs lower latency, bringing its data centers closer to its users. It wants greater resiliency, continuous innovation and a faster time to market for its latest features. And it wants tighter security, following zero trust principles.
All of which necessitates a move to the cloud.
Along that journey, the company needed to update a key legacy application, Amadeus Revenue Integrity (ARI). The monolithic app manages and protects Amadeus’s revenue by applying a set of rules in the post-processing stage for tickets booked through Amadeus, flagging and resolving mistakes like double bookings.
It also needed to make that update quickly, in a two-year time frame. ARI uses a mix of MariaDB, Oracle and Elasticsearch, and has been deployed for nearly a decade at an on-premises data center in Arizona. The company wants to close down the data center by the beginning of 2023.
With that timetable in mind, Amadeus turned to MongoDB’s Atlas, a fully managed cloud database that handles the complexity of deploying, managing and healing deployments on the cloud.
“Atlas was necessary to fulfill the timeline constraints for the cloud migration,” Angelika Gross, section manager of MongoDB DevOps at Amadeus, said during a presentation at MongoDB World 2022 in June.
A Massive Project, a Tight Timeline
The tight, two-year timeline encompassed a huge-scale project: Amadeus runs more than 610 applications and 21,100 servers. The following diagram shows the percentage of applications that will be fast-tracked to the cloud, and the percentage that will be decommissioned.
Because the applications and data included in the migration were part of the Amadeus ecosystem at large rather than stand-alone entities, they relied on middleware and dependencies that also needed to move to the cloud. Security issues stemmed from that middleware and those dependencies, leading the team to change its operational model considerably.
Atlas is cloud native and production ready, but the standout characteristic here was its transparent scalability. Even so, Amadeus did its own performance testing and customization.
Amadeus has three Ops Manager deployments. They support 110 clusters spread across 550 nodes on MongoDB 4.2+, with 170 terabytes of data overall. The biggest cluster has 64 nodes with 170 shards, making up 48 TB of data. Its busiest cluster has nine nodes and 9 TB of data, and handles roughly 20,000 queries, with 99% taking less than five milliseconds, and 35,000 updates, with 99% also taking less than five milliseconds.
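To make the “99% taking less than five milliseconds” figure concrete, here is a minimal sketch of how a 99th-percentile latency can be computed from a batch of timings. The sample values, the function name and the nearest-rank method are all assumptions chosen for illustration; the article does not describe how Amadeus measures its latencies.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]

# Hypothetical query latencies in milliseconds: 99 fast queries and one slow outlier.
latencies_ms = [1.2] * 60 + [3.4] * 39 + [48.0]

p99 = percentile(latencies_ms, 99)
meets_target = p99 < 5.0  # the five-millisecond threshold quoted in the text
```

With this kind of measure, a single slow outlier does not break the target as long as 99% of operations stay under the threshold.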
To ensure proper scaling and sizing for optimal query performance, Amadeus performed injection testing with randomized production traffic samples and consulted with MongoDB engineers.
This resulted in a solid migration plan, with a month-by-month cluster scale-up, and the discovery of a major application bottleneck that was consuming approximately 10 times the CPU needed for a server action of that size.
The system included some tooling that was not fully using SSL. Though the tooling was functional, it wasn’t using the cloud.
“Amadeus made several modifications to ensure it was using the DNS SRV connection scheme everywhere in order to be able to easily conceal the topology of the Atlas clusters,” said Florent Coquelet, lead of DevOps and SRE for cloud native apps at Amadeus, in a presentation at MongoDB World 2022.
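The connection scheme Coquelet describes appears to correspond to MongoDB’s DNS seed list format, in which a `mongodb+srv://` URI names a single hostname and the driver discovers the cluster members through a DNS SRV lookup. The sketch below contrasts the two URI styles to show why the SRV form hides the topology from application configuration. The hostnames, credentials and helper function names are hypothetical and are not taken from Amadeus’s setup.

```python
from urllib.parse import quote_plus

def standard_uri(hosts, user, password, replica_set):
    """Classic format: every cluster member is listed explicitly in the URI."""
    creds = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"mongodb://{creds}@{','.join(hosts)}/?replicaSet={replica_set}&tls=true"

def srv_uri(srv_host, user, password):
    """DNS seed list format: one hostname; members are resolved via DNS at connect time."""
    creds = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"mongodb+srv://{creds}@{srv_host}/"

def srv_record_name(srv_host):
    """The DNS name a driver queries for SRV records under the mongodb+srv scheme."""
    return f"_mongodb._tcp.{srv_host}"

# Hypothetical hosts and credentials, for illustration only.
explicit = standard_uri(
    ["node-a.example.net:27017", "node-b.example.net:27017", "node-c.example.net:27017"],
    "app", "p@ss", "rs0",
)
concealed = srv_uri("cluster0.example.net", "app", "p@ss")
lookup = srv_record_name("cluster0.example.net")
```

Because the SRV form carries no node names, nodes can be added, removed or renamed behind the DNS record without touching any application configuration.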
Creating a Custom Solution With Atlas
Here are some of the customizations Amadeus made to tailor Atlas to its needs:
Data security is always important, but Amadeus is especially conscious of this, since it stores passenger flight data. The company works alongside Atlas’s file encryption to keep data safe.
Atlas provides file-level encryption while data is at rest and TLS 1.2 while it is in transit. Amadeus takes it one step further by adding custom key management and rotation of the key identifier and key vault credentials. Amadeus also adds two-factor authentication, authentication logging and audit filters.
Connectivity in the data center was never a big topic for the team at Amadeus, but moving to the cloud means adopting an all-closed approach, in which every type of connectivity must be requested. This raises a new question: how to add an external provider to this landscape?
The following diagram illustrates the connection to the data center.
Amadeus decided a private link was the best way to connect to the database because it preferred outbound traffic over inbound traffic, which is harder to secure and control, according to Gross.
The only way to connect to the API to do all the automation was via the internet. Amadeus’s strategy to keep the traffic secure was to keep the IP access list incredibly limited.
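As a rough illustration of what a tightly limited IP access list means in practice, the sketch below checks a caller’s address against a short allow list using Python’s standard `ipaddress` module. The addresses are reserved documentation ranges and the function name is hypothetical; the actual enforcement happens on the Atlas side and is not shown in the article.

```python
import ipaddress

# Hypothetical allow list: one automation host and one small CIDR block.
ACCESS_LIST = ["203.0.113.10/32", "198.51.100.0/29"]

def is_allowed(client_ip, access_list=ACCESS_LIST):
    """Return True only if client_ip falls inside one of the listed networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(entry) for entry in access_list)

automation_host_ok = is_allowed("203.0.113.10")   # exact /32 match
neighbor_blocked = not is_allowed("203.0.113.11")  # adjacent address is not listed
```

Keeping entries as narrow as a single `/32` or a small block means that an address just outside the list is rejected, which is the point of keeping the list short.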
Amadeus used the all-networks option to connect to the key vault, as it was the only option that worked. Gross said the travel company is looking forward to MongoDB providing private endpoint support, which is not yet available. The active directory can be accessed via the internet and the database. Amadeus is exploring whether virtual network peering is needed, or if this can be done with DNS resolution.
Terraform is used for the infrastructure and Ansible for the configuration. The MongoDB Atlas Terraform provider is used to create the Atlas cluster, hosted in Microsoft Azure, to create the private link and to configure the backup. The Azure Terraform provider is used to create a private endpoint and to manage the storage account and the key vault.
Though the cloud migration is still very much in progress, the Amadeus teams have already learned many valuable lessons.
When the company encountered a CPU bottleneck, it scaled the cluster up from an M30 to an M80, implemented code changes and found significant improvements, with response time dropping from 250 milliseconds to 50 milliseconds.
When a bottleneck on IOPS was found, the team scaled up storage from 4 TB to 16,000 IOPS and solved the issue.
A test production cycle is done before adding a new module to production. For example, teams back up, restore and shut down the cluster to see how the application behaves. The teams always get the application to perform before deploying.
The journey is off to a strong start, with Amadeus and MongoDB Atlas working together to deliver a blend of MongoDB Atlas’s out-of-the-box and custom cloud solutions.
“We were happy with the speed of delivery, the agility given by the solution and its capacity to integrate in our own operational ecosystem,” Luc Choubert, head of platform services at Amadeus Group, told The New Stack. “That was definitely a good way to successfully start our cloud journey!”