Bring Observability On Your Cloud Voyage — Or Go Home
New Relic sponsored this post.
As cloud passes through its early-adopter years to mainstream adoption among the world’s largest companies, development teams are being asked to plot a course for migrating critical applications to public and private cloud infrastructure.
Two-thirds of these companies form a type of cloud council or management committee to select the right priorities for migration, taking into account expected KPIs and ROI figures as measures of success.
While governance and metrics are generally good things, if stakeholders focus exclusively on outcomes, rather than awareness of what is needed to safely traverse the cloud journey, they may have already gotten off on the wrong foot.
As many as 74% of IT leaders in Global 500 companies say they have experienced cloud migration failures that caused them to move one or more applications back to their conventional data centers.
Where do cloud migration projects go wrong?
We’re witnessing a perennial migration — large enterprises porting monolithic applications to cloud — sometimes with unrealistic expectations. The innate elastic scalability of cloud compute and storage does not automatically accelerate successful deployments, nor does it answer performance, risk and cost concerns by itself.
Much like a conventional application, a cloud application is only as performant as its slowest service, only as secure as its weakest link and only as cost efficient as it was designed to be.
Perhaps the highest-profile cloud failure to date was the launch of the first U.S. public healthcare exchange on healthcare.gov in October 2013. Most commercial projects would consider 250,000 visitors looking to sign up on day one, after three years of preparation, an unmitigated success. But in this case, the government agencies tasked with overseeing development lacked visibility into the impact of such traffic.
The healthcare.gov exchange quickly crumpled as it attempted to serve up health plan data sourced from companies across 34 U.S. states, and the signup process simply failed to work for most visitors for the first few days. Disgruntled applicants were turned away, and the cloud-based site became the butt of many jokes from comics and political opponents.
Over the next few months, the teams first had to overcome organizational barriers — determining who would have oversight and responsibility for correcting the course of a project that was already more than $800 million over budget.
They managed to scale and test the signup process to allow more concurrency, increasing capacity to serve several thousand customers a day. Still, wouldn’t it have been better if observability had been built in from the start, before such problems were baked in and hard to untangle?
Measuring cloud migration success at Fleet Complete
For their part, cloud Infrastructure as a Service providers are well aware that only successful migrations will translate into long-term increases in cloud adoption and spend.
Amazon Web Services (AWS) has long led the way in supplying business ROI and budgeting tools. Its Migration Acceleration Program (MAP) methodology, for example, brings advisory, project and technical templates to the table, along with a host of ecosystem partners to handle assurance, integration and management services.
Leading telematics provider Fleet Complete was seeking massive improvements in its ability to ingest, intelligently process and deliver logistics data for a rapidly growing on-the-road network.
“It’s not just about collecting data; it’s about layering that data to provide the best insights to customers,” Alan Fong, Fleet Complete Chief Technology Officer, said.
“From prior experience, I understood the breadth of the AWS tools and the global presence of the AWS platform,” Fong said. “AWS would allow us to use machine learning and artificial intelligence models atop our own IP to improve the information we deliver to customers.”
But before any KPIs could be planned or measured, observability needed to exist across all three phases of the migration map:
1. In planning, where the current application suite can be fully mapped for technology dependencies, baselined for performance characteristics and examined for vulnerabilities, in order to prioritize the migration of workloads that offer the best risk/reward potential and select from many integration and deployment options.
2. In progress, to ensure that each new code check-in, deployment and release is happening without unintended problems that would impact customers. In today’s fast-release DevOps world with ephemeral, constantly changing cloud native infrastructure, this is easier said than done.
3. After migration, where the delivered cloud application is not only monitored to rapidly detect and resolve potential issues, but its environment provisioning is also rightsized to control cloud costs, with upgrades to maximize scalability and performance.
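The planning phase in the list above comes down to ranking workloads by risk and reward once dependencies have been mapped. A minimal sketch of that idea follows, assuming hypothetical 0-to-10 scores for reward and risk and an arbitrary weighting for dependency count. The `Workload` fields, the `priority` formula and the workload names used in the note below are illustrative assumptions, not something taken from Fleet Complete, AWS or New Relic.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    reward: float       # expected benefit of migrating (hypothetical 0-10 scale)
    risk: float         # estimated migration risk (hypothetical 0-10 scale)
    dependencies: int   # number of mapped service/network dependencies


def priority(workload: Workload) -> float:
    """Score a workload: reward discounted by risk and dependency count.

    The weighting is an assumption chosen for illustration only.
    """
    return workload.reward / (1.0 + workload.risk + 0.5 * workload.dependencies)


def rank(workloads: list[Workload]) -> list[Workload]:
    """Return workloads ordered from best to worst migration candidate."""
    return sorted(workloads, key=priority, reverse=True)
```

With this scoring, a modest but isolated workload such as `Workload("reports", 5, 1, 0)` ranks ahead of a high-reward but heavily entangled one such as `Workload("billing", 9, 7, 6)`, which matches the "best risk/reward potential" framing of the planning step.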
“Our DevOps team treats every release package that goes into production as a gift. We want to be able to run those packages in our staging environments with New Relic monitoring to ensure that what happens there will be representative of what occurs in our production environment,” Fong said. “That way, when a release is put into production using the same tool sets, there are no surprises; we know exactly how it’s going to run and exactly how it will affect the rest of our environment.”
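One way to picture the "no surprises" check Fong describes is a release gate that compares measurements from staging against a production baseline. The sketch below is illustrative only: the metric names, the 10% tolerance and the `regressions` function are assumptions rather than part of any New Relic API, and it treats every metric as lower-is-better (latency, error rate and so on).

```python
def regressions(baseline: dict, candidate: dict, tolerance: float = 0.10) -> dict:
    """Return metrics where the candidate is worse than baseline by more than `tolerance`.

    Both arguments map a metric name to a number where lower is better.
    Metrics missing from `candidate` are skipped rather than flagged.
    """
    flagged = {}
    for metric, base_value in baseline.items():
        value = candidate.get(metric)
        if value is None:
            continue  # not measured in staging; nothing to compare
        if value > base_value * (1 + tolerance):
            flagged[metric] = (base_value, value)
    return flagged


def release_allowed(baseline: dict, candidate: dict, tolerance: float = 0.10) -> bool:
    """A release passes the gate only when no metric regressed."""
    return not regressions(baseline, candidate, tolerance)
```

A pipeline could call `release_allowed` with numbers exported from whatever monitoring tool is in use and block promotion to production when it returns `False`.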
In year one, Fleet Complete was able to transition 60% of its telematics environments to AWS, using a unified dashboard built on the New Relic One platform for pre-release checks, ALM, infrastructure monitoring and observability across both on-prem and cloud workloads.
Fueling the journey at World Fuel Services
A cloud and DevOps transformation is underway at World Fuel Services (WFS), a Fortune 100-scale global enterprise that was operating a suite of custom and acquired applications in no fewer than 20 data centers around the world, handling everything from trade logistics and supplier management to customer service and back-office functions.
The firm made a commitment to moving its application suite to Amazon Elastic Container Service (ECS), tapping Kinect Consulting as an integration service partner and New Relic Cloud Adoption Solution for AWS as an observability platform.
With so many different routes available, where would WFS start? Observability fueled all three phases of the cloud migration:
1. In planning, where teams captured a full inventory of the firm’s extensive application technology estate as it existed across the 20 data centers, including development languages, service and network dependencies and current performance profiles. The results were then prioritized by the platform as strategic recommendations for the applications to rationalize first in the roadmap for maximum rewards with minimum risk.
2. In progress, where Kinect developers automatically inject their CI/CD pipelines with New Relic agents, so monitoring and tracing instrumentation is built right into every application component they deploy in the data centers or in AWS. During implementation it was discovered that certain containers were causing excessive network traffic loads, so additional capacity was provisioned while teams isolated and refactored the problem in a later release.
3. After migration, where an automated feedback loop tracks and reports exactly which features customers are using, and even “near misses” of potential issues with third-party dependencies are rolled up into recommendations for continued improvements. Seven of the 20 global data centers were rationalized and retired in year one of the initiative.
“For our dispatch application, the websites and services from our suppliers must be running or it prevents our order from completing,” Sunith Ravindran, application development leader at Kinect, said. “We use New Relic to proactively monitor the status of our third-party integrations so that we not only know whether our infrastructure is up, but whether the services from our partners are up. There have already been instances where we’ve notified a supplier that their services were down.”
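The kind of proactive third-party check Ravindran describes can be approximated with a simple availability probe. The following is a minimal sketch using only the Python standard library. The endpoint names and URLs are hypothetical, and it is not how New Relic implements its monitoring; a production setup would add scheduling, retries and alerting.

```python
import urllib.request


def is_up(url: str, timeout: float = 5.0, fetch=urllib.request.urlopen) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status within the timeout.

    `fetch` is injectable so the check can be exercised without network access.
    """
    try:
        with fetch(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def down_dependencies(endpoints: dict, check=is_up) -> list:
    """Return the sorted names of dependencies whose health URL is not responding."""
    return sorted(name for name, url in endpoints.items() if not check(url))


# Hypothetical supplier endpoints, for illustration only.
SUPPLIER_ENDPOINTS = {
    "supplier-a": "https://supplier-a.example/health",
    "supplier-b": "https://supplier-b.example/health",
}
```

Calling `down_dependencies(SUPPLIER_ENDPOINTS)` on a schedule would yield the list of partners to notify, which is the scenario in the quote where the team spotted a supplier outage first.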
The Intellyx Take
An enterprise cloud migration that isn’t oriented toward improving customer experience will likely fail, no matter what tools are employed. There’s no point in migrating to cloud just to save money, nor in using cloud native approaches like Kubernetes or serverless just to geek out on the latest technology.
Today’s web and mobile users are notoriously unforgiving and expect low latency, few errors and no security flaws. Deliver a poor experience that fails to meet their needs, and customers may abandon the application for a competing app in seconds.
Avoid premature returns to the safety of old silos. Take observability along on the continuous journey to modern cloud architectures.
Amazon Web Services is a sponsor of The New Stack.
Feature image by Alexander Baxevanis from flickr open source.