
AWS Graviton Marks the Emergence of ARM for Cloud Native Workloads

8 Dec 2020 3:00am

When Amazon Web Services unveiled its AWS Graviton Arm processor in 2018, it was targeting loosely coupled scale-out workloads like web servers, log processing and caching. The instances appealed to customers like SmugMug, who felt they were overpaying for premium compute when a smaller core would do the job. The Arm-based compute instances serve as an alternative to the x86-based services used by the vast majority of the cloud giant’s customers.

In many ways, that was about priming the ecosystem, David Brown, Vice President of Amazon EC2, told the New Stack. “We wanted to signal to the ecosystem that Arm server chips were going to be real and we were going to be bringing them out.”

Getting the ecosystem ready meant Graviton2 could quickly power not just EC2 instances but increasing numbers of AWS services including Amazon RDS, Amazon ElastiCache (where Graviton2 is actually the default) and container services like Amazon EKS.

Graviton will only be visible where customers choose a processor architecture; in the “vast majority of services” customers will never see that the service is running on Arm, but Brown suggested that many will be.

Moving AWS services to Arm has also helped get the ecosystem ready for customer workloads, Brown noted. “That’s giving us a lot of experience internally, about what customers would go through as well as making sure in the ecosystem that we reach out to the various players and getting things resolved with tools we can build and things we can do to help.”

Graviton2 is even available as a 1U Outposts server with 64 vCPUs, 128GiB memory, and 4TB of local NVMe storage for running Amazon EC2 and EKS locally (although AWS suggests these servers for restricted locations like cell towers rather than mainstream data centers). “We’ve heard from customers that they would love to have Graviton2 in Outposts, and in terms of supporting the ecosystem and making sure we can provide our customers with the hardware that they want when they need it, there’s nothing about the ARM processor that means it couldn’t be in an Outpost.”

AWS will also be adding more AWS Graviton2-based instances — the latest is the compute and network-intensive C6gn, to go with instances designed for large in-memory datasets or compute-intensive workloads, with and without local NVMe storage.

From Offload to Integration

Brown suggested thinking about the development of Graviton like hardware microservices that can be built more quickly than entire monolithic hardware systems. All cloud providers want to reduce the overhead of running cloud services on the same CPUs they sell to customers to run virtual machines, but the AWS Nitro compute cores were about more than that efficiency, Brown said.

“A certain percentage of the processor, whatever it might be, is used by the virtualization stack and a whole lot of other management software that needs to run on the box. What we noticed was, apart from reducing the efficiency (because you’re using 10 to 20% of the core that you could be giving to the customer for your own work and for the hypervisor), we also weren’t able to get the sort of performance we believed our customers were looking for. We struggled with excursions in latency, excursions in CPU processing time where there was jitter in the workflow, and jitter is always a bad thing.”

AWS used Nitro to offload more and more of the overhead to a separate processor to improve performance and make it more consistent. “The first thing we did was remove networking and offload it to an Arm processor. We offloaded all of our management control plane and ultimately in 2017, we got to a point where we use zero percent of that CPU.”

Because Arm licenses IP rather than selling processors, Arm SoCs are often custom packages of CPU, GPU and accelerators, and Amazon added a lot of AWS-specific functionality.

“About a third of Graviton2 is the Arm core and the rest of that is custom silicon we designed,” Brown said. That includes encrypting memory and improving memory performance. “A lot of it is making sure of error correction and making sure the cores are really, really correct and can deal with memory modules that may not be working properly and protect customer workloads from those sorts of failures.”

Cores for CPU-bound Apps

Having started as a mobile chip, Arm is often seen as a way to reduce power and the high core count makes Arm systems good for parallelization, but recent generations of Arm also offer powerful cores. That combination delivers a price-performance ratio that AWS is focusing on.

“The Arm core consumes about half the power that you get from alternative cores. Customers have seen about a 20% improvement in raw performance; if you run the same benchmark on the Graviton2 and the latest Intel processor, Graviton2 is about 20% faster,” Brown said. It is also about 20% cheaper. As a result, you get about a 40% price-performance improvement, Brown argued.
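How the two 20% figures combine depends on which ratio is being quoted. The sketch below is an illustration only, assuming both figures apply to the same workload and that cost scales linearly with instance price; the function names are made up for this example. Read as work done per dollar, the gain is 50%; read as cost for a fixed amount of work, the saving is about 33%. Brown's "about 40%" falls between those two readings.

```python
def price_performance_gain(speedup: float, price_ratio: float) -> float:
    """Relative gain in work done per dollar.

    speedup:     new performance / old performance (1.20 means 20% faster)
    price_ratio: new price / old price (0.80 means 20% cheaper)
    """
    if speedup <= 0 or price_ratio <= 0:
        raise ValueError("speedup and price_ratio must be positive")
    return speedup / price_ratio - 1.0


def cost_saving_for_fixed_work(speedup: float, price_ratio: float) -> float:
    """Fractional drop in spend needed to get the same amount of work done."""
    if speedup <= 0 or price_ratio <= 0:
        raise ValueError("speedup and price_ratio must be positive")
    return 1.0 - price_ratio / speedup


# The figures quoted in the article: about 20% faster, about 20% cheaper.
gain = price_performance_gain(speedup=1.20, price_ratio=0.80)
saving = cost_saving_for_fixed_work(speedup=1.20, price_ratio=0.80)

print(f"work per dollar: +{gain:.0%}")
print(f"cost for the same work: -{saving:.0%}")
```

The same two functions can be fed measured numbers from a real workload, which is usually more informative than a vendor-wide average.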

“We’ve seen customers get big performance benefits with the latest versions of Java that have great ARM64 support. We’ve seen customers get that benefit with database applications as well. So it really is a wide spectrum of anything that’s utilizing the cores only seeing that price-performance benefit,” he said.

Customers often start using Graviton for their application tier before moving more of their stack, Brown said. “It doesn’t have to be massively compute-intensive; it’s just the most general-purpose workloads that are CPU bound and would use more CPU if it was available. And a lot of the cloud native [workloads], whether it’s native running on EC2 or running on one of our container stacks, will see that significant performance benefit.”

That could cut costs on other VMs too. Many IaaS instances are extremely underutilized, Corey Quinn, chief cloud economist at The Duckbill Group, told the New Stack. “If I take a look at large, well-run fleets at large scale customers, and we are talking tens of thousands of nodes or more in some cases, the CPU utilization on these things is laughable: single-digit percentiles on average.”

Observability services provider Honeycomb was attracted by the promise of cheaper and more performant instances. “We’re a SaaS company and in particular we’re an infrastructure SaaS company so a large part of our operating expenses is our AWS bill,” Principal Developer Advocate Liz Fong-Jones told the New Stack.

Switching to Graviton2 let the company’s engineers run the same ingress workload (receiving JSON objects and putting them into Kafka) on 40 instances rather than the 70 it had required on x64, with no noticeable change in performance. They were also able to move to slightly higher CPU utilization (50-60% rather than the 45% they’d set previously) without running the risk of saturating the CPU when usage spikes. Fong-Jones speculated that’s because Graviton2 cores aren’t hyperthreaded; “every core is a real core, it’s not shared execution units.”
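The fleet numbers above can be put into rough proportion with a little arithmetic. This is a back-of-the-envelope sketch, not Honeycomb's sizing method: it assumes a constant total load that spreads evenly across instances, and it picks 55% as a midpoint of the 50-60% range purely for illustration.

```python
import math


def fleet_reduction(old_instances: int, new_instances: int) -> float:
    """Fraction of the fleet that is no longer needed."""
    return 1.0 - new_instances / old_instances


def instances_at_target(current_instances: int,
                        current_util: float,
                        target_util: float) -> int:
    """Instances needed to carry the same load at a different average utilization.

    Assumes the load is constant and divides evenly across instances.
    """
    busy_instance_equivalents = current_instances * current_util
    return math.ceil(busy_instance_equivalents / target_util)


# Figures from the article: 70 instances before, 40 after.
print(f"fleet reduction: {fleet_reduction(70, 40):.1%}")

# Hypothetical: how far would the higher utilization target alone go,
# if the hardware had stayed the same? 0.55 is an assumed midpoint.
print("same hardware at 55% target:", instances_at_target(70, 0.45, 0.55))
```

Under those assumptions, a higher utilization target on unchanged hardware would account for only part of the drop from 70 instances, which suggests most of the reduction comes from per-instance capacity.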

Honeycomb also moved a query engine workload (which is both IO and CPU bound and needs NVMe storage) from storage-optimized i3 instances on Intel Xeon to Graviton2 M6gd for better performance even though the cost was around 10% higher. “It’s literally twice as fast,” she said.

The company is continuing to move workloads to Arm; the front-end serving infrastructure (which isn’t CPU intensive) is now running on Graviton1 and Fong-Jones plans to switch their Kafka workload, purely for cost reasons. For persistent workloads, the company uses a Compute Savings Plan and she expects to be able to commit to a lower level of spending, thanks to the experiments with Graviton2.

What Is Arm Ready for?

As the developer community has started evaluating Apple’s new Arm-based Macs, there have been questions about which tools are available on ARM64 clients, but the ARM64 server software ecosystem is broadly ready and some migrations can be done in a matter of days or weeks, Brown suggested.

“For the most part, if people are building applications there’s a large chance that the runtimes of those applications are built for ARM processors. Python or Java runtimes, for instance, offer runtimes that support ARM,” Gartner research director Raj Bala told the New Stack. “The challenge would involve frameworks that may make use of processor-specific capabilities that are not available on ARM. [Commercial, off-the-shelf] applications will probably be the biggest challenge, or perhaps Docker containers that are not built with ARM support.”
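Whether a container image is built with Arm support can be checked from its manifest. Multi-architecture images publish a manifest list (an OCI image index) whose entries each name a `platform` with an `architecture` and `os`; that JSON is what `docker manifest inspect <image>` prints for such images. The sketch below is a minimal illustration: the sample document and its digests are made up, and the function names are hypothetical.

```python
import json

# Made-up sample in the shape of an OCI image index / Docker manifest list.
# The digests are placeholders, not real images.
SAMPLE_INDEX = json.loads("""
{
  "schemaVersion": 2,
  "manifests": [
    {"digest": "sha256:placeholder-amd64",
     "platform": {"architecture": "amd64", "os": "linux"}},
    {"digest": "sha256:placeholder-arm64",
     "platform": {"architecture": "arm64", "os": "linux"}}
  ]
}
""")


def published_platforms(image_index: dict) -> set:
    """Return the (os, architecture) pairs listed in a manifest list."""
    pairs = set()
    for entry in image_index.get("manifests", []):
        platform = entry.get("platform", {})
        if "os" in platform and "architecture" in platform:
            pairs.add((platform["os"], platform["architecture"]))
    return pairs


def supports_platform(image_index: dict,
                      architecture: str = "arm64",
                      os_name: str = "linux") -> bool:
    """True if the manifest list has an entry for the given platform."""
    return (os_name, architecture) in published_platforms(image_index)


print(sorted(published_platforms(SAMPLE_INDEX)))
print("arm64 available:", supports_platform(SAMPLE_INDEX))
```

A single-architecture image has no `manifests` array, so this check returns `False` for it; in that case the image's own configuration has to be inspected instead.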

Internal code is definitely less likely to be ready, Fong-Jones told us. “You can get an Ubuntu image that will just run and that’s been pretty huge. Similarly, it’s not just Java and PHP which are interpreted languages, but golang. We couldn’t have done this without golang supporting ARM64 out of the box and supporting cross-compilation out of the box.”

But that was the easy part, she said. “We had to do a little prep work to verify we didn’t have any hard dependencies on x86 assembly. We do have some rather optimized parts of our storage and there were bits that used AMD64 assembly but they had regular Go equivalents. But the challenge was all of the underlying system architecture.”
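The kind of prep work described above can be partly automated. Go only compiles a file whose name ends in `_amd64.go` or `_amd64.s` when targeting amd64, so such files are a quick signal of x86-specific code. The sketch below is a heuristic and not Honeycomb's actual tooling: it lists amd64-suffixed files that have no `_arm64` sibling in the same directory. It does not parse `//go:build` constraints, so a pure Go fallback kept under another file name will still need a manual look.

```python
from pathlib import Path
from typing import List

ARCH_SUFFIX = "_amd64"
SOURCE_EXTENSIONS = (".go", ".s")


def amd64_only_files(root: Path) -> List[Path]:
    """List amd64-suffixed Go or assembly files lacking an _arm64 sibling."""
    flagged = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_EXTENSIONS:
            continue
        if not path.stem.endswith(ARCH_SUFFIX):
            continue
        # e.g. "sum_amd64" -> "sum_arm64"
        arm_stem = path.stem[: -len(ARCH_SUFFIX)] + "_arm64"
        siblings = [path.with_name(arm_stem + ext) for ext in SOURCE_EXTENSIONS]
        if not any(sibling.exists() for sibling in siblings):
            flagged.append(path)
    return flagged
```

Calling `amd64_only_files(Path("."))` from a repository root returns the files worth reviewing before attempting an ARM64 build.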

Honeycomb’s security intrusion system is built on osquery, which only very recently introduced ARM64 support (thanks in part to Honeycomb’s sponsorship of the port). “Our security auditors would be very upset with us if we deployed a bunch of these servers and didn’t have appropriate intrusion detection on them.”

The company relies on the Chef configuration management software. Chef 15 has ARM64 support but Honeycomb uses Chef 13. “Chef 13, at least as supplied by Chef Inc, is not built for ARM64. So we had to backport some Ubuntu packages and then switch the architecture bit. There’s stuff where you have to rejigger things because, if you’re running older LTS software, some of it’s not inherently built for Arm; you have to backport it.”

Fong-Jones has investigated switching MySQL to Arm for performance, “because database scaling for various kinds of metadata is a bottleneck.” But again, versions are the obstacle: on Graviton2, AWS supports MySQL 8, MariaDB 10 and the latest version of PostgreSQL. “We’re on MySQL 5.6 and our semantics don’t let us upgrade directly to MySQL 8. We could go to MariaDB 10 but AWS doesn’t let you do that with zero downtime because it’s considered a different database engine.”

The devil is, as always, in the detail. “It’s all those random weird system-level dependencies that you have to really, really get nailed down. Getting the code to run, that’s easy!”

Those mismatches are a reason not everyone is jumping onto Graviton yet, or even the latest Intel instance if a necessary driver hasn’t been ported for Nitro, Quinn suggested. “In many environments, it will require an OS update; that, in turn, means you need to recertify and migrate existing workloads and companies have always been slow to do that.” The same problem holds back the adoption of other technologies like autoscaling. “They haven’t updated their application architecture to be able to scale dynamically without reconfiguring everything when a node joins.”

“With a database, moving to a newer version is a little bit more challenging when the older version doesn’t support Arm and it’s not going to be supporting Arm,” Brown noted.

“That’s where we think the customer needs to weigh up the investment. 40% price-performance improvement doesn’t come along very often in the industry.

“Try and benchmark your workload as cheaply and quickly as you can,” he suggested. “Then you can invest in doing a migration, like moving to a different database version or rewriting a kernel module that’s processor-specific, because the 40% price performance justifies it.”
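A cheap first benchmark can be as simple as timing a representative piece of work on one instance of each architecture and comparing the results. The harness below is a minimal sketch using only the Python standard library; the JSON round-trip workload is a stand-in chosen for illustration, not a real production workload, and should be swapped for something that resembles the actual application.

```python
import json
import platform
import statistics
import time


def benchmark(workload, repeats: int = 5) -> dict:
    """Time a callable several times and report the machine architecture."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return {
        # Typically "aarch64" on Arm Linux and "x86_64" on Intel/AMD Linux.
        "machine": platform.machine(),
        "runs": repeats,
        "median_s": statistics.median(timings),
        "min_s": min(timings),
    }


def sample_workload() -> None:
    """Stand-in workload: encode and decode a batch of small JSON events."""
    events = [{"id": i, "service": "ingest", "duration_ms": i % 97}
              for i in range(20_000)]
    json.loads(json.dumps(events))


result = benchmark(sample_workload)
print(json.dumps(result, indent=2))
```

Running the same script on an x86 instance and a Graviton2 instance gives two comparable medians; dividing each by the instance's hourly price turns them into a rough price-performance comparison.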

Are Customers Ready for Arm?

With new instance types, there are sometimes questions of availability. Fong-Jones had to pause the query engine migration for 12 hours because there weren’t enough M6gd instances. To get spot instances, she had to accept M6 instances (which have more memory and a slightly higher clock speed) as well as the C6 instances that are all the workload requires.

Availability concerns might be a reason for AWS customers to be cautious about moving to Graviton, Quinn agreed, but they should be short-lived. “Can I get the capacity in whatever AWS region to run this at scale? What’s the spot availability story on this, can they get them quickly enough? And the answer is clearly yes, now they have them in sufficient volume to start offering them as managed services.”

But Graviton may not be an early choice or the most significant way that organizations can lower their AWS bills, Quinn noted. “It’s good housekeeping, it’s absolutely a benefit but it’s also not the most impactful thing they could do.”

“There’s somewhat low awareness of Graviton and Nitro amongst our customer base,” Bala confirmed. “When they do call, it’s typically about how to think about it relative to traditional processors.” His answer to that is that “customers should be thinking about the compatibility of workload end-to-end and the price/performance ratio when thinking about ARM vs. x86.”

Arm Beyond AWS

AWS won’t be the only option for Arm servers in the cloud, but so far it’s the most significant.

Oracle Cloud Infrastructure will have Arm systems (using silicon from Ampere with two sockets and 160 cores) available as VMs or bare metal early in 2021, for infrastructure tasks like transcoding as well as for running containers and Kubernetes.

Microsoft has been building Windows Server systems with Arm processors for Azure since 2017. The Project Olympus OCP chassis used in Azure can accommodate different Intel or Arm motherboards, and Microsoft has tested Windows Server running on Arm silicon including Ampere Altra, Fujitsu and Marvell ThunderX2.

At that time, the majority of cloud workloads were Infrastructure-as-a-Service and, with few of the applications that customers wanted to virtualize available for the Arm architecture, Microsoft had no plans to offer Windows Server on Arm publicly.

Instead, the ThunderX2 Arm servers that are now running in Azure have been used to bring down the cost of running internal Azure infrastructure like storage, indexing and search. At this year’s Arm DevSummit, technical fellow Arun Kishan (who works on the Windows kernel team) noted, “Today, we are using Windows Server on ARM64 for exploring Microsoft internal storage and VM hosting services.”

And for developers, having both Windows and macOS available on Arm silicon might be the tipping point for wider adoption of Arm servers in the future, Bala suggested. “Developers are going to want to develop and build iteratively on their local machines before pushing to CI/CD platforms that eventually push to prod.”
