Amazon Web Services Takes the Silicon Wars to the Cloud

During his keynote at AWS Re:Invent, the cloud giant’s annual user event, Peter DeSantis, Amazon Web Services’ (AWS) senior vice president of global infrastructure and customer support, asserted that AWS’ processor and chip designs deliver better cloud-application performance than the processors CPU giants AMD and Intel provide. He added that AWS’ graphics processor performance beats that of the GPUs that Nvidia — a leading GPU supplier — offers for machine learning (ML). He was referring to AWS Graviton2 processors, which are custom built by Amazon Web Services using 64-bit Arm Neoverse cores.
As the dust settles following the explosion of options available on AWS, as well as on Azure and Google Cloud Platform (GCP), customers will likely increasingly scrutinize application performance and the cost/performance ratio the services offer. The underlying chip and server infrastructure will, in this way, serve as a key factor determining cloud native application performance, power consumption and, of course, cost.
AWS Re:Invent SVP Peter DeSantis keynote: #Graviton2‘s benchmarks don’t matter if they don’t “capture real-world performance,” so making apps run cheaper and faster was emphasized. https://t.co/68nBdRGdnP #awsreinvent #aws @thenewstack
— BC Gain (@bcamerongain) December 10, 2020
The power of AWS’ flagship 64-bit Arm-based Graviton2 and its other in-house processor designs will thus play a major role in AWS’ pitch to help customers improve the performance of their applications.
“What’s really exciting and transformative about deep investments in AWS silicon is being able to work across custom hardware and software to deliver unique capabilities,” DeSantis said. “And by working across this whole stack, we’re able to deliver these improvements faster than ever before.”
The Chip Factor
Graviton2 is expected to further improve application performance in a number of ways. Without citing Intel per se, DeSantis made the bold statement that the AWS-designed Graviton2 offers superior performance, power-savings advantages and security over traditional designs. AWS has also publicly stated that Graviton2 — which powers Amazon EC2 T4g, M6g, C6g and R6g instances, and “their variants” with local NVMe-based SSD storage — offers up to 40% better price performance over x86-based instances “for a wide variety of workloads.”
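In practical terms, opting into Graviton2 is mostly a matter of picking one of those instance families. The sketch below is a minimal illustration using the boto3 Python SDK to launch an m6g instance; the AMI ID and region are placeholders, and in practice any 64-bit Arm (arm64) AMI would be required.

```python
import boto3

# A Graviton2 deployment is selected like any other EC2 instance type:
# choose an arm64 AMI and a Graviton2 family (t4g, m6g, c6g or r6g).
ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: an arm64 (Graviton-compatible) AMI
    InstanceType="m6g.large",         # Graviton2 general-purpose instance
    MinCount=1,
    MaxCount=1,
)
print("Launched:", response["Instances"][0]["InstanceId"])
```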
AWS Re:Invent SVP Peter DeSantis keynote: #Graviton2 is hands-down AWS’ flagship processor. It offers the fastest speeds, lowest latency and lowest power consumption, DeSantis said. https://t.co/68nBdRGdnP #awsreinvent #aws @thenewstack pic.twitter.com/0dQzRXp0W1
— BC Gain (@bcamerongain) December 10, 2020
The first-generation Graviton debuted with the Amazon EC2 A1 instances, and its original purpose was to allow AWS “to work with our customers and our ISV partners to understand what they needed to run their workloads on a modern 64-bit ARM processor,” DeSantis said.
What users require today, DeSantis explained, is the capacity to match processor designs with highly distributed microservices applications running in cloud environments. Today’s developers are also largely no longer writing cloud native applications in C++ — which DeSantis said he “grew up with” — but are instead writing code in languages such as Go and Rust, which he said “have completely changed the game for high-performance multithreaded application development.”
“To me, one of the most exciting trends is the move to services-based architectures, moving from large monolithic applications to small, purpose-built, independent services. This is exactly the type of computing that containers and Lambda enable,” DeSantis said. “And while scale-out computing has evolved to take advantage of higher-core-count processors, [processor] designers have never really abandoned the old world. They have tried to have it both ways, catering to both legacy applications and modern scale-out applications.”
AWS SVP Peter DeSantis keynote: Low latency is critical and sometimes underestimated, although it is an especially key metric for cloud-application performance. https://t.co/68nBdRGdnP #awsreinvent #aws @thenewstack pic.twitter.com/pWy1bBl6Eo
— BC Gain (@bcamerongain) December 10, 2020
While reiterating that Graviton2’s designers “focused on making sure that each core delivered the most real-world performance for modern cloud workloads,” DeSantis also alluded to how traditional CPU-performance benchmarks, such as those used to gauge PC and server performance, are often no longer applicable. “When I say real-world performance, I mean better performance on actual workloads, not things that lead to better spec sheets, like processor frequency or performance microbenchmarks, which don’t capture real-world performance,” DeSantis said. “We used our experience running real scale-out applications to identify where we needed to add capabilities to assure optimal performance.”
DeSantis also said Graviton2’s design was intended to save silicon area and reduce power consumption per core, rather than to chase headline specifications such as raw core counts — a traditional measure of processor performance, much as horsepower is used to measure car engine power.
“We designed Graviton2 to fit as many independent cores as possible — and when I say ‘independent,’ Graviton2 cores are designed to perform consistently,” DeSantis said. “Therefore you get no unexpected throttling — just consistent performance.”
Jerry Hunter, Snap SVP of engineering (right), said besides relying on Dynamo and S3 to avoid building and managing infrastructure, #Graviton2 helps to improve user experience and to reduce costs and energy consumption. https://t.co/68nBdRGdnP #awsreinvent #aws @thenewstack pic.twitter.com/N8Jidr69OC
— BC Gain (@bcamerongain) December 10, 2020
In the case of Snap, Snapchat’s parent company, Graviton2’s design helped reduce costs and energy consumption for its use of Amazon DynamoDB and S3, said Jerry Hunter, Snap’s senior vice president of engineering, speaking virtually during the keynote with DeSantis. Besides relying on DynamoDB and S3 for storage to avoid the investments Snap would otherwise have to make in data center infrastructure, Hunter said Graviton2 has delivered results that “reduce costs and create better performance for our customers with not a lot of energy.”
Hunter found Snap’s shift to Graviton2 to be “pretty straightforward,” since the APIs are “pretty similar to what we were using before” and it did not “take a lot for us to migrate our code over to test it out,” he said. “We started trying it out with our customers to see it work and we liked the results. So, we rolled it out into the fleet and immediately got like a 20% savings, which is fantastic, because we were able to switch this load over and immediately get cost savings and get higher performance.”
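Hunter did not detail Snap’s migration steps, but for code in portable languages the switch typically amounts to rebuilding for Arm64 and verifying behavior. As a minimal, hypothetical sanity check in Python, a service can confirm at startup which architecture it landed on:

```python
import platform

# On a Graviton2 (Arm64) instance running Linux, machine() reports "aarch64";
# on x86-based instances it reports "x86_64".
arch = platform.machine()
if arch == "aarch64":
    print("Running on an Arm64 (Graviton-class) instance")
else:
    print(f"Running on {arch}; not an Arm64 instance")
```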
Machine Learning on a Chip
While he did not disclose a specific benchmark, DeSantis extended AWS’ claims of superior processor performance to machine learning, relative to Nvidia, the world’s largest maker of graphics processing units (GPUs) for ML. Overall, the company’s own AWS Inferentia chip offers “the highest throughput at almost half the cost per inference” when compared to the GPUs used for large-scale inference infrastructure that supports ML, DeSantis boasted. Specific to Nvidia, DeSantis said Amazon Alexa recently moved its inference workloads from Nvidia GPU-based hardware to Inferentia-based EC2 instances and saw costs drop by 30% and latency improve by 25%.
For ML developers, AWS’ Neuron SDK supports frameworks such as TensorFlow, PyTorch and Apache MXNet for designing applications that run on Inferentia. “Developers can take advantage of the cost savings and performance of Inferentia with little or no change to their ML code, all while maintaining support for other ML processors,” DeSantis said.
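As a rough illustration of that workflow, the sketch below compiles a stock PyTorch model for Inferentia using the torch-neuron package. The package and API names here reflect AWS’ published Neuron examples and vary by SDK version, so treat the specifics as assumptions rather than a definitive recipe.

```python
import torch
import torch_neuron  # AWS Neuron SDK plugin for PyTorch; registers torch.neuron
from torchvision import models

# Load a stock pretrained model; Inferentia targets inference, so use eval mode.
model = models.resnet50(pretrained=True)
model.eval()

# Compile (trace) the model for Inferentia with a representative input shape.
example = torch.rand(1, 3, 224, 224)
neuron_model = torch.neuron.trace(model, example_inputs=[example])

# The result is a TorchScript module; save it and load it on an Inf1 instance
# with torch.jit.load(), where it runs like any other scripted model.
neuron_model.save("resnet50_neuron.pt")
```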
Without disclosing specific details, DeSantis said AWS’ next silicon design for ML, a training-focused chip called AWS Trainium, will debut next year.
“While we’re excited about what customers are seeing with Inferentia, our investments in machine learning chips are just beginning. Like what Inferentia has done for inference, Trainium will provide the lowest-cost and highest-performance way to run your training workloads,” DeSantis said. “I’m looking forward to showing you more technical details about Trainium next year.”
For ML developer teams, AWS is also scaling operations with machine learning and integrating its storage and database services, including S3 and DynamoDB, with AWS SageMaker and its ML infrastructure, both internally and for its customers. With the right silicon infrastructure and development tools, the idea is to provide a machine learning platform — performance aside — that can meet the needs of DevOps teams as they scale from 10 or 100 ML models to perhaps thousands, without changing the infrastructure and toolsets.
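That S3-to-SageMaker path is typically wired together through the SageMaker Python SDK. The sketch below is a hypothetical minimal example: the training script, IAM role ARN, bucket path and framework versions are all placeholders, and parameter names differ across SDK releases.

```python
from sagemaker.pytorch import PyTorch

# Placeholder IAM role that grants SageMaker access to S3 and training resources.
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"

# A managed training job: SageMaker pulls data from S3, runs train.py on the
# requested instance type, and writes model artifacts back to S3.
estimator = PyTorch(
    entry_point="train.py",        # hypothetical training script
    role=role,
    framework_version="1.8.1",     # placeholder framework version
    py_version="py3",
    instance_count=1,
    instance_type="ml.m5.xlarge",  # placeholder instance type
)

estimator.fit({"training": "s3://my-bucket/training-data/"})  # placeholder bucket
```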
“This is such a transformational technology that if you don’t get started in machine learning now, you’re not going to be able to transform your customer experiences the way somebody that uses machine learning can. And so I think it’s really important for customers to get started on machine learning and start doing proofs of concept, and the tools that AWS has built make it much easier,” Bratin Saha, vice president for AWS ML, told The New Stack. “So, I think it’s really important that customers understand that machine learning is the here and now; it’s no longer the future.”
Amazon Web Services (AWS) is a sponsor of The New Stack.