This Week in Programming: Honeycomb’s ARM Advantage
Honeycomb’s service is “differentiated by its scale and speed,” explained Liz Fong-Jones, Honeycomb principal developer advocate, during the Summit keynote.
The goal of Honeycomb’s service is to let any engineer answer any question about a malfunctioning or underperforming system within 10 seconds, including previously unasked questions that arise from iterating on a train of thought, or, to “follow the breadcrumbs,” as she explained in her keynote breakout talk.
The secret sauce? The o11y company collects all the operational data it can from the client, stores it on AWS solid-state drives, then uses a combination of the AWS Lambda serverless service and speedy AWS Graviton ARM-based processors to parse the data and answer the queries.
But also instrumental is the Amazon distribution of OpenTelemetry, the Cloud Native Computing Foundation’s open source package of APIs, libraries, and agents to monitor applications through distributed traces and metrics.
Honeycomb pre-processes all the application-generated data and stores it in Amazon Simple Storage Service (S3), where it is then analyzed on the fly through the AWS Lambda serverless service. The service currently processes 2.5 million trace spans a second, up from 200,000 just three years ago. “Our customers are asking 10 times as many questions about 10 times as much data,” Fong-Jones said.
It’s a pretty impressive setup for the work of only 50 engineers. It consists of a combination of stateful and stateless services, built mostly in Go, with some Java and Node.js thrown in as well.
Honeycomb appears to be bullish on the ARM architecture.
Fong-Jones noted that the company saw a 10% improvement in median latency when switching to Graviton2 from the AWS M5 Intel Xeon-based instances. “The Graviton2 processor is just much more efficient, and we’re able to push much more load,” she said.
Moreover, A/B tests between Graviton2 and Graviton3 found a further 10% to 20% improvement in tail latency, and a 30% improvement in throughput and median latency. CPU utilization is also about 30% lower, “which means we can push it a lot harder,” she said.
Honeycomb saves a bit of coin by using AWS Spot instances, which run on spare capacity not currently being used within AWS. AWS has a graceful termination handler that shuts down workloads when the capacity is needed elsewhere. Honeycomb initially saved about 20% by moving some workloads to Spot.
For Kafka streaming data ingest, Honeycomb uses EC2 Im4g instances, which are backed by Nitro solid-state drives. Earlier, slower storage iterations left the CPU starved for work. “Right-sizing everything onto Im4g lets us hit our network, CPU and storage thresholds appropriately,” she said.
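To make the idea of graceful termination concrete, here is a minimal sketch of a worker that stops picking up new work once it is told to shut down. It is not Honeycomb's or AWS's code. It assumes the workload receives a SIGTERM when its instance is being reclaimed, and the names `stop_requested`, `request_stop` and `drain` are hypothetical.

```python
import signal
import threading

# Set when the process has been asked to shut down (assumption: via SIGTERM).
stop_requested = threading.Event()

def request_stop(signum, frame):
    """Signal handler: only record the request; the work loop reacts to it."""
    stop_requested.set()

def drain(queue, handle):
    """Process items until the queue is empty or a stop has been requested.

    Returns (processed, remaining) so the caller can hand unfinished
    work to another machine instead of losing it.
    """
    processed = []
    while queue and not stop_requested.is_set():
        item = queue.pop(0)
        handle(item)
        processed.append(item)
    return processed, queue

if __name__ == "__main__":
    signal.signal(signal.SIGTERM, request_stop)
    done, left = drain(["batch-1", "batch-2"], lambda item: None)
    print(f"processed {len(done)} items, {len(left)} left to hand off")
```

The handler does nothing but set a flag, so the item currently being processed always finishes before the loop exits.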
Lambda provides another piece of the puzzle. Even using 100 speedy Graviton instances alone won’t entirely get the job done, given the millions of files stored on S3. This is where Lambda comes in, able to instantly provide up to “10s of thousands of parallel workers.”
“With AWS Lambda and Graviton combined together, we see about a 40% improvement in price performance,” she said.
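The fan-out pattern described here, in which many parallel workers each scan one stored file and a coordinator merges their partial results, can be sketched as follows. This is an illustration under stated assumptions, not Honeycomb's implementation: a local thread pool stands in for Lambda workers, an in-memory dictionary stands in for files on S3, and the names `SEGMENTS`, `scan_segment` and `fan_out_query` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for files in object storage: each "segment"
# holds span durations in milliseconds.
SEGMENTS = {
    "segment-0": [12, 48, 7],
    "segment-1": [103, 5],
    "segment-2": [33, 21, 64, 9],
}

def scan_segment(name):
    """Worker: scan one segment and return a partial aggregate (count, total)."""
    durations = SEGMENTS[name]
    return len(durations), sum(durations)

def fan_out_query(segment_names, max_workers=8):
    """Coordinator: run one task per segment, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(scan_segment, segment_names))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return count, total

if __name__ == "__main__":
    count, total = fan_out_query(list(SEGMENTS))
    print(f"{count} spans, mean duration {total / count:.1f} ms")
```

Because each worker returns only a small partial aggregate, the merge step stays cheap even as the number of segments grows, which is what makes adding more parallel workers worthwhile.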
As someone who’s new to the whole observability space I gotta say, traces make way the hell more sense to me than all the different kinds of ways to generate and aggregate metrics.
— Phillip Carter (@_cartermp) July 14, 2022
This Week in Programming
- A Cloud ARM Race?: For about a decade now, the industry has more or less agreed that ARM64 single-threaded multicore processors in the data center would be a good thing, given their operational efficiencies. AWS has offered ARM since 2018, and with the introduction of Graviton2 in 2020, AWS indicated that the ARM architecture would be suitable for mission-critical scale-out cloud workloads. In April, ARM came to Azure as well, with the general-purpose Dpsv5 and memory-optimized Epsv5 virtual machines. Now the last of the big three cloud providers, Google Cloud, has joined the party. This week, the company introduced its first ARM-based instance, the Tau T2A. Powered by Ampere Altra ARM-based processors, T2A virtual machines offer up to 48 vCPUs per VM, with 4GB of memory per vCPU and 32Gbps networking bandwidth. The Tau T2A family of VMs is suitable for scale-out workloads such as web servers, containerized microservices, data-logging processing, media transcoding, and Java applications, according to the company.
ARM is inevitable. It is the future. It’s now on every major cloud provider – Amazon, Microsoft, Google, and Oracle.
It’s in your laptop if you’re using a modern Mac.
Welcome to the future, everyone. https://t.co/tjywueZLuB
— Liz Fong-Jones (方禮真) (@lizthegrey) July 13, 2022
- Visual Studio Gets Cozy with Git: Microsoft is making its Visual Studio integrated development environment more compatible with the widely used open source Git source control management software in the Visual Studio 2022 17.3 release. Until now, switching between Git branches in a repository would result in wait times as the new branch was loaded in. “For example, every time a team member used to add/remove projects to/from their branch, the rest of the team would most likely have experienced a solution reload when switching to or from this branch,” wrote Microsoft Senior Program Manager Taysser Gherfal, in a blog post explaining the release. Microsoft figured out how to reduce the number of reloads by 80%, eliminating inefficiencies such as requiring a reload whenever a team member would load in that same branch. This release includes some performance improvements for indexing and colorizing C and C++ code as well. As an example, in an earlier version of Visual Studio, it would take 26 minutes to index the Chromium code base, while the new version can do the same task in six minutes, reports SD Times.
- Observability, the Final Frontier: In the suburbs or the city, you may not think that much of the night sky, which appears as a vastness of darkness only occasionally punctuated by a star, or one of Elon Musk’s Starlink low Earth orbit satellites. Without the surrounding urban light pollution, however, the night sky offers an entirely different, and far more menacing, view: a frightening turmoil of planets, stars, galaxies and various other bits of matter and energy all swirling around through space and time. Frankly, it’s a miracle our little planet has survived thus far in this maelstrom without being nixed at some point for a hyperspace bypass (as predicted by Douglas Adams in his quintessential “Hitchhiker’s Guide to the Galaxy”). This week, we got the deepest glimpse yet into this cosmos, thanks to NASA’s James Webb Space Telescope (JWST), which delivered its first batch of infrared images of the universe (or a very small bit of the universe). JWST’s lens, orbiting the sun about 1 million miles from Earth, is so powerful that it actually saw back in time. The initial images it took, of galaxy cluster SMACS 0723, showed the cluster as it was 4.6 billion years ago, thanks to the limits of how fast light can travel (186,000 miles/second). In May, our European correspondent Jennifer Riggins filed a fascinating report about the engineering behind JWST. “It’s a great platform for demonstrating site reliability engineering concepts because this is reliability to the extreme,” IBM SRE architect Robert Barron said of JWST at the WTF is SRE conference. “I think there are a lot of lessons and a lot of inspiration that we can take from this work into our day-to-day lives.” For instance, he explained, the design team initially focused on the functional requirements: They needed a bigger mirror than the Hubble Space Telescope, but NASA didn’t have the capability to send a mirror that large into space.
This led them to define some non-functional requirements, such as creating the mirror from smaller hexagons which then could be unfolded in place. The work was based on NASA’s values for sending craft into space, namely that components need to be redundant, reliable and repairable. “There’s no doubt that the James Webb Space Telescope SRE strategy has more stakes than any enacted on Earth. It still makes for a fantastic example of how site reliability engineering and observability needs vary within the context of circumstances,” Riggins wrote.
I’m sorry I missed the political and civil servant history of James Webb who played a pervasive role in homophobic discrimination, helping set historical policy to remove/ban LGBTQ people from federal gov. This naming harms the amazing #JWST contributors https://t.co/jA7pRjaSgM
— Jennifer Riggins💙💛 (@jkriggins) July 15, 2022