Axiom: All the Observability Data without Cost Worries
After Microsoft acquired Xamarin, the team that built Xamarin Insights, the mobile app analytics and crash reporting service for .NET developers, pondered what they wanted to build next.
Neil Jagdish Patel, Seif Lotfy and Gord Allott, who had also worked together on Ubuntu at Canonical, decided to build what they wished they’d had at the time, which became Axiom. It’s a serverless platform that can store and query unlimited amounts of data to improve observability into system problems.
“We were dealing with billions upon billions of events a day [at Xamarin Insights]. We had a very small team that was trying to do the frontend and build a product as well as keep these databases up and running,” explained Patel.
“And we kept hitting this problem with time-series databases, where essentially we had done expensive trials with the more established companies, we had done open source work … we were patching things like Prometheus, we were doing all sorts of different things. And no matter what we did, our service would not stay up, the database wouldn’t stay up. And we’d spend a ton of time basically worrying about scale, worrying about costs, worrying about, just really, having queries respond in time, etc.”
They decided there were three core problems:
- How to quickly and efficiently ingest data from myriad different sources and make it available for querying.
- How to store it cheaply so users don’t have to worry about which data to store and how long to keep it.
- And how to query the data effectively.
“Whether the data was 10 seconds old or 10 years old, we wanted to have the ability to go and just launch the queries ad hoc and go get the information we wanted from that data,” Patel said.
Banishing Data Budgets
He said they bit off more than they could chew initially, but they did build a database that is highly efficient at ingest and ad hoc queries and that uses cheap storage. In bringing it to market, they’ve tried to focus on where users feel the most pain, and that has been observability.
“We interviewed a whole bunch of developers and engineering managers, and we asked them, ‘What are the issues you have when you’re using the tools you have right now?’ And they all came back with things such as data budgets, or budgets around how many events can be produced, [how many] unique events can be logged, depending on the license that the company has because there was a cost attached, how long the data could be kept, how much of it needed to be sampled,” he said.
Axiom’s goal has been to eliminate those worries about selecting the right data to store because of cost.
“We want people to feel that they can produce this data from their systems without worrying about budgets, without worrying about controls and things like that, and have the data they need to go and solve the problem that’s at hand,” Patel said.
Traditionally, he said, there are three constraints:
- Memory, because you have to scale vertically or horizontally to get around that limitation for queries.
- Storage: How much do you have? How much are you paying? How fast is it?
- Compute, for actually being able to ingest data and process queries quickly.
Axiom split up those three things. Ingest is handled with coordination-free containers that are “super cheap to run,” he said. Axiom doesn’t sample the data. All of it goes into S3 or another object store. There is no such thing as cold, archived or warehoused data. All the data is always available. And all the querying is done through serverless functions.
“It’s essentially like a hybrid serverless time-series database because we mix and match this kind of cloud native architecture to leverage what we need when we need it,” he said. “So it makes the actual query super quick, but also hyper-efficient because you’re only paying for the milliseconds it’s actually running.”
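The split Patel describes can be pictured with a toy model. The sketch below is an illustration only, not Axiom's implementation: an in-memory dictionary stands in for the object store, and every name in it (ObjectStore, ingest, query_function, run_query) is hypothetical. It shows ingest workers that each write their own object without coordinating or sampling, and a query that fans out short-lived function calls over whatever is stored.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """In-memory stand-in for S3 or another object store (hypothetical)."""
    blobs: dict = field(default_factory=dict)

    def put(self, key, events):
        self.blobs[key] = list(events)

    def get(self, key):
        return self.blobs[key]


def ingest(store, batch_id, events):
    # Each ingest worker writes its whole batch under its own key, so
    # workers never coordinate with one another. Nothing is sampled.
    key = f"batch-{batch_id}"
    store.put(key, events)
    return key


def query_function(store, key, predicate):
    # Stand-in for one short-lived serverless invocation that scans a
    # single object and exits; compute is used only while this runs.
    return [event for event in store.get(key) if predicate(event)]


def run_query(store, predicate):
    # Fan out one function call per stored object and merge the results.
    results = []
    for key in sorted(store.blobs):
        results.extend(query_function(store, key, predicate))
    return results


store = ObjectStore()
ingest(store, 1, [{"status": 200, "path": "/"}, {"status": 500, "path": "/pay"}])
ingest(store, 2, [{"status": 500, "path": "/login"}])
errors = run_query(store, lambda event: event["status"] == 500)
```

In this toy version every stored batch stays queryable in the same way, which mirrors the claim that there is no separate cold or archived tier.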
Axiom integrates with PagerDuty and Vercel. Data shippers such as Logstash, Elastic Beats, Honeycomb and Fluentd can be used to send logs, metrics and more directly into Axiom. The Elasticsearch Bulk API can also be used to ship logs from an array of tools and services.
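For readers unfamiliar with the Elasticsearch Bulk API, its request body is newline-delimited JSON: an action line followed by the document it applies to, with a trailing newline at the end. The sketch below only builds such a body and sends nothing. The function name build_bulk_body and the sample events are hypothetical, and the endpoint URL, authentication and dataset naming that Axiom expects are not shown here and should be taken from its documentation.

```python
import json


def build_bulk_body(index_name, events):
    """Build an Elasticsearch Bulk API body (NDJSON) that indexes each event."""
    if not events:
        return ""
    lines = []
    for event in events:
        # Action line: tells the receiver to index the next line's document.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the log event itself.
        lines.append(json.dumps(event))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "app-logs",  # hypothetical index or dataset name
    [
        {"level": "error", "message": "payment failed"},
        {"level": "info", "message": "user logged in"},
    ],
)
```

With Elasticsearch itself, a body like this is posted to the _bulk endpoint with an NDJSON content type. Any tool that already emits this format can, in principle, be pointed at a compatible receiver.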
This month the company plans to unveil distributed traces inside Axiom with OpenTelemetry support.
It offers a point-and-click query builder that makes it easy to explore data and build visualizations of it, though it also offers a more powerful query language, Axiom Processing Language (APL), that can provide deeper insights. APL provides the flexibility to filter, manipulate and summarize the data according to your needs.
Axiom Playground lets you try out Axiom’s data analytics platform on live data. Teams can work together with its collaboration feature.
Axiom’s free tier comes with 0.5 TB of storage and the Basic plan offers 5 TB for $99 a month.
Your Data Store or Theirs
The longer-term vision is to make ingest a forgettable line item so you’re never penalized for how much data you’re producing, he said.
Companies using observability tools tend to use more than one, Patel said, but there’s a “black hole” created when developers decide not to send a certain bit of tracing or logging because of cost concerns.
“You can learn from Axiom what you’re missing in your observability landscape, because you can blindly send data to it, and not feel like you’re gonna wake up in the morning with thousands of dollars’ worth of costs,” Patel said.
“We really want to get people into the kind of unlimited way of thinking, no sampling, no tricks up your sleeve, you just [send] data, [and] it is made available for querying. And you can go and build what you need to build on top of that,” he said.
The company is totally remote. Patel and Allott live in the UK. Lotfy lives in Germany. “I joke that the headquarters is wherever I am at any point in time, basically,” Patel said, laughing.
Axiom was initially offered as self-hosted software, and that option is still available, though it’s also available on the cloud providers. Some users seeking greater control over their data, however, prefer a “cloud prem” model in which a SaaS vendor manages the code and the customer brings their own storage. Axiom can work with that too, Patel said.
Atlanta-based Speedscale, a traffic replay framework for API testing in Kubernetes, has been using Axiom since late last year.
Speedscale tracks two kinds of monitoring and telemetry data: data from its own cloud service and data collected from customer deployments.
“The telemetry from our cloud service belongs to us, and we are able to send it anywhere. We currently use Datadog. But the telemetry from the customer environments belongs to the customers, and we have chosen not to send it outside of our cloud,” said Ken Ahrens, Speedscale co-founder and CEO.
“We ingest the customer data into our own AWS account and load to Axiom that we host ourselves. This lets us build rich customer health dashboards while keeping all the data in-house.”
For about a year, Speedscale loaded this data into AWS OpenSearch, a managed service derived from Elasticsearch.
“While it functioned OK, it frequently had performance problems and was getting very expensive to run at scale,” Ahrens said. “At one point this was over half our cloud bill, so we knew we had to change. We had gotten introduced to the Axiom team, and they claimed to have better performance, and we could run it in our own network, which was a business requirement. We migrated over fairly quickly, and then shut off the Elasticsearch clusters and had a little party.”
Speedscale uses the data from Axiom to understand how customers are using its software in their clusters.
“One big use case is to run analytics across all our customers to understand where they are on their product adoption journey — login, install, sending data, running tests, etc,” Ahrens said. “The second use case is for troubleshooting problems. The data in Axiom helps us troubleshoot specific errors and problems when we are unable to replicate these conditions on our infrastructure. We are also on Slack with the Axiom team and share notes, best practices and experiences back and forth.”