
Real Talk: Why Is Datadog So Expensive?

Cloud native architectures are churning out more data, increasing the cost of observability. But there are better ways to manage these expenses.
Dec 7th, 2023 6:00am
Featured image by Josh Appel on Unsplash.

I’ve seen so many X (formerly known as Twitter), Reddit and Hacker News threads lately discussing the high costs of Datadog. It’s such a hot topic that engineers are posting blogs about their brute-force approaches to dropping metrics.

But how did we get here? Why are these costs so high? Why are companies paying more for their observability than their production infrastructure? There is a lot of finger-pointing and claims of lock-in and corporate greed, which are certainly partly to blame.

There is a bigger underlying issue: the fundamental architecture changes that come with adopting containerized infrastructure and microservices applications. If we don’t understand and address this issue, history will repeat itself.

Disclosure: I Work for a Datadog Competitor

OK, it’s true, I work for Chronosphere, a company that competes with Datadog. I promise this article will not pitch you on our product. Datadog is a strong competitor, and I’ve watched it build an amazing business for years.

My previous company was a close Datadog partner from 2015–2018, and we watched its meteoric growth, which we desperately wanted to emulate. At the same time, I watched Datadog customers get more and more disgruntled with skyrocketing and unpredictable costs, yet they felt they couldn’t leave.

This was part of what drove me to join Chronosphere in 2021, as I saw this trend coming to a head. Before I joined this space, I did some market sizing and analysis and determined that observability had the biggest attachment to infrastructure spend: For every $1 you spend on public cloud, you’re likely spending $0.25–$0.35 on observability. This struck me as a market ripe for disruption.

The Real Culprit: Data Growth

The root cause of the problem is simple: There is a lot more observability data (metrics, logs, traces and events) than these tools ever predicted. As a result, they were neither architected for this data volume nor priced for it. There are multiple reasons we ended up with so much data.

Business drivers:

  1. Digital transformation: The infusion of technology into more business sectors naturally comes with more data to oversee system health and ensure smooth overall system operations.
  2. Higher customer expectations with greater stakes: According to the 2023 Online Reliability Report, on average, Americans tolerate fewer than four instances of unreliability or outage on an app or website before switching to a competitor. Operating high-performing and highly available services that deliver an exceptional customer experience requires more granular observability data.
  3. Data hoarding: It can be tough to know what data is useful when you’re getting so much of it on a minute-by-minute basis. Without the right tools to parse it, you can get into this trap of “I never know when I’m going to need this data” and hang onto much more data than necessary.

Technical drivers:

  1. More telemetry data generated by containers and microservices: Cloud native environments (i.e., containers and microservices) have significant advantages but naturally produce more data because you need to monitor each individual component’s and service’s health. For example, each container and microservice now emits as much observability data as each virtual machine (VM) and monolithic app used to. But now, instead of dozens of VMs and a handful of apps, you have thousands of containers and dozens of microservices (see the rough arithmetic sketch after this list).
  2. Scale of some cloud native environments: By design, cloud native is decentralized — and engineering teams can quickly spin up components — which means an exponentially growing number of services and containers generating data.
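
To make that scale difference concrete, here is a rough back-of-the-envelope sketch in Python. The unit counts and per-unit metric counts are illustrative assumptions, not figures from any real environment, but the multiplication is the point: the same per-unit telemetry across far more units means far more active time series.

```python
# Illustrative only: every number below is an assumption, not a measurement.
metrics_per_unit = 200  # assumed metrics emitted per VM or per container

# Legacy setup: a monolith or two running on a few dozen VMs
vm_count = 50
legacy_series = vm_count * metrics_per_unit

# Cloud native setup: the same workload split across thousands of containers
container_count = 2_000
cloud_native_series = container_count * metrics_per_unit

print(f"Legacy active series:       {legacy_series:,}")        # 10,000
print(f"Cloud native active series: {cloud_native_series:,}")  # 400,000
print(f"Growth factor:              {cloud_native_series / legacy_series:.0f}x")  # 40x
```

And that is before per-pod or per-endpoint labels multiply each metric into many distinct series.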

This data growth causes observability spending to skyrocket. Because vendors kept pricing models built for legacy monitoring volumes instead of adapting them to this data growth, cloud native architectures suddenly became shockingly expensive to observe.

Why Can’t Datadog Just Lower Its Prices?

I suspect there are two reasons for this:

  1. Shareholder value: Datadog’s stock has performed phenomenally over the last several years. If it lowered prices, revenue would take an immediate hit, which would drag down reported earnings and, with them, the stock price.
  2. Cost of goods sold: Datadog has gone through three architecture generations, with its latest, Husky, just released in 2022. This re-architecture was primarily focused on efficiency, yet didn’t reduce prices, so I assume it contributed to reducing the cost of goods sold (COGS) and getting margins to a healthy place. Since Datadog probably won’t invest in another re-architecture very soon, it won’t be compromising its margins by lowering prices.

Alternatives to Datadog

There are a couple of options if you don’t want to pay for Datadog.

#1: DIY Open Source

One attractive alternative is running your own observability in-house with open source tools. The good news is that, at least for metrics and traces, open source tools have come a long way and are coalescing into industry-accepted standards. Prometheus and OpenTelemetry with a variety of time series database backends (Mimir, Thanos or M3) are viable alternatives to Datadog.
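
To give a sense of what the DIY path looks like at the instrumentation layer, here is a minimal sketch using the official Prometheus Python client. The metric names, labels and port are arbitrary assumptions; in a real deployment, a Prometheus server or an OpenTelemetry Collector would scrape this endpoint and remote-write the data to a backend such as Mimir, Thanos or M3.

```python
# Minimal sketch: expose application metrics on an HTTP endpoint for Prometheus to scrape.
# Metric names, labels and the port number are illustrative assumptions.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter(
    "app_requests_total",
    "Total requests handled, by outcome",
    ["outcome"],
)
LATENCY = Histogram(
    "app_request_duration_seconds",
    "Request latency in seconds",
)

def handle_request():
    """Stand-in for real request handling."""
    with LATENCY.time():                       # records the duration into the histogram
        time.sleep(random.uniform(0.01, 0.1))  # simulated work
    outcome = "ok" if random.random() > 0.05 else "error"
    REQUESTS.labels(outcome=outcome).inc()

if __name__ == "__main__":
    start_http_server(8000)  # serves /metrics on port 8000
    while True:
        handle_request()
```

Note that every distinct label value (each "outcome" here) becomes its own time series, which is exactly the cardinality dynamic driving the data growth described above.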

But it’s important to note that this typically won’t save you money in real dollars; it simply trades a vendor bill for in-house engineering and infrastructure costs. The human and infrastructure cost of running these systems is non-trivial, and if you try to cut corners, you might regret it.

I was talking to a friend recently who moved his company off an expensive commercial SaaS offering to in-house open source tools. He admitted that the company isn’t actually saving any money when it accounts for the fact that around 8% of his developer headcount is now dedicated to running this system.
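
For a rough sense of what that 8% translates to, here is a small cost sketch; the headcount and fully loaded salary figures are purely illustrative assumptions, not numbers from that conversation.

```python
# Rough illustration of the hidden people cost of a DIY observability stack.
# Every input below is an assumption for illustration purposes only.
engineers_total = 100
fully_loaded_cost_per_engineer = 200_000  # assumed annual fully loaded cost in USD

diy_headcount_share = 0.08  # roughly 8% of developers running the stack
diy_people_cost = engineers_total * diy_headcount_share * fully_loaded_cost_per_engineer

print(f"Annual people cost of the DIY stack: ${diy_people_cost:,.0f}")  # $1,600,000
# ...and that excludes the compute and storage the stack itself consumes.
```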

#2: Next-Generation Observability Tooling

This is not the part where I pitch you my company’s product. This is where I’ll point out that a new generation of tools is being built with data growth as a baseline assumption, which keeps the cost of the solution in the customer’s hands so you don’t get surprise overages.

Just as Datadog, New Relic and similar tools displaced the previous generation of SolarWinds, BMC and CA Technologies, this new generation of observability tooling is starting to make waves. Talk with these vendors and understand how they handle the problem of too much observability data at the source rather than bandaging over it with better unit economics.

Conclusion

Datadog’s high bills and vendor lock-in have somehow become a necessary evil; you know you need observability, but you’re not sure of all the options. Datadog has been around long enough that it seems like a viable option, despite its billing practices and proprietary code. But it doesn’t have to be this way.

As more observability companies enter the space, more options emerge that are built from the beginning to address high-cardinality data growth: options that give you more flexibility with your infrastructure, greater control over your data and more visibility into your monthly bill, and that ultimately set observability teams up for a more sustainable, cost-effective operations model.
