CEO Raj Dutt Interview: The Grafana Experience Will Change

First, there was Graphite. For those who liked the metrics tool but wanted a better user interface, Grafana built its data-visualization panel to fill that gap. Then Prometheus came along, and Grafana added support for it in 2015. The rest is history: Grafana is approaching the point of becoming the de facto panel for data visualization, with nearly 10 million users. Grafana Labs also remains the main contributor to the open source Prometheus project, with more than 44% of Prometheus’ maintainers hailing from the company.
“It was a relatively obvious decision for us to collaborate with Prometheus, even when Prometheus was a very small metric dataspace,” Raj Dutt, CEO and co-founder of Grafana Labs, told The New Stack earlier this month during ObservabilityCON 2022 held in New York. “Prometheus served as the springboard for Grafana, along with Graphite and InfluxData, setting the stage for what we call our ‘big tent philosophy’: We want to prioritize interoperability for whatever the next great logs or metrics database is, whether it’s ours or not.”
Grafana today, of course, offers more than a good UI for Prometheus. Its observability reach now extends to Mimir for metrics, Loki for logs and Tempo for traces. Grafana also this month introduced two new open source projects: Grafana Phlare for continuous profiling and Grafana Faro for frontend application observability. This follows the addition of k6 to its portfolio last year.
But what does this all mean for the developer community? We sat down with Dutt to discuss why and how Grafana Labs wants to reach developers more directly. This will include expanding its data source reach further left in the development cycle, to make testing data more accessible for observability and other features users have requested. Dutt also discussed what Grafana’s plans are now for observability and what makes Grafana different as its user base balloons past the 10-million user mark.
Mimir is one of Grafana’s more recent open source projects, positioned as what you call Cortex’s ‘spiritual successor,’ with the vast majority of Cortex’s enterprise capabilities now available in Mimir. Still, Prometheus remains a leading tool for metrics. Could you please put this into context?
Cortex is a scale-out version of Prometheus. But Mimir is positioned as a metrics database, and Prometheus compatibility is a feature. It also features compatibility with Datadog, Graphite and OpenTelemetry. Mimir is a drop-in replacement for Prometheus, but its scope of compatibility is very wide.
Grafana dashboards are immensely popular in the DevOps community because they just work well. But developers and operations folks really don’t seem to care that much about the corporate aspects of popular tools or platform providers; they do care about what a particular vendor will do to improve their life and work. To that end, what is Grafana Labs trying to do at this time?
We’re trying to build an observability platform that is both composable and complete. And you’re right: Grafana is just the front end, so over the last three years, we have released open source projects for metrics (Mimir), logs (Loki) and traces (Tempo). This is really helping us to come up with a complete and curated implementation of an open source stack that we think is a best-of-breed toolset.
People don’t have to use the whole thing. They can use Grafana with Mimir or Loki, or not. You have the choice; it’s composable, so you can build it however you want. We release new open source projects when our customers push us in that direction. They want a stack that is an opinionated, fully supported and integrated end-to-end experience with our stamp on it.
When people piece their own stack together, that gives them a lot of flexibility. But it can be difficult to do. It takes a lot of engineers and expertise. And so that’s fine for a segment of our customers. They want DIY. But we release open source projects because they help us complete our vision of offering an entire observability platform, [as opposed to just a dashboard to access metrics, logs and traces].
I know you don’t want to talk about company stuff, but at this point, less than 50% of our revenue is from the Grafana frontend.
Grafana’s paying customers make up only a very small percentage of Grafana users. How does Grafana generate revenue?
That’s a great question. We want to grow the pie of users. We now have almost a million companies using Grafana. We want that to be 10 million. We then monetize by capturing a small piece of this really large pie. So, we have 2,000 paying customers. If you’re a large enterprise and have sophisticated compliance and security requirements (among the world’s 10 largest banks, half are customers of ours), we support those compliance, security and other requirements, and we give you capabilities that are not available in our open source version.
Your cloud offerings are relatively new. How has the reception been?
We also have cloud offerings in addition to our enterprise products. The cloud offerings are proliferating, and they obviously have big potential, already representing more than 50% of revenues.
When a large bank or another large customer opts to integrate Grafana’s enterprise version into its IT infrastructure, how does that work? How much help does Grafana offer with implementation and ongoing support, for example?
The idea is to build software that is easy to use. A very small percentage of our revenue comes from professional services. The overwhelming majority of our revenue comes from paid cloud consumption or from the number of user seats you add to the enterprise version. We sell cloud services and software licenses. We do very little consulting, which represents a single-digit percentage of our revenue.
We will provide a safety net. We will pick up the phone and answer emails or Slack messages for questions our customers have. If they are in need, we will respond. If there is a security issue, we obviously want to know about it and will fix it. Support is a reason people pay us, but it is only the second or third reason.
I would intuitively guess that Grafana open source projects and Grafana enterprise software, all accessed through a Grafana dashboard, would not be very popular if you had to pay a lot of money to implement and use them. As a user, the simplicity of a Grafana dashboard and the access to different data sources and observability options are the reasons I use it. I’m also assuming that users appreciate the nice-to-look-at graphics of the Grafana panel.
I don’t think relying on revenues from providing support is a good way to build a business. We want to lower the developer friction. We want to make it easy for developers to start playing around with it and to trust the technology, and then we’ll have a conversation about what else you can do. Sometimes I tell my engineers, “Your job is to put support out of business.” Ideally, the software is supposed to be so easy to use that you would never need support.
We want to reduce the learning curve as the overall complexity of software increases.
Our customers are not in the business of running an observability stack. They are in the business of serving their own customers of whatever business they are in. They typically have the engineering talent and a really smart team that knows all they need to know to develop and deploy a great software stack and to use their choice of observability tools to help with that. Those are exactly the kinds of customers we have.
How are you trying to solve perceived needs and how does open source factor in?
People wouldn’t adopt our software, and the community wouldn’t be growing, unless we were solving real problems.
And open source code speaks for itself. You don’t have the ability like you do in the commercial world to market your way through to initial adoption. It’s tougher, it’s real. It’s self-evident to everyone. I think software developers in general are pretty immune to marketing. They will say, “I don’t want to be told why the software is good.” They want to try it out, and if it’s good, they will be the ones to tell you that.
Speaking to the community, what’s next? You want to expand Grafana’s user base, obviously, but what are you really trying to offer the developer community?
We have a decent story with what we call LGTM: Loki, Grafana, Tempo and Mimir. We have spent the last four years developing that initial core stack. So what’s next for us is we’re going earlier in the software development cycle, with more testing, including chaos testing, and reliability engineering with tools like k6. We are going earlier in the software lifecycle with pre-prod testing while continuing to invest in observability.
We are also going later in the software development lifecycle, to support observability alerts. This involves support for who receives alerts and who’s going to be woken up on call. For example, once that person is working on an issue, how do you resolve the issue and how do you communicate that to your customers? How do you internally manage the incident, determine the reason for an outage and communicate that to your customers? What’s the impact on SLAs and SLOs? Everything that happens once an incident occurs is another area we are going deeper into, expanding on tools we already have for this, including Grafana Alerting, Grafana OnCall and Grafana Incident.
We’re getting stronger in our core observability offerings: Grafana panels, logs, metrics and tracing, and we have just launched continuous profiling (Grafana Phlare). So we’re going both earlier and later in the application lifecycle.
How would you break down the user base for Grafana between, say, developers and operations folks?
That’s a good question. There are about 10 million people using our software today. If you look at what they’re using it for, observability is the core use case, at about 75% of what Grafana is used for. The other 25% is for use cases such as business intelligence, sensor data, the internet of things, process control or power-plant data. SpaceX uses Grafana to manage fuel flow. So, 75% of the community is using Grafana for observability and the rest use it for other use cases.
Tell us about the fun stuff.
Yes, the fun stuff. So as a company we are focused on observability use cases but we are also interested in seeing what’s going on with the fun stuff.
Secondly, who is using Grafana for observability among that 75% base? It’s a combination of ops people, developers and SREs. If there is one persona that stands out, it is more and more the SRE persona, arguably a hybrid between developers and operators anyway. When you look at the user base, it started with tech-forward startups or companies willing to take a risk, smaller companies and hobbyist users: that’s how the community started. But if you look at it today, during the last five years, the largest companies in the world have started to put our software into production. There is a lot of trust now in the brand, quality and capabilities. When Grafana 1.0 came out, there was no way half of the largest banks in the world would have put it into production.
It’s about developer friction. I’m sure you’ve seen during the past 10 years how selling software is no longer about taking the CIO out to the golf course. That was a fundamental observation that we had when we started the company, and it goes back to developer friction. We are optimizing for impressing the developer and the practitioner because that person has much more influence today in their organization on what to buy than they did 10 years ago. Ten years ago they were told, “You are going to use this. We just signed a 10-year contract with IBM today.” But now if that happens, the developers will just quit.
There are 10 million eyes on the Grafana dashboard. My take is that for observability, such as monitoring server performance or a SaaS environment, I will have the responsibility to bring in whatever combination of platforms I want to use, along with OpenTelemetry, that may or may not include Loki, Grafana, Tempo and Mimir. Would you agree with that assessment?
Every observability vendor will say “yeah, we support interoperability but you have to send all of your data to us.” They will all say we’re going to solve your problems but your data has to be sent to us and stored inside of our platform. Their billing model and the way they make money is only by the data you send to them. So, if you don’t send the data to them they won’t act on it, whereas we have a user model where we sit on top and you pay us for the number of users you have and you can do whatever you want with your data. This is a really important angle for what makes us different.
In the BI space you hear about SQL databases, Snowflake, Databricks or offerings like that. So, if you were selling a SaaS solution and walked into a company and said, “I’ll help you visualize your data but you have to store all of your data with us, the BI vendor,” you would get laughed out of the room. But in the observability space that doesn’t happen, because any observability vendor will come in and say, “We can solve your problems but you just have to keep the data with us.” With Grafana you use any combination of data sources you would like. And the data, of course, remains where you would like it to be. And of course, in the sense that we are competing, Loki, Mimir and Tempo actually have to compete with the other data sources.
Our story of composability is big-tent: you can make your own choices for what data sources you want to use. That is a big differentiating factor for us.