PromCon 2022: Why Prometheus Had to Change
Prometheus remains an essential monitoring tool and, in particular, a key component of observability platforms for cloud native environments. One of the Cloud Native Computing Foundation’s (CNCF) “fastest-growing” projects, the time series database is especially useful for gathering metrics from Kubernetes clusters and is typically paired with Grafana dashboards for visualization.
Prometheus metrics, and observability data in general, have become increasingly essential for the many overextended and highly distributed DevOps teams operating in today’s pandemic context, especially those working in development environments.
Prometheus installations now number in the hundreds of thousands, with millions of users, Richard (RichiH) Hartmann, director of community at Grafana Labs and a chair of the CNCF Technical Advisory Group for Observability, said during his talk. “I don’t have to convince this room that Prometheus is a de facto standard in cloud native metric based monitoring.”
But as Prometheus’ maintainers celebrate its 10-year anniversary, the community’s needs for monitoring Kubernetes are evolving quickly. Users are also becoming smarter about what they want and need. PromCon EU 2022, the annual Prometheus users’ conference, held in Munich in November, served as a forum on how and why Prometheus must evolve and what its maintainers must do.
Kubernetes is certainly hard. Monitoring and observability metrics can certainly help tame Kubernetes management. But using the tools to do that, and interpreting the often massive amounts of data they produce, pose obvious challenges.
In Prometheus’ case, even as its popularity grew after former Google engineers created and released it at SoundCloud in 2012, many saw it as difficult to use (a problem Grafana has done a lot to ease, as we see below). Usability has thus been a major pain point that Prometheus users have sought to solve during its 10 years of existence, among other challenges the project has faced, Hartmann told The New Stack.
“Prometheus used to have a connotation of being hard to use,” Hartmann said. “Looking at it today, it’s a lot easier to derive value quickly.”
Much of Prometheus’ value comes from its logical and arithmetic operators, which, together with the PromQL query functions, serve large-scale data requirements. “The fundamental problem Prometheus has solved is that for the first time, it has allowed you to do math with your monitoring data in a truly flexible and scalable way,” Hartmann said. “The industry at large will never be the same because people saw what can be done.”
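To make “doing math with your monitoring data” concrete, here is a minimal Python sketch of the kind of calculation a PromQL expression such as a ratio of two rate() results performs: a per-second rate computed from counter samples, then a ratio of two such rates. The sample data and the function names simple_rate and error_ratio are hypothetical, and the rate calculation is deliberately simplified. It ignores the counter-reset handling and window extrapolation that Prometheus itself applies.

```python
from typing import List, Tuple

# A sample is (unix timestamp in seconds, counter value).
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second increase between the first and last sample.

    Simplified on purpose: no counter-reset handling and no extrapolation.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must be increasing")
    return (v1 - v0) / (t1 - t0)


def error_ratio(errors: List[Sample], requests: List[Sample]) -> float:
    """Fraction of requests that failed, as a ratio of two rates."""
    request_rate = simple_rate(requests)
    if request_rate == 0:
        return 0.0  # no traffic in the window, so report no errors
    return simple_rate(errors) / request_rate


# Hypothetical counters scraped every 60 seconds.
requests = [(0.0, 1000.0), (60.0, 1600.0), (120.0, 2200.0)]
errors = [(0.0, 10.0), (60.0, 16.0), (120.0, 34.0)]

print(f"request rate: {simple_rate(requests):.2f}/s")
print(f"error ratio:  {error_ratio(errors, requests):.2%}")
```

In PromQL the same idea is a single expression evaluated across every matching time series at once, which is where the flexibility Hartmann describes comes from.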
Prometheus is also known to accommodate scaling needs, which is obviously required for drawing inferences from metrics in Kubernetes environments. “We now support a scale which was unheard of other than with Hyperscalers,” Hartmann said. “No one anticipated us being able to handle as much data 10 years ago. It’s been leaps and bounds.”
The PromCon talks covered new capabilities that users can take advantage of, intended to make observability more accessible and powerful in the coming years. In other words, they offered more than a list of incremental features for what has become the de facto monitoring tool for Kubernetes.
Let’s start with some basics: #histograms is the distribution of all your observations in specified time ranges, called them buckets, as Ganesh Vernekar begins his @PromConIO talk this week. More cool stuff to come about @PrometheusIO. @_codesome @grafana
— BC Gain (@bcamerongain) November 11, 2022
Grafana senior software engineer Ganesh Vernekar’s talk on histograms was one of the more important of the conference. Histograms in Prometheus have worked reliably for years, but they have had a few downsides in storage efficiency, the accuracy of histogram queries and flexibility of use, Vernekar said. The good news he announced during his talk is that Prometheus v2.40.0 now supports native histograms, a fairly significant development.
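The accuracy downside Vernekar mentioned can be illustrated with a small Python sketch of a classic fixed-bucket histogram. It counts observations into cumulative buckets and then estimates a quantile by linear interpolation inside the matching bucket, which is roughly the approach of PromQL’s histogram_quantile function. This is a simplified illustration under stated assumptions, not Prometheus’ actual implementation: the function names, bucket boundaries and observations are all hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def cumulative_buckets(observations: Sequence[float],
                       upper_bounds: Sequence[float]) -> List[int]:
    """Count observations <= each upper bound, plus a final +Inf bucket."""
    bounds = sorted(upper_bounds)
    counts = [0] * (len(bounds) + 1)  # last slot is the +Inf bucket
    for x in observations:
        first = bisect_left(bounds, x)  # first bucket whose bound is >= x
        for i in range(first, len(counts)):
            counts[i] += 1  # cumulative: every larger bucket also counts x
    return counts


def estimate_quantile(q: float, upper_bounds: Sequence[float],
                      counts: Sequence[int]) -> float:
    """Estimate the q-quantile by interpolating inside one bucket."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    bounds = sorted(upper_bounds)
    total = counts[-1]
    if total == 0:
        raise ValueError("histogram is empty")
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    i = next(idx for idx, c in enumerate(counts) if c >= rank)
    if i == len(bounds):
        return bounds[-1]  # rank falls in +Inf: best guess is the last bound
    lower = bounds[i - 1] if i > 0 else 0.0  # assume values start at 0
    below = counts[i - 1] if i > 0 else 0
    in_bucket = counts[i] - below
    if in_bucket == 0:
        return lower
    # Assume observations are spread evenly across the bucket.
    return lower + (bounds[i] - lower) * (rank - below) / in_bucket


# Hypothetical request latencies in seconds and bucket boundaries.
latencies = [0.05, 0.12, 0.3, 0.3, 0.7, 1.5]
bounds = [0.1, 0.5, 1.0]
counts = cumulative_buckets(latencies, bounds)

print("cumulative counts:", counts)
print("estimated median: ", estimate_quantile(0.5, bounds, counts))
```

With these hypothetical numbers the true median is 0.3, but the estimate lands somewhere else inside the 0.1 to 0.5 bucket because the histogram no longer knows where in the bucket the observations fell. The result is only as good as the bucket layout chosen in advance, which is the kind of limitation the talk addressed.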
The Prometheus project introduced its Conformance Program last year to help ensure interoperability among data sources touted as being compatible with Prometheus. The program is also intended to protect users from surprises and to enable more parallel innovation, Hartmann said last year.
During the past year, the testing process and the other support the conformance program offers have matured, making it much easier for organizations to ensure they have done the proper due diligence. “At the core, it is really, really easy to say you’re compatible. It’s a lot harder to actually be compatible, in particular, when it comes down to the nitty-gritty details of how a specific function behaves,” Hartmann said. “It’s hard for people to figure this out on their own.”
Hartmann described, in a nuanced way, how untested claims of compatibility can cause “confusion,” sometimes “deliberately, sometimes, not deliberately,” which is “why we felt forced to go down this path.” As a countermeasure, the conformance project offers a set of tests that can be run against a Prometheus implementation in a cloud environment, he said.
Elephant in the Room
The elephant in the room during the conference was Grafana. Grafana, of course, has played, and continues to play, a key role in Prometheus’ development. Grafana added support for Prometheus in 2015 by building its data-visualization panel to accommodate Prometheus users. Today, it is hard to find a Prometheus user who does not use Grafana. Grafana Labs remains the main contributor to the open source Prometheus project, with more than 44% of Prometheus’ maintainers hailing from the company.
Grafana has also created a number of its own open source projects, including the recently released Grafana Phlare for continuous profiling and Grafana Faro for frontend application observability. These build on Grafana’s existing open source projects Mimir (metrics), Loki (logs) and Tempo (traces).
“Grafana and Prometheus have co-evolved. Grafana the software adapted more to Prometheus than vice versa,” Hartmann said. “But a substantial amount of work on Prometheus is being paid for by Grafana Labs. Almost symbiotic, though not everyone will like that word.”