Grafana has become a popular cloud native observability dashboard for logs, metrics and traces. Many users first learn to run Grafana as a side project or as an at-home visualization tool (something this writer recommends) to become acquainted with both Grafana and Prometheus for time series monitoring, as well as with other tools such as Telegraf and InfluxDB.
However, adopting and implementing Grafana across an entire organization can be more difficult, especially for resource-strapped organizations. To widen the reach of its visualization tool as a way to query and access information from a wide variety of data sources, Grafana Labs raised $50 million in venture capital funding, according to the company.
To that end, Grafana Labs recently released Grafana 7.2, which builds on Grafana 7.0 with more options for modifying data in visualizations, including time ranges and graphs. The company has also released a commercial edition, Grafana Metrics Enterprise (GME).
Grafana Labs said it also seeks to remove many of the hurdles to adopting Prometheus, promising better security, reliability and other improvements for Prometheus users.
“Prometheus … has exploded in popularity, but there are well-documented challenges to adopting the technology at the enterprise level,” said Tom Wilkie, vice president of product at Grafana Labs, who is also a Prometheus maintainer and Cortex co-creator.
One problem GME addresses is Prometheus’s single-process model, which “requires you to functionally shard your deployment to handle growth; this adds management overhead and complexity,” Wilkie said. “GME automatically shards data to be truly horizontally scalable.”
Another issue GME addresses is that Prometheus’s high availability model relies on pairs of Prometheus servers scraping the same targets, Wilkie said.
“When a server fails, or needs to be restarted to apply updates, there are gaps in your graphs,” Wilkie said. “GME uses Dynamo-style replication to achieve true high availability, with no gaps in your graph when nodes fail.”
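To make the sharding and Dynamo-style replication Wilkie describes more concrete, here is a minimal Python sketch of the general technique: each series is hashed onto a token ring (which spreads series across nodes), written to the next three distinct nodes clockwise, and the write counts as successful once a majority (a quorum) of those replicas acknowledges it. The `HashRing` class, the `write` helper, the node names and the replication factor of 3 are all illustrative assumptions; this is not GME’s or Cortex’s actual code or configuration.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on a 32-bit hash ring."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Hypothetical Dynamo-style ring: shards series across nodes and replicates each one."""

    def __init__(self, nodes, tokens_per_node=16, replication_factor=3):
        self.replication_factor = replication_factor
        # Each node claims several tokens so series spread evenly across nodes.
        self.ring = sorted(
            (token(f"{node}-{i}"), node)
            for node in nodes
            for i in range(tokens_per_node)
        )
        self.tokens = [t for t, _ in self.ring]
        self.node_count = len(set(nodes))

    def replicas(self, series: str):
        """Walk clockwise from the series' token, collecting distinct nodes."""
        wanted = min(self.replication_factor, self.node_count)
        start = bisect.bisect_right(self.tokens, token(series))
        owners = []
        for offset in range(len(self.ring)):
            _, node = self.ring[(start + offset) % len(self.ring)]
            if node not in owners:
                owners.append(node)
            if len(owners) == wanted:
                break
        return owners


def write(ring, series, sample, healthy, stores):
    """Write to every healthy replica; report success only if a quorum acknowledged."""
    owners = ring.replicas(series)
    acks = 0
    for node in owners:
        if node in healthy:
            stores.setdefault(node, {}).setdefault(series, []).append(sample)
            acks += 1
    quorum = len(owners) // 2 + 1  # majority of replicas, e.g. 2 of 3
    return acks >= quorum
```

With five nodes and one of a series’ three replicas down, `write` still returns `True`, because two of three replicas acknowledged; take a second replica down and it returns `False`. That is the intuition behind “no gaps in your graph when nodes fail”: a single failed or restarting node still leaves other replicas holding every sample.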
“Prometheus also has no access control or multitenancy features built into it, so these have to be layered on using custom software or achieved by giving different teams different Prometheus instances,” Wilkie said. “GME has both access control and multitenancy, allowing multiple teams to securely share the same cluster with full isolation.”
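Multitenancy of this kind generally means every read and write is scoped to a tenant ID, so one team’s queries can never return another team’s series. Cortex, for example, takes the tenant ID from the `X-Scope-OrgID` HTTP header. The Python sketch below is a hypothetical in-memory illustration of that isolation only; `TenantStore` and its methods are invented for this example and are not a GME API, and the sketch does not cover access control (authenticating who may act as which tenant).

```python
from collections import defaultdict

TENANT_HEADER = "X-Scope-OrgID"  # header Cortex reads the tenant ID from


class TenantStore:
    """Hypothetical shared store where every series is keyed by (tenant, series)."""

    def __init__(self):
        self._data = defaultdict(list)

    @staticmethod
    def _tenant(headers):
        # Reject requests that do not identify a tenant.
        tenant = headers.get(TENANT_HEADER)
        if not tenant:
            raise PermissionError("missing tenant ID")
        return tenant

    def push(self, headers, series, sample):
        self._data[(self._tenant(headers), series)].append(sample)

    def query(self, headers, series):
        # Only the caller's own tenant partition is visible.
        return list(self._data.get((self._tenant(headers), series), []))
```

Two teams can push a series with the same name into one `TenantStore`, yet each sees only its own samples, which is the “share the same cluster with full isolation” property in miniature.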
Finally, GME is built on the Cloud Native Computing Foundation’s (CNCF) Cortex project, “which was notorious for being hard to get started with,” Wilkie said. “GME adds an easy-to-use UI embedded in Grafana for managing your GME cluster, and comes as a single binary, single process deployment with no dependencies.”
The development work behind GME was also intended to make monitoring at scale less expensive to run, more secure and, ultimately, more robust. By centralizing the management of their monitoring stack, organizations can let different teams securely share a single, scalable GME cluster, Wilkie said.
Besides leaving “fewer things to manage,” GME offers a push-based agent architecture that brings the benefits of Prometheus (its data model, PromQL, performance and integrations) to organizations where Prometheus’s pull-based model is hard to implement because of network topology, security rules or compliance controls, Wilkie said.
GME also simplifies and automates the scaling and long-term storage of Prometheus metrics. “GME is built on the CNCF’s Cortex project and uses the same proven technology that powers Grafana Cloud’s Prometheus service, so you can trust it to scale to the needs of even the largest organizations,” Wilkie said.
The Cloud Native Computing Foundation (CNCF) and InfluxData are sponsors of The New Stack.
Feature image by Couleur from Pixabay.