
SoundCloud’s Prometheus Microservice Monitor Celebrates a Year of Open Source Success

26 Jan 2016 10:43am, by

When SoundCloud migrated its systems to a microservices architecture, it found that its then-current set of monitoring tools, largely based on StatsD and Graphite, couldn’t adequately watch over the hundreds of services and thousands of processes pressed into production.

So SoundCloud built its own monitoring application, and Prometheus was born.

One year ago today, SoundCloud released the Prometheus codebase as open source. “This is when things started going slightly crazy in a good way. We saw a sharp rise in contributors, mailing list questions, GitHub issues, IRC visitors, requests for conference and meetup talks, and increasing buzz on the net in general,” Prometheus developer Julius Volz wrote in a blog post celebrating the anniversary.

Today, Prometheus has grown into a mature open source project, with a thriving support ecosystem. Volz shared some numbers about the software’s success:

  • 200+ contributors
  • 2300+ pull requests (60+ open)
  • 1100+ issues (300+ open)
  • 150+ people in our IRC channel (#prometheus on FreeNode)
  • 250+ people on the mailing list who have created 300+ threads
  • 20+ Prometheus-related talks and workshops
  • 100+ articles and blog posts

Prometheus GitHub stars

Since going open source, contributors have added service discovery mechanisms for Kubernetes, Marathon and AWS’s Elastic Compute Cloud (EC2) service. Google is now using Prometheus to instrument Kubernetes, and CoreOS is doing the same to monitor etcd. DigitalOcean, Docker, The Financial Times, and KPMG are all using the software as well.

Graphical dashboard Grafana now offers visual support for Prometheus. And at least one company, Robust Perception, provides commercial support and consulting services around the software.

Open-source monitoring was in a rather sad state when SoundCloud initially developed Prometheus, an effort that started three years ago. Traditional monitoring solutions could not keep up with the dynamic nature of microservices running on distributed systems.

Externally hosted commercial solutions were seen as a better choice, despite their rather exorbitant price tags. SoundCloud engineer Björn Rabenstein notes that many companies use external solutions because their options for self-hosting were very limited, rather than because an external solution was the right one for them.

Prometheus works as both a monitoring and alerting system, with a time-series database (TSDB) tailored to that use case. As time has gone on, the number of time-series databases available in open source and via private enterprise solutions has continued to rise. But when Prometheus was developed, options were limited.

PromDash Events Test Dashboard at SoundCloud


Time-oriented data stores can be difficult to work with depending on how companies use and access their stored information and respond to user queries. Prometheus sets itself apart from more traditional approaches such as Graphite by using a multi-dimensional data model.

This data model allows for greater flexibility than a traditional hierarchical data structure: each time series is identified by a metric name plus a set of key-value labels, and any label can be used to filter or aggregate. Like many newer databases, Prometheus offers its own flexible query language rather than limiting itself to strict SQL.
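To make the contrast concrete, here is a minimal Python sketch of the two naming styles. It is an illustration only, not Prometheus's storage engine or query language; the metric names, label names, and the `select` helper are all made up for this example.

```python
# Illustrative sketch only: NOT Prometheus's actual storage or query engine.
# All metric names, labels, and helpers below are hypothetical.
from fnmatch import fnmatch
from typing import Dict, List, Tuple

# Hierarchical (Graphite-style) naming: every dimension is baked into one
# dotted path, so a query has to know the position of each dimension.
hierarchical: Dict[str, float] = {
    "prod.api.host1.http_requests.get.200": 1027.0,
    "prod.api.host2.http_requests.get.500": 3.0,
    "staging.api.host3.http_requests.get.200": 12.0,
}
# Finding all 500s means wildcarding every other path segment by position.
errors_by_path = [p for p in hierarchical
                  if fnmatch(p, "*.*.*.http_requests.*.500")]

# Multi-dimensional naming: a metric name plus free-form key/value labels.
Series = Tuple[str, Dict[str, str], float]
series: List[Series] = [
    ("http_requests_total",
     {"env": "prod", "instance": "host1", "method": "GET", "status": "200"},
     1027.0),
    ("http_requests_total",
     {"env": "prod", "instance": "host2", "method": "GET", "status": "500"},
     3.0),
    ("http_requests_total",
     {"env": "staging", "instance": "host3", "method": "GET", "status": "200"},
     12.0),
]


def select(data: List[Series], name: str, **matchers: str) -> List[Series]:
    """Return every series with this name whose labels equal all matchers."""
    return [s for s in data
            if s[0] == name
            and all(s[1].get(k) == v for k, v in matchers.items())]


# Any label can be used to slice or aggregate, in any combination.
prod_total = sum(value for _, _, value in
                 select(series, "http_requests_total", env="prod"))
errors = select(series, "http_requests_total", status="500")
```

With labels, adding a new dimension later (say, a data center) does not break existing queries, whereas inserting a new segment into a dotted path shifts every positional wildcard.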

“Time series collection happens via a pull model over HTTP. In contrast to common belief, this is easier to scale and matches the reality of a micro-service architecture more closely than a push model,” said Björn Rabenstein, engineer at SoundCloud.
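The pull model Rabenstein describes can be sketched in a few lines of standard-library Python: the monitored service exposes its current metric values on an HTTP endpoint, and the monitoring side fetches them on its own schedule. This is a simplified illustration under stated assumptions, not how the Prometheus server is implemented; the metric name, the `scrape` and `demo` helpers, and the naive line parser are hypothetical, while the `/metrics` path and the plain-text `name{label="value"} number` line format follow Prometheus conventions.

```python
# Minimal pull-model sketch. The service only exposes its current state;
# the scraper decides when to collect it. Names here are hypothetical.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict

REQUESTS_SERVED = {"count": 0}  # stand-in for real application state


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        # One sample per line: metric name, labels in braces, then the value.
        body = (f'demo_requests_served_total{{service="demo"}} '
                f'{REQUESTS_SERVED["count"]}\n').encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging in this demo


def scrape(host: str, port: int) -> Dict[str, float]:
    """Pull /metrics once and parse 'series value' lines (naive parser)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/metrics")
        text = conn.getresponse().read().decode()
    finally:
        conn.close()
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        if line and not line.startswith("#"):  # skip comments and blanks
            series, value = line.rsplit(" ", 1)
            samples[series] = float(value)
    return samples


def demo() -> Dict[str, float]:
    """Start the service on a free local port, scrape it once, shut down."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        REQUESTS_SERVED["count"] = 42  # pretend the service did some work
        return scrape("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns a dictionary with a single sample for the hypothetical `demo_requests_served_total` series. The service never needs to know where the monitoring system lives, which is part of what makes the approach a good fit for large numbers of short-lived processes.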

Prometheus Dashboard PromDash at work during DreamHack


Working with time series databases can be full of challenges and pitfalls. Understanding the data model available in Prometheus and the language to leverage it is crucial to using it as it was intended. Another crucial factor in getting the best out of Prometheus is taking the time to learn the ins and outs of metric values. Rabenstein offers a reference on Logs and Metrics by Prometheus developer Brian Brazil as a solid jumping-off point for those learning how to use the technology.

The Prometheus core components are written in Go and ship as static binaries with no external dependencies. “A single Prometheus server can ingest hundreds of thousands of samples per second, belonging to millions of time series, and saves them in a highly optimized way on local disk,” said Rabenstein.

Monitoring one’s software with Prometheus can be accomplished by using one of the readily available client libraries in one’s code. If that option is unavailable, developers can use a variety of third-party integrations to export data from wherever it is housed into Prometheus. Rabenstein notes that some are luckier than others, running software that is already instrumented for Prometheus, such as Google’s cAdvisor, CoreOS’s etcd, or parts of Kubernetes.
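What instrumenting code looks like can be approximated with a hand-rolled counter. The `Counter` class below is a hypothetical stand-in written for this article, not the API of any official Prometheus client library, and the `tracks_played_total` metric is invented; only the `# HELP` and `# TYPE` comment lines and the sample line layout follow the Prometheus text format.

```python
# Hypothetical stand-in for a client-library counter, NOT a real library API.
from typing import Dict, Tuple

LabelKey = Tuple[Tuple[str, str], ...]


class Counter:
    """A monotonically increasing value, tracked per label combination."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self._values: Dict[LabelKey, float] = {}

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        if amount < 0:
            raise ValueError("counters can only go up")
        key = tuple(sorted(labels.items()))  # order-independent label set
        self._values[key] = self._values.get(key, 0.0) + amount

    def render(self) -> str:
        """Render all label combinations as text-format sample lines."""
        lines = [f"# HELP {self.name} {self.help_text}",
                 f"# TYPE {self.name} counter"]
        for key, value in sorted(self._values.items()):
            label_text = ",".join(f'{k}="{v}"' for k, v in key)
            selector = f"{{{label_text}}}" if label_text else ""
            lines.append(f"{self.name}{selector} {value}")
        return "\n".join(lines) + "\n"


# Instrumentation lives next to the code it measures.
tracks_played = Counter("tracks_played_total", "Tracks played, by client.")


def play_track(client: str) -> None:
    tracks_played.inc(client=client)  # one line of instrumentation
    # ... application logic would go here ...


for client in ("web", "web", "ios"):
    play_track(client)

output = tracks_played.render()
```

Here `output` holds the rendered text, which a service would typically serve over HTTP for the monitoring server to collect. A real client library handles concurrency, more metric types, and the HTTP endpoint itself.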

Prometheus Bazooka Cluster Graph


Traditional monitoring is quickly coming to be seen as legacy technology, with TSDB technology paving the way for a new wave of data storage, analysis, and alerting.

“Both personally and technically, we are really excited about what has happened last year in Prometheus-land,” Volz wrote. “We love the opportunity to provide the world with a powerful new approach to monitoring, especially one that is much better suited towards modern cloud- and container-based infrastructures than traditional solutions.”

The New Stack Managing Editor Joab Jackson contributed to this story.

Feature image via Pixabay
