CNCF Prometheus Agent Could Be a ‘Game Changer’ for Edge

21 Dec 2021 11:57am

Prometheus’ creators have made significant changes to the scraping capabilities of the monitoring tool, one of the cornerstone projects supported by the Cloud Native Computing Foundation (CNCF). The changes introduce an agent mode that optimizes Prometheus for remote write use cases.

The new functionality may not sound like a lot on the surface, but it is particularly applicable to edge computing and to networks where energy and resource savings are critical, because these deployments are not necessarily structured for the traditional use of Prometheus for monitoring. In this way, the agent mode is a “game-changer” for certain deployments in the CNCF ecosystem, Bartlomiej Plotka, principal software engineer for Red Hat, and TAG/SIG observability tech lead for the CNCF, wrote in a blog post.

Agent Mode in Effect

The agent mode disables some of Prometheus’ usual features and optimizes the binary for scraping and writing to remote locations. Cutting down the number of features in this way enables new usage patterns, Plotka wrote.

The way the new agent brings near-stateless monitoring to the edge, by reliably writing data straight to a central time-series database for analysis, is indeed a “game-changer,” Torsten Volk, an analyst at Enterprise Management Associates (EMA), told The New Stack.

“Enabling remote write in a reliable manner and without the need for significant local storage to enable querying and prevent data gaps is the ‘secret sauce’ behind this approach. Now you can have lots of intermittently online edge devices with minimal local storage and centrally monitor them within their overall application context in near real-time,” Volk said. “This follows the essence of scale-out computing, where the health of the overall herd does not rely on each individual sheep.”

In practice, the agent mode disables querying, alerting and local storage, and replaces them with a customized time-series database (TSDB) write-ahead log (WAL). “Everything else stays the same: scraping logic, service discovery and related configuration,” Plotka wrote. “It can be used as a drop-in replacement for Prometheus if you want to just forward your data to a remote Prometheus server or any other Remote-Write-compliant project.”
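To make the "drop-in replacement" idea concrete, here is a minimal sketch in Python that renders a small Prometheus configuration with one scrape job and one remote-write endpoint, plus the command line used to start the binary in agent mode. At the time of the announcement the mode sat behind the `--enable-feature=agent` flag; later releases may expose it differently. The job name, scrape target and remote-write URL are hypothetical placeholders, and the helper functions are illustrative only, not part of any Prometheus tooling.

```python
from __future__ import annotations


def render_agent_config(job_name: str, targets: list[str], remote_write_url: str) -> str:
    """Return a minimal prometheus.yml: one scrape job, one remote-write endpoint.

    Scraping and service discovery are configured as usual; the agent simply
    forwards what it scrapes to the remote_write URL instead of storing it.
    """
    target_list = ", ".join(f'"{t}"' for t in targets)
    return (
        "scrape_configs:\n"
        f"  - job_name: {job_name}\n"
        "    static_configs:\n"
        f"      - targets: [{target_list}]\n"
        "remote_write:\n"
        f"  - url: {remote_write_url}\n"
    )


def agent_command(config_path: str) -> list[str]:
    """Command line that starts Prometheus in agent mode via the feature flag."""
    return ["prometheus", "--enable-feature=agent", f"--config.file={config_path}"]


if __name__ == "__main__":
    # Hypothetical edge node scraping a local exporter and forwarding centrally.
    print(render_agent_config(
        "edge-node",
        ["localhost:9100"],
        "https://metrics.example.com/api/v1/write",
    ))
    print(" ".join(agent_command("prometheus.yml")))
```

Because querying, alerting and local storage are disabled in this mode, the sketch deliberately leaves out any alerting or rule configuration.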

Image: Bartlomiej Plotka.

As mentioned above, the new agent mode functionality is especially applicable for edge clusters and networks. Sectors such as telecommunication, automotive and others that deploy and manage data from IoT devices for cloud native networks should appreciate the power and resource savings the agent mode offers.

“We see more and more much smaller clusters with a restricted amount of resources,” Plotka wrote. “This is forcing all data (including observability) to be transferred to remote, bigger counterparts as almost nothing can be stored on those remote nodes.”

Edge Computing Benefits

For edge computing, Prometheus’ traditional pull and push capabilities have posed constraints on resources, Volk said. With the use of the new agent mode, Volk said, the storage bottleneck seems mostly eliminated as agents are now able to write their “part of the picture” directly to a central time-series database in near real-time. “You can think of this as each of potentially many thousands of agents contributing its own part of the highly distributed application puzzle in near real-time and without having to worry about running out of local storage space or memory,” Volk said. “These, often scarce, resources can be used by other edge applications.”

According to Plotka, the key benefits of the customized agent TSDB WAL include:

  • Improved overall operational data efficiency since it removes the data immediately after successful writes. If it cannot reach the remote endpoint, it persists the data temporarily on the disk until the remote endpoint is back online. “This means that we don’t need to build chunks of data in memory — we don’t need to maintain a full index for querying purposes,” Plotka wrote. “Essentially the agent mode uses a fraction of the resources that a normal Prometheus server would use in a similar situation.”
  • Enablement of easier horizontal scalability for ingestion. Agent mode essentially moves the discovery, scraping and remote writing to a separate microservice, allowing for a focused operational model on ingestion only, Plotka described. Prometheus in agent mode is, in this way, “more or less stateless,” Plotka wrote.
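The "remove after successful write, keep while the endpoint is down" behavior described above can be illustrated with a toy model. This is a sketch under loud assumptions, not Prometheus' actual WAL: the buffer here lives in memory rather than on disk, and the class names `ToyAgentBuffer` and `FlakyEndpoint` are hypothetical, invented purely for illustration.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Sample = Tuple[str, float, float]  # (series name, timestamp, value)


class FlakyEndpoint:
    """Hypothetical remote endpoint that can be switched offline and online."""

    def __init__(self) -> None:
        self.online = True
        self.received: List[Sample] = []

    def write(self, batch: List[Sample]) -> bool:
        if not self.online:
            return False  # simulate an unreachable remote endpoint
        self.received.extend(batch)
        return True


class ToyAgentBuffer:
    """Toy stand-in for the agent WAL: holds samples only until delivered."""

    def __init__(self, send: Callable[[List[Sample]], bool]) -> None:
        self._send = send
        self._pending: Deque[Sample] = deque()

    def append(self, sample: Sample) -> None:
        self._pending.append(sample)

    def flush(self) -> int:
        """Try to ship all pending samples; drop them only after success."""
        if not self._pending:
            return 0
        batch = list(self._pending)
        if not self._send(batch):
            return 0  # endpoint unreachable: keep the data for the next attempt
        for _ in batch:
            self._pending.popleft()
        return len(batch)

    def pending(self) -> int:
        return len(self._pending)


if __name__ == "__main__":
    endpoint = FlakyEndpoint()
    buffer = ToyAgentBuffer(endpoint.write)

    endpoint.online = False
    buffer.append(("node_cpu", 1.0, 0.42))
    buffer.flush()
    print("pending while offline:", buffer.pending())

    endpoint.online = True
    buffer.flush()
    print("pending after reconnect:", buffer.pending())
```

Nothing in this model builds queryable chunks or an index, which is the intuition behind the resource savings Plotka describes: the agent only needs to remember what it has not yet delivered.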