Fluentd Offers Comprehensive Log Collection for Microservices and Cloud Monitoring
For those who need to collect logs from a wide range of data sources and backends (from access and system logs to app and database logs), the open source Fluentd software is becoming an increasingly popular choice.
This framework, created by Treasure Data, is a log collector with functions similar to those of Elastic’s Logstash, explained Stephen O’Grady of analyst firm RedMonk. “Fluentd is adept at collecting large volumes of semi- or unstructured data and directing them according to routing rules to other storage backends such as Elasticsearch or PostgreSQL. It’s well regarded by a variety of cloud providers, including Amazon and Google,” he said.
In fact, Google Cloud Platform’s BigQuery recommends Fluentd as a default real-time data ingestion tool (not least because it lets you log data from AWS into BigQuery). It’s also natively supported in Docker and is being used by Microsoft as the agent for analytics and log monitoring on Linux for the new Microsoft Operations Management Suite (OMS). Why is it proving so popular?
Input, Output, Routing
Like the Unix syslogd utility, Fluentd is a daemon that listens for and routes messages. You can use it as a collector or an aggregator, depending on your logging infrastructure. You can also filter the logs coming from a variety of sources and send them to a huge range of outputs via plugins (there are over 300 plugins so far). Treasure Data’s Kiyoto Tamura suggests it’s what you’d get “if syslogd evolved a little more and was a little more modern and easy to hack on. We did [Fluentd] because we couldn’t get syslogd to do what we wanted it to do.”
Microsoft chose Fluentd for OMS partly because it was what people in the Linux community were already using, but Microsoft’s Anurag Gupta praises the modularity of the input and plugin model, as well as the wide support.
MySQL monitoring is one of the most popular uses, but there are also plugins for Kafka, Twitter, Kubernetes and Twilio, as well as for Short Message Service (SMS) notifications and Simple Network Management Protocol (SNMP) data.
“There’s just this wide variety — and it’s pretty trivial to go out and create one of these things,” Gupta said. “As an IT guy I don’t have time to build a complex thing in native code but I don’t mind writing a couple of lines of scripting so I can use an existing plugin. The flexibility of Fluentd lends itself to a lot of scenarios.”
That was one of the goals of Fluentd, confirmed Tamura. “We started with the idea that the inputs and outputs should be configurable.” You do that by selecting source and output plugins, and giving them parameters. “We also strongly believed that routing should be included, for people coming from an ops background but that it should also be able to handle pretty complex logic, and that’s the idea behind tag-based routing.”
Every event that comes from a source has three parts: a tag, which the routing engine uses to direct the event; a timestamp; and a record, which is a JSON object. Match commands in the configuration file tell Fluentd which output plugin should receive events with specific tags. You can also use filter commands to set up pipelines that process events before they reach the output plugin.
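The event model and tag-based routing can be sketched in a few lines of Python. This is a minimal illustration, not Fluentd's own implementation, which is written in Ruby. `Event`, `tag_matches` and `Router` are hypothetical names. The wildcard rules follow Fluentd's documented match patterns: `*` matches exactly one tag part and `**` matches zero or more. The sketch makes two simplifying assumptions. All filters run before any match, and the first matching output wins. The two lists stand in for real output plugins such as Elasticsearch or PostgreSQL.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class Event:
    """A Fluentd-style event: routing tag, timestamp and JSON-like record."""
    tag: str
    time: float
    record: Dict[str, Any]


def tag_matches(pattern: str, tag: str) -> bool:
    """Match dot-separated tags: '*' is one part, '**' is zero or more parts.

    Fluentd's {a,b} alternation is not implemented in this sketch.
    """
    def match(p: List[str], t: List[str]) -> bool:
        if not p:
            return not t
        head, rest = p[0], p[1:]
        if head == "**":
            # Let '**' swallow 0, 1, 2, ... parts, then match the rest of the pattern.
            return any(match(rest, t[i:]) for i in range(len(t) + 1))
        if t and (head == "*" or head == t[0]):
            return match(rest, t[1:])
        return False

    return match(pattern.split("."), tag.split("."))


class Router:
    """Hypothetical router: filters run first, then the first matching output wins."""

    def __init__(self) -> None:
        self.filters: List[Tuple[str, Callable[[Event], Optional[Event]]]] = []
        self.matches: List[Tuple[str, Callable[[Event], None]]] = []

    def add_filter(self, pattern: str, fn: Callable[[Event], Optional[Event]]) -> None:
        self.filters.append((pattern, fn))

    def add_match(self, pattern: str, output: Callable[[Event], None]) -> None:
        self.matches.append((pattern, output))

    def emit(self, event: Event) -> bool:
        for pattern, fn in self.filters:
            if tag_matches(pattern, event.tag):
                result = fn(event)
                if result is None:  # the filter dropped the event
                    return False
                event = result
        for pattern, output in self.matches:
            if tag_matches(pattern, event.tag):
                output(event)
                return True
        return False  # no match: the event is discarded


# Database events go to one stand-in backend, all other app events to another.
to_postgres: List[Event] = []
to_elasticsearch: List[Event] = []
router = Router()
router.add_filter("app.**", lambda e: None if e.record.get("level") == "debug" else e)
router.add_match("app.db.*", to_postgres.append)
router.add_match("app.**", to_elasticsearch.append)

router.emit(Event("app.db.mysql", time.time(), {"msg": "slow query"}))
router.emit(Event("app.web", time.time(), {"msg": "GET /"}))
router.emit(Event("app.web", time.time(), {"msg": "noisy", "level": "debug"}))
```

Ordering matters here, as it does in a Fluentd config file. The narrower `app.db.*` rule is registered before the catch-all `app.**`, so database events never reach the second output.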
More than half of the Fluentd plugins are for output, Tamura said. “Inputs are HTTP, files, TCP, UDP, but output is a big differentiator against many other tools. The most popular output is Tableau, the next is Google spreadsheets, we’re working with a company that’s an SQL Server shop. Fluentd can serve as the connective tissue that connects all these multiple platforms,” Tamura said.
Matches and filters can be sophisticated, Gupta pointed out. “There’s a whole host of things you can do. You can convert the JSON to XML or to an encrypted stream that only the output can recognize. With OMS we enhance some of the data using Fluentd; we take the raw SQL logs and add the computer name and we can tokenize that into a specific field. We have audited data in multiple fields; we have hundreds of thousands of events taking multiline events, and with filters, we can tell you specifically what event is applicable.”
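Gupta's example of enriching raw SQL logs can be sketched as a single filter function. In Fluentd itself this job would normally be configured with filter plugins, such as the built-in `record_transformer`, rather than written as code. The Python below only illustrates the idea: it adds the computer name and tokenizes a raw line into separate fields. The raw line format, the `log` and `computer` field names, and `enrich` are all hypothetical.

```python
import re
import socket
from typing import Any, Dict, Optional

# Hypothetical raw log format: "<date> <time> <SEVERITY> <error code> <message text>"
RAW_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<clock>\d{2}:\d{2}:\d{2}) "
    r"(?P<severity>[A-Z]+) (?P<code>\d+) (?P<text>.*)$"
)


def enrich(record: Dict[str, Any], hostname: Optional[str] = None) -> Dict[str, Any]:
    """Return a copy of `record` with the computer name added and, when the raw
    'log' line matches RAW_LINE, its pieces split out into separate fields."""
    out = dict(record)
    out["computer"] = hostname or socket.gethostname()
    m = RAW_LINE.match(record.get("log", ""))
    if m:
        out.update(m.groupdict())
        out["code"] = int(out["code"])  # numeric field makes querying easier
    return out


example = enrich({"log": "2016-03-01 12:00:00 ERROR 1205 Lock wait timeout exceeded"}, "db01")
```

Lines that do not match the pattern pass through with only the computer name added. This mirrors the usual choice of keeping unparseable events rather than silently dropping them.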
But the overall model remains simple. “The modularity is huge,” Gupta told us. “It helps developers wrap their head around how to build with Fluentd; I need to build a source or an output to a certain endpoint or I need to filter the data. That trifecta of source, filter and output is great for us as we build out more monitoring and functionality and for any developer using Fluentd it gives them a lot of freedom. They’re getting all these sources and filters and transformations, coming in and branching out to external services; that could be OMS or could be another external log analytics services or it could be API endpoints.”
“All these API endpoints just require some data source; you can use Fluentd as the middleman,” Gupta continued. “It’s useful anywhere that you need to stream some data, perform some calculation on it, send it to an endpoint and have all that correlated in a central repo.”
Because it can centralize logs, Fluentd is particularly well suited to microservices and containers, where logging is a more complex problem than with a monolithic, n-tier service. In fact, this was one of the original inspirations, said Tamura.
“We built it essentially because increasingly, the stack is very modularized.” But that’s not the only way you can use it, Gupta confirmed.
“Fluentd is very applicable to a per-node architecture where I have a very specific server running my relational database and I need to stream logs from just that one machine to a central place without inflicting any pain on the workload. Or if I have 100,000 containers and I need a central spot to take all the logs from stdout and stderr, I can use the Docker Fluentd driver to bring that to a single node set or maybe a cluster and fire that off to a central location. You can have container-based logging across the whole container host.”
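The Docker side of this setup is a documented feature. Starting a container with `--log-driver=fluentd`, optionally adding `--log-opt fluentd-address=host:port`, makes Docker forward each line the container writes to stdout or stderr. Each line arrives as an event whose record carries `container_id`, `container_name`, `source` and `log`. By default the event is tagged with the first 12 characters of the container ID. The receiving end below is a hypothetical Python stand-in for a central Fluentd node, not Fluentd code. It shows how those records can be grouped per container and how stderr output can be singled out. The container ID is made up.

```python
from collections import defaultdict
from typing import Dict, List, Set


def docker_record(container_id: str, container_name: str, source: str, line: str) -> Dict[str, str]:
    """One record in the shape the Docker fluentd log driver emits, one per output line."""
    return {
        "container_id": container_id,
        "container_name": container_name,
        "source": source,  # "stdout" or "stderr"
        "log": line,
    }


class CentralNode:
    """Hypothetical aggregator gathering records from many containers."""

    def __init__(self) -> None:
        self.tags: Set[str] = set()
        self.lines_by_container: Dict[str, List[str]] = defaultdict(list)
        self.stderr_records: List[Dict[str, str]] = []

    def receive(self, tag: str, record: Dict[str, str]) -> None:
        self.tags.add(tag)
        self.lines_by_container[record["container_name"]].append(record["log"])
        if record["source"] == "stderr":
            self.stderr_records.append(record)


node = CentralNode()
cid = "a1b2c3d4e5f6" + "0" * 52  # made-up 64-character container ID
tag = cid[:12]                   # default tag: first 12 characters of the ID
node.receive(tag, docker_record(cid, "/web", "stdout", "GET / 200"))
node.receive(tag, docker_record(cid, "/web", "stderr", "upstream timeout"))
```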
Buffer and Queue
Making routing and processing efficient across large systems with high volumes of events was key, Tamura explained. “One of the things we really wanted to do well is be performant but also be reliable without relying on an external queue or buffering. That was the biggest difference early on between us and Logstash; that has a simpler queuing model but it relied on Redis for consistent queuing. We try to do it in our own internal buffer (and you can buffer in memory or in file).”
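Tamura's point about an internal buffer can be illustrated with a simplified model. Records are staged into a chunk, and full chunks move to a queue. A flush sends queued chunks in order and leaves any chunk that fails in place to be retried later. This is a hypothetical Python sketch, not Fluentd's code. In Fluentd the memory-or-file choice is made with a buffer section of `@type memory` or `@type file`. The real buffer also flushes on a timer and retries with exponential backoff; both are omitted here. `MemoryChunk`, `FileChunk`, `Buffer` and the `send_to_backend` destination are illustrative names.

```python
import json
import os
import tempfile
from collections import deque
from typing import Any, Callable, Deque, Dict, List


class MemoryChunk(list):  # in RAM: fast, but lost if the process dies
    def read(self) -> List[str]:
        return list(self)

    def size(self) -> int:
        return len(self)

    def purge(self) -> None:
        self.clear()


class FileChunk:  # on disk: survives a crash (replay on restart is not shown)
    def __init__(self, path: str) -> None:
        self.path = path
        self.count = 0

    def append(self, line: str) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(line + "\n")
        self.count += 1

    def read(self) -> List[str]:
        with open(self.path, encoding="utf-8") as f:
            return f.read().splitlines()

    def size(self) -> int:
        return self.count

    def purge(self) -> None:
        os.remove(self.path)


def file_chunks(directory: str) -> Callable[[], FileChunk]:
    def make() -> FileChunk:
        fd, path = tempfile.mkstemp(dir=directory, suffix=".chunk")
        os.close(fd)  # FileChunk reopens the file by path for each operation
        return FileChunk(path)
    return make


class Buffer:
    """Stage records in a chunk, queue full chunks, and flush the queue in order."""

    def __init__(self, new_chunk: Callable[[], Any], chunk_limit: int, send: Callable[[List[str]], None]) -> None:
        self.new_chunk = new_chunk
        self.chunk_limit = chunk_limit
        self.send = send  # expected to raise ConnectionError on failure
        self.staged = new_chunk()
        self.queue: Deque[Any] = deque()

    def emit(self, record: Dict[str, Any]) -> None:
        self.staged.append(json.dumps(record))
        if self.staged.size() >= self.chunk_limit:
            self.queue.append(self.staged)
            self.staged = self.new_chunk()

    def flush(self) -> int:
        sent = 0  # number of chunks delivered in this flush
        while self.queue:
            chunk = self.queue[0]
            try:
                self.send(chunk.read())
            except ConnectionError:
                break  # the chunk stays queued, so a later flush retries it
            chunk.purge()
            self.queue.popleft()
            sent += 1
        return sent


delivered: List[str] = []
backend_up = False


def send_to_backend(lines: List[str]) -> None:
    if not backend_up:
        raise ConnectionError("backend unreachable")
    delivered.extend(lines)


with tempfile.TemporaryDirectory() as spool:
    buf = Buffer(file_chunks(spool), chunk_limit=2, send=send_to_backend)
    for i in range(5):
        buf.emit({"n": i})
    first_attempt = buf.flush()   # backend down: nothing sent, chunks stay on disk
    backend_up = True
    second_attempt = buf.flush()  # two chunks delivered; {"n": 4} is still staged
```

Swapping `file_chunks(spool)` for `MemoryChunk` gives the in-memory variant. The trade-off is the one Tamura describes: memory is faster, while a file buffer keeps undelivered chunks across a crash.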
Fluentd is written in Ruby. “We took a cue from Chef and Puppet,” Tamura said, “and that means it’s hackable by a fairly large number of people.”
That makes it simple to deploy, which helped Microsoft pick it over the also-popular Logstash, said Gupta, but the mix of performance and reliability the queuing provides is also key. “Logstash is using JRuby so you need to spin up a JVM; it’s not as lightweight as Fluentd.”
“One of the big enterprise concerns is that you want to make sure messaging is reliable and one of the big things that Fluentd has that Logstash doesn’t have natively, out of the box, is that buffering mechanism to make sure messages were sent over TCP and validate that the message has transmitted,” Gupta said. “For Logstash you have to set up the Redis Cache monitor and make sure it’s set up correctly.” The licence is also simpler, which matters for enterprise customers. “I can bundle Ruby with Fluentd for a customer who just wants OMS; they don’t have to care about the details, they just know I have monitoring.”
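In Fluentd, the delivery check Gupta describes corresponds to the forward output plugin's `require_ack_response` option. With that option set, the sender attaches a chunk id to each transmission and keeps the data until the receiver replies with a matching `ack`. The sketch below models that handshake in memory. `AckingSender`, `FakeReceiver` and the simplified `(time, record)` entries are hypothetical and do not reproduce the wire format. The sketch also shows the trade-off: if an acknowledgement is lost, the chunk is resent, so delivery is at least once and the receiver can see duplicates.

```python
import uuid
from typing import Any, Callable, Dict, List, Optional, Tuple

Entry = Tuple[float, Dict[str, Any]]  # (time, record), simplified
Transport = Callable[[List[Entry], Dict[str, str]], Optional[Dict[str, str]]]


class AckingSender:
    """Keeps every chunk until the receiver acknowledges its chunk id."""

    def __init__(self, transport: Transport) -> None:
        self.transport = transport
        self.pending: Dict[str, List[Entry]] = {}

    def send(self, entries: List[Entry]) -> str:
        chunk_id = uuid.uuid4().hex
        self.pending[chunk_id] = entries
        self._attempt(chunk_id)
        return chunk_id

    def _attempt(self, chunk_id: str) -> None:
        response = self.transport(self.pending[chunk_id], {"chunk": chunk_id})
        if response is not None and response.get("ack") == chunk_id:
            del self.pending[chunk_id]  # delivery confirmed; safe to forget

    def retry_pending(self) -> None:
        for chunk_id in list(self.pending):
            self._attempt(chunk_id)


class FakeReceiver:
    """Stores entries and echoes the chunk id back, unless told to lose an ack."""

    def __init__(self) -> None:
        self.received: List[Entry] = []
        self.lose_next_ack = False

    def __call__(self, entries: List[Entry], options: Dict[str, str]) -> Optional[Dict[str, str]]:
        self.received.extend(entries)
        if self.lose_next_ack:
            self.lose_next_ack = False
            return None  # simulate the ack going missing on the network
        return {"ack": options["chunk"]}


rx = FakeReceiver()
tx = AckingSender(rx)
tx.send([(1.0, {"msg": "a"})])
rx.lose_next_ack = True
tx.send([(2.0, {"msg": "b"})])   # stored by the receiver, but unconfirmed
still_pending = len(tx.pending)
tx.retry_pending()               # resend "b"; this time it is acknowledged
```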
In Microsoft’s performance testing, Gupta told us, “Fluentd was able to achieve 2,000 messages per second over TCP with no problems, with one agent on a one-core system with a 1Gb network card.”
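A quick back-of-the-envelope calculation puts the 2,000 messages per second figure in context. The average message size of 1 KB is an assumption, not a figure from Microsoft's test. Under that assumption, the load uses well under 2% of a 1 Gb network card, which leaves plenty of network headroom at that rate.

```python
MESSAGES_PER_SECOND = 2_000          # figure from Microsoft's test
SECONDS_PER_DAY = 24 * 60 * 60
ASSUMED_MESSAGE_BYTES = 1_024        # assumption: average message of 1 KB
NIC_BITS_PER_SECOND = 1_000_000_000  # 1Gb network card

messages_per_day = MESSAGES_PER_SECOND * SECONDS_PER_DAY
bits_per_second = MESSAGES_PER_SECOND * ASSUMED_MESSAGE_BYTES * 8
nic_utilisation = bits_per_second / NIC_BITS_PER_SECOND  # fraction of link capacity
```

At that rate one agent handles about 172.8 million messages a day at roughly 16.4 Mbit/s. Larger messages would scale the bandwidth figure proportionally.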
Analysis as Well as Ops
While the obvious comparisons are to Logstash, especially as part of the common Elasticsearch-Logstash-Kibana (ELK) stack, and to monitoring systems like Prometheus, Tamura suggested that “the big competition is Splunk.”
Logging is more important than ever, not just for overloaded ops teams, but because it’s increasingly a source for analysis.
“Logs are increasingly used beyond the first use cases of incident analysis and ad hoc root cause analytics. Now it’s a source of insight and innovation. Often it’s ops and DevOps people with access to logs but their primary responsibility is not analyzing the data; their primary responsibility is to keep the lights on. The data science people told us they want more data but when they go to ops for it, it’s an unwieldy process, and some developers even want to remove logging code to make the system more efficient. The motivation of Fluentd was to remove that friction.”
Docker is a sponsor of The New Stack.
Feature image via Pixabay.