How Microsoft Contributes to the Open Source Fluentd Project
Microsoft’s contributions to open source projects keep increasing, and they already go far beyond the company open sourcing its own technologies. Having picked the Fluentd log collection framework for Microsoft’s Operations Management Suite (OMS), the OMS team has been adding the features it needs and submitting them back to the Fluentd project.
“We want to have a strong two-way street where we’re benefiting from Fluentd, and we want to make sure that all the benefits we get from it benefit the community as well,” Microsoft’s Anurag Gupta told The New Stack.
The biggest contribution Microsoft has made so far is what he calls a ‘circular’ buffer. The buffer built into Fluentd is a key part of what makes it reliable without needing an external cache, but if you’re logging a lot of data and Fluentd can’t pass it on to its final destination for some reason, such as a network problem, that buffer is going to fill up.
The circular buffer will automatically drop the older data to make room for new information and keep doing that until data routed to the output starts being accepted again. “We can drop data in a rolling way, so it doesn’t just fill up and start spamming the log message file with errors, and that makes sure you get the new data that’s most relevant,” he said.
You can even set different buffer capacities for different tasks; you might want to keep more data for your security logs so you can go back and analyze them, but you might not need as much performance data (especially around the time of an outage when it might not be representative).
“You can say that you want to keep 80MB of every security log, but set a 20MB rotating data buffer for your performance data. That way, if you have a service disruption, you can make sure you’re keeping the security log audit data, but the performance data can go if necessary. That’s ephemeral; it’s useful to view at the moment, but it’s not something you need to have guaranteed to be there that’s going to be audited,” explained Gupta.
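The rolling behavior Gupta describes can be sketched in a few lines of Python. This is only an illustration of the idea, not Fluentd’s implementation (Fluentd is written in Ruby and manages its buffers through its own plugins). The `RollingBuffer` class, its method names and the `buffers` mapping are hypothetical, and the sizes assume the quoted 80 and 20 figures are megabytes.

```python
from collections import deque


class RollingBuffer:
    """Byte-bounded FIFO that evicts the oldest chunks to admit new ones.

    Hypothetical sketch of the 'circular' buffer idea; not Fluentd code.
    """

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.size_bytes = 0
        self.dropped_chunks = 0
        self._chunks = deque()

    def append(self, chunk):
        if len(chunk) > self.capacity_bytes:
            # A chunk larger than the whole buffer can never fit; count it as dropped.
            self.dropped_chunks += 1
            return
        # Evict from the old end until the new chunk fits.
        while self.size_bytes + len(chunk) > self.capacity_bytes:
            oldest = self._chunks.popleft()
            self.size_bytes -= len(oldest)
            self.dropped_chunks += 1
        self._chunks.append(chunk)
        self.size_bytes += len(chunk)

    def drain(self):
        """Hand over everything held once the output accepts data again."""
        chunks = list(self._chunks)
        self._chunks.clear()
        self.size_bytes = 0
        return chunks


MB = 1024 * 1024

# Assumed per-stream capacities, mirroring the security/performance split above.
buffers = {
    "security": RollingBuffer(80 * MB),
    "performance": RollingBuffer(20 * MB),
}
```

Keeping a separate buffer per stream means a flood of performance data during an outage can only push out older performance data; it never evicts the security records held in the larger buffer.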
Another Microsoft contribution to Fluentd adds a ‘heartbeat’ to its native monitoring mechanism. “You can view information about what Fluentd plugins are running on the instance, about the amount of data going through them, what the configuration is. We added some additional pieces on top to surface some of that information back up as a heartbeat, so the agent knows what the state of the system is,” he told us.
It’s very common for logging agents to need a heartbeat, so that you know when an agent has failed and needs to be restarted, or when you need to start routing output to a different logging host because the default one isn’t available. A heartbeat is just as useful for monitoring a cluster or a set of orchestrated microservices, so the OMS team turned it into a pull request.
“We needed to make sure we had a heartbeat for the Fluentd agent that’s reporting to OMS but instead of creating a proprietary OMS plugin, we modified Fluentd’s native monitoring capabilities to add the heartbeat,” said Gupta.
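To make the heartbeat idea concrete, here is a minimal Python sketch of the receiving side: something that records when each agent last reported in and flags the ones that have gone quiet. It does not use Fluentd’s monitoring API or the OMS code; the `HeartbeatMonitor` class, its methods, the agent names and the 30-second timeout are all assumptions made for illustration.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per agent and reports which ones are stale.

    Hypothetical sketch; not part of Fluentd or OMS.
    """

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._last_seen = {}

    def record(self, agent_id):
        """Call whenever a heartbeat arrives from an agent."""
        self._last_seen[agent_id] = self._clock()

    def is_alive(self, agent_id):
        last = self._last_seen.get(agent_id)
        if last is None:
            # Never heard from this agent at all.
            return False
        return self._clock() - last <= self.timeout_seconds

    def stale_agents(self):
        """Agents whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(
            agent_id
            for agent_id, last in self._last_seen.items()
            if now - last > self.timeout_seconds
        )
```

A supervisor could poll `stale_agents()` on a timer and restart each agent it returns, or switch output to a different logging host, which are the two reactions described above.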
The next contribution the team makes to Fluentd is likely to be its StatsD metrics server. The code for this aggregator is already available from the OMS GitHub repo, but the team is thinking about packaging it up as a Fluentd plugin so it’s easier to pick up.
And the OMS repo is also an alternative way to get hold of Fluentd in the first place. If you don’t want to install Ruby and fetch the Fluentd Ruby gem, you can use td-agent, a distribution package from Treasure Data, the company that created the Fluentd framework. That does a little more of the work for you: it retrieves the Fluentd package from the repo and installs it using the Red Hat or Debian package manager, or grabs the OS X version, depending on what system you install it on. It also preconfigures some settings, including sending data to Treasure Data. There are also Chef recipes and Puppet modules that will install td-agent for you to get the process started.
But for OMS, Microsoft wanted to be able to distribute the Fluentd agent as a self-extracting shell script that you can run, and you don’t need to be using OMS to take advantage of that. In some cases that’s a slightly easier way of installing Fluentd than using td-agent, said Gupta. “We understand if you’re running Ubuntu or Red Hat, we’ll link the correct OpenSSL version, and we clearly say which Fluentd version it is.”
If that’s useful, you can get it directly from Microsoft’s repo. “Anything we’re building for Fluentd is all available as open source in our GitHub repo, and it’s consumable for everyone,” said Gupta. “You can fork it under the Apache 2 license; you can do whatever you want with it.”
Feature image via Pixabay.