Fluentd’s Role as a Data Collector in Today’s Cloud Native World
Fluentd has emerged as an open source data collector for massive amounts of log data, often from many different sources, in a way that is especially useful for cloud native deployments on Kubernetes.
Eduardo Silva, principal engineer at Arm and part of the Fluentd development team at Treasure Data, described how and why Fluentd’s utility is becoming even more important for keeping up with the demands of scaling data in today’s cloud native world. He discussed that and other benefits the data collector offers during a podcast hosted by Alex Williams, founder and editor-in-chief of The New Stack, recorded at KubeCon + CloudNativeCon 2018 in Shanghai.
The main benefit of Fluentd is that any production environment can have access to comprehensive data analysis about its applications, whether they are running on standard servers or on distributed systems with Kubernetes, Silva said. This data might include error information, warnings or general information about how an application is running. Applications provide this information in the form of messages, known as logs, that describe how they are operating.
“But when you have this application running at scale, you need to have a way to perform this data analysis, and for that, you need to centralize this log, you need to filter this log, process logs and then aggregate them back in some kind of database,” Silva said. “And as a manner of process, that is a pain but Fluentd is a solution built to solve that specific problem.”
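The steps Silva lists (centralize, filter, process, aggregate) can be pictured with a minimal Python sketch. This is not Fluentd code or its API: the record fields, the `LEVELS` ranking, the `min_level` threshold and the in-memory `store` are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical severity ranking, used only for this illustration.
LEVELS = {"info": 0, "warning": 1, "error": 2}


def centralize(*sources):
    """Merge records from several sources into one stream."""
    for source in sources:
        yield from source


def filter_records(records, min_level="warning"):
    """Keep only records at or above the given severity."""
    threshold = LEVELS[min_level]
    for record in records:
        if LEVELS[record["level"]] >= threshold:
            yield record


def aggregate(records):
    """Group messages by application, standing in for a database write."""
    store = defaultdict(list)
    for record in records:
        store[record["app"]].append(record["message"])
    return dict(store)


web_logs = [
    {"app": "web", "level": "info", "message": "request served"},
    {"app": "web", "level": "error", "message": "upstream timeout"},
]
db_logs = [
    {"app": "db", "level": "warning", "message": "slow query"},
]

store = aggregate(filter_records(centralize(web_logs, db_logs)))
print(store)
```

Because each stage is a generator, records flow through one at a time, which loosely mirrors the streaming view of logging discussed in the podcast.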
While most people believe logging consists of log files with many records of information, today’s use cases are much more than that, Silva said. “Nowadays, logging is all about data streaming application that’s continuously sending data about how they are working, how they are operating and not just the application — but about hardware,” Silva said.
A firewall, for example, usually sends logging messages about security-related information, but that data does not just sit in a file; it flows as a stream, Silva said. “So, I would say that any logging solution nowadays needs to deal with data at scale, and that data comes from a stream, and a file becomes a stream,” he said.
In this way, Fluentd plays a major role in unifying and centralizing the data, Silva said. A Kubernetes cluster, for example, generates many different kinds of logging information and data streams. “Your aim is to do data analysis but you need to collect this data,” Silva said. “And all of these data come in different formats and from different ways.”
A logging solution that solves data-collection problems and offers data analysis needs to be able to listen for messages that arrive in different formats and correlate them, Silva said. “For data analysis, solutions must aggregate and concentrate all the information,” he said.
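To make the "different formats" point concrete, here is a small Python sketch that turns two kinds of log line into one common record shape. It is an illustration only, not how Fluentd parses input: the plain-text layout `<LEVEL> <app>: <message>`, the field names and the sample lines are assumptions.

```python
import json
import re

# Hypothetical plain-text layout: "<LEVEL> <app>: <message>"
PLAIN_PATTERN = re.compile(r"^(?P<level>[A-Z]+) (?P<app>[\w-]+): (?P<message>.*)$")


def parse_line(line):
    """Turn a JSON or plain-text log line into one common record shape."""
    line = line.strip()
    if line.startswith("{"):
        data = json.loads(line)
        return {
            "level": str(data.get("level", "info")).lower(),
            "app": data.get("app", "unknown"),
            "message": data.get("message", ""),
        }
    match = PLAIN_PATTERN.match(line)
    if match:
        return {
            "level": match.group("level").lower(),
            "app": match.group("app"),
            "message": match.group("message"),
        }
    # Unrecognized lines are kept as-is rather than dropped.
    return {"level": "info", "app": "unknown", "message": line}


lines = [
    '{"level": "ERROR", "app": "web", "message": "upstream timeout"}',
    "WARN firewall: blocked connection from 10.0.0.7",
]
records = [parse_line(line) for line in lines]
```

Once every record has the same keys, messages from an application and from a firewall can be correlated and aggregated together.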
Fluentd was designed as a pluggable architecture because “that was the only way to make a community that can create more value on top of the project and take advantage of it,” Silva said.
Fluentd currently has more than 800 plugins available, of which Treasure Data maintains no more than 20, Silva said. The rest are maintained by the open source community, including Microsoft, Red Hat and other organizations and contributors.
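The general idea behind a pluggable design can be sketched in a few lines of Python. This is not Fluentd's actual plugin API; the `OUTPUT_PLUGINS` registry, the `register_output` decorator and the two example plugins are hypothetical names invented for this illustration.

```python
# Hypothetical registry mapping plugin names to output functions.
OUTPUT_PLUGINS = {}

# Hypothetical in-memory destination used by the "memory" plugin below.
MEMORY = []


def register_output(name):
    """Decorator that adds an output function to the registry under a name."""
    def decorator(func):
        OUTPUT_PLUGINS[name] = func
        return func
    return decorator


@register_output("stdout")
def write_stdout(record):
    print(record)


@register_output("memory")
def write_memory(record):
    MEMORY.append(record)


def emit(record, plugin_name):
    """Look up a plugin by name and hand it the record."""
    try:
        plugin = OUTPUT_PLUGINS[plugin_name]
    except KeyError:
        raise ValueError(f"no output plugin named {plugin_name!r}") from None
    plugin(record)


emit({"message": "slow query"}, "memory")
```

The core (`emit`) never needs to change when a contributor adds a new destination; they only register another function, which is the property that lets a community extend a project without touching its center.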
Many interesting use cases exist. For example, some organizations might use Kafka as a persistent store for the data that Fluentd collects, Silva said. They let Fluentd collect logs from log files, TCP or all of the services running in the environment, and then aggregate all the data back to Kafka, he said. “Kafka has persistency, and when they want to consume back those logs for some reason, for business reasons; they just connect to a specific Kafka topic that they know that that data is there,” Silva said. “So, the whole thing about Fluentd is that you [do not have to] just inject the data in one place; it can be in multiple places.”
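The "multiple places" idea, along with the ability to read persisted records back later, can be sketched as follows. No real Kafka client is used here: `InMemoryTopic` is a hypothetical stand-in for a durable destination, and the topic names and records are made up for illustration.

```python
class InMemoryTopic:
    """Stand-in for a durable destination such as a Kafka topic."""

    def __init__(self, name):
        self.name = name
        self.records = []

    def write(self, record):
        self.records.append(record)

    def read_from(self, offset=0):
        """Records persist, so a consumer can read them again later."""
        return self.records[offset:]


def fan_out(records, destinations):
    """Deliver every record to every destination; return the record count."""
    count = 0
    for record in records:
        for destination in destinations:
            destination.write(record)
        count += 1
    return count


logs_topic = InMemoryTopic("app-logs")
archive = InMemoryTopic("archive")

delivered = fan_out(
    [{"message": "upstream timeout"}, {"message": "slow query"}],
    [logs_topic, archive],
)
```

A consumer that connects later can call `logs_topic.read_from(0)` to replay everything, or pass a higher offset to pick up only newer records.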
In this Edition:
1:49: What is Fluentd?
4:13: How do you separate the data in that environment from the application architecture? How do those intersect?
8:50: How does plugin architecture management apply for the user?
12:01: What role does Kubernetes play in Fluentd’s architectures?
15:59: Why did you write Fluent Bit in the C programming language?
20:26: How will the project grow and what is its direction going forward?
Raygun sponsored this podcast, which was produced independently by The New Stack. KubeCon + CloudNativeCon 2018 is a sponsor of The New Stack.