In the “Blues Brothers” movie, the proprietress of a roadside honky-tonk touts that her establishment has both kinds of music: country and western. In the crowded market for infrastructure monitoring and logging, Brooklyn, N.Y.-based Sematext hangs its hat on its ability to do both monitoring and log management — and search analytics as well.
“We monitor metrics, infrastructure, application performance … for Java applications. We can capture which components are talking to which others. We also provide log management so you can ship the logs to us, we’ll index them, make them searchable for you, then you can correlate those logs with various other metrics and events,” said Sematext founder Otis Gospodnetić, in an interview with The New Stack. “Nobody else really does that.”
Splunk does log search, New Relic does metrics, and though Datadog recently added application performance management, none really do both, Gospodnetić claimed. As a result, most companies use multiple tools to monitor and manage their infrastructure, he noted.
Gospodnetić noted that the building blocks of such a setup are open source tools that companies can cobble together themselves if they want to, or they can skip all that work by having Sematext do it. Sematext provides a single user interface, so users can see metrics and logs at the same time.
Sematext agents collect information from myriad technologies in many different formats, making the service a one-stop shop from collection to analysis. Sematext integrates with more than 50 other technologies, including Elasticsearch, Spark, Storm, Kafka, Cassandra, HBase, Hadoop, CoreOS, Nginx, Redis, MySQL and Amazon Elastic Compute Cloud (EC2).
A Docker partner for monitoring and logging, Sematext offers a Docker agent that collects information that can be used for Kubernetes monitoring as well. “We containerize the Docker agent itself. You can deploy it very easily, like any other container. We have a scenario with Docker Swarm where you use one command and can deploy to all Docker Swarm nodes,” Gospodnetić said.
Sematext also provided the monitoring for Swarm3K, a collaborative project that originally set out to build a 3,000-node cluster but ultimately formed a working, geographically distributed 4,700-node Docker Swarm cluster.
Sematext uses a single agent for both Docker and Kubernetes monitoring. One agent runs on each host and monitors all the containers on that host, including any new containers that come up later.
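The article does not say how the agent notices new containers. One common approach is to react to the Docker Engine’s container events, sketched minimally in Python below. It assumes event dictionaries shaped like those returned by the Engine’s events API (`Type`, `Action`, `Actor.ID`). The helper name `track_containers` is hypothetical and is not part of Sematext’s agent.

```python
from typing import Dict, Iterable, Set


def track_containers(initial: Iterable[str], events: Iterable[Dict]) -> Set[str]:
    """Return the container IDs to monitor after replaying Docker events.

    Hypothetical helper: `events` is assumed to hold dicts shaped like Docker
    Engine events, e.g. {"Type": "container", "Action": "start", "Actor": {"ID": "..."}}.
    """
    monitored: Set[str] = set(initial)        # containers already running when the agent starts
    for event in events:
        if event.get("Type") != "container":  # ignore network, volume and image events
            continue
        cid = event.get("Actor", {}).get("ID")
        if cid is None:
            continue
        if event.get("Action") == "start":    # a new container came up: start watching it
            monitored.add(cid)
        elif event.get("Action") == "die":    # the container exited: stop watching it
            monitored.discard(cid)
    return monitored


events = [
    {"Type": "container", "Action": "start", "Actor": {"ID": "b2"}},
    {"Type": "network", "Action": "connect", "Actor": {"ID": "net1"}},
    {"Type": "container", "Action": "die", "Actor": {"ID": "a1"}},
]
print(sorted(track_containers(["a1"], events)))  # ['b2']
```

A real agent would consume events as a live stream rather than a list. Replaying a list keeps the sketch self-contained while showing the same idea: one process per host, with a watch set that follows container lifecycles.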
“The agent has awareness of data that it collects, so if it’s collecting Kubernetes data, it says, ‘Oh, I know how those logs are structured.’ It extracts it and parses it out of incoming data and ships it to us in a structured format that lets us build reports that are immediately useful to the user. If we didn’t do that, we’d have to ship just raw data. The user would say, ‘Oh, I’ve got these logs. What does this thing in the log mean? And how do I structure it so I can build the report?’” he said.
In a January blog post, Sematext DevOps evangelist Stefan Thies described the difficulty of monitoring a Docker Swarm cluster whose monitoring setup had to be updated manually whenever the number of nodes changed. In September, however, he wrote that Docker v1.12 allows the setup of Docker Swarm monitoring to be fully automated.
The Sematext agent collects Kubernetes-specific data such as namespace, pod name, image name and UID.
If Kubernetes core components such as the kubelet, proxy and API server are deployed via Docker, the Sematext Docker Agent collects their logs as well.
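The kind of enrichment Gospodnetić describes, turning a raw container log line into a structured record tagged with Kubernetes fields, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Sematext’s implementation. It assumes Docker’s `json-file` log driver, container metadata shaped like `docker inspect` output, and the `io.kubernetes.*` labels that the kubelet’s Docker integration sets on the containers it starts. The names `enrich` and `K8S_LABELS` are hypothetical.

```python
import json
from typing import Any, Dict

# Assumption: labels the kubelet sets on Docker containers it launches,
# mapped to the field names our hypothetical structured record uses.
K8S_LABELS = {
    "io.kubernetes.pod.namespace": "namespace",
    "io.kubernetes.pod.name": "pod_name",
    "io.kubernetes.pod.uid": "pod_uid",
    "io.kubernetes.container.name": "container_name",
}


def enrich(raw_line: str, container: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one json-file log line into a structured record with Kubernetes fields."""
    entry = json.loads(raw_line)  # json-file format: {"log": ..., "stream": ..., "time": ...}
    config = container.get("Config", {})
    record: Dict[str, Any] = {
        "message": entry["log"].rstrip("\n"),
        "stream": entry.get("stream"),
        "@timestamp": entry.get("time"),
        "image_name": config.get("Image"),
    }
    labels = config.get("Labels") or {}
    for label, field in K8S_LABELS.items():
        if label in labels:  # copy only the Kubernetes labels that are present
            record[field] = labels[label]
    return record


container = {
    "Config": {
        "Image": "nginx:1.11",
        "Labels": {
            "io.kubernetes.pod.namespace": "default",
            "io.kubernetes.pod.name": "web-1",
            "io.kubernetes.pod.uid": "1234-abcd",
            "io.kubernetes.container.name": "nginx",
        },
    }
}
line = '{"log":"GET / 200\\n","stream":"stdout","time":"2016-10-01T12:00:00Z"}'
record = enrich(line, container)
print(record["namespace"], record["pod_name"], record["message"])  # default web-1 GET / 200
```

Doing this parsing on the host, as the quote suggests, means the backend receives fields it can group and filter on directly instead of opaque strings.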
Under the Hood
“For users who use Elasticsearch, it’s nice for them to hear that we expose the Elasticsearch API. That means they can use a bunch of different tools to get the data in, like Logstash. So all the people who use those tools have to do is configure them and point them to us instead of some local server, and it works. They get the data into our system,” Gospodnetić explained.
“The benefit to that is that they don’t have to manage the Elasticsearch part, which is typically the beast. For logging, this is the beast that requires the most expertise, the most infrastructure and the most money. So if you can just point to us, you don’t have to do anything special. Kibana is also integrated into the UI, which lets you see your logs and search them.”
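Because the service speaks the Elasticsearch API, a client that can write to Elasticsearch only needs a different host. The Python sketch below builds a request for Elasticsearch’s standard `_bulk` endpoint, which takes newline-delimited JSON: an action line followed by the document, with a trailing newline. The endpoint URL and index name are hypothetical placeholders. The article does not describe Sematext’s actual endpoint or authentication scheme, so the request is built but not sent.

```python
import json
import urllib.request
from typing import Dict, Iterable, List


def build_bulk_body(index: str, docs: Iterable[Dict]) -> bytes:
    """Encode documents in Elasticsearch bulk (NDJSON) format."""
    lines: List[str] = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action/metadata line
        lines.append(json.dumps(doc))                          # the document itself
    return ("\n".join(lines) + "\n").encode("utf-8")           # bulk bodies must end with a newline


def bulk_request(endpoint: str, index: str, docs: Iterable[Dict]) -> urllib.request.Request:
    """Build (but do not send) a POST to <endpoint>/_bulk."""
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/_bulk",
        data=build_bulk_body(index, docs),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


# Hypothetical endpoint and index: swapping a local Elasticsearch URL for a
# hosted one is the only change the client needs.
req = bulk_request("https://logs.example.com", "my-app-index", [{"message": "hello"}])
print(req.get_method(), req.full_url)  # POST https://logs.example.com/_bulk
# urllib.request.urlopen(req) would send it to a real endpoint.
```

Logstash’s Elasticsearch output sends batches through this same bulk API, which is why changing its configured host is enough to redirect shipping.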
He conceded, however, that Elasticsearch is not the best option for a time-series data store, so Sematext uses HBase for metrics instead.
Cloud infrastructure monitoring is expected to be the fastest-growing segment of the global IT infrastructure monitoring market, according to a forecast from Persistence Market Research, which breaks the field down into network, server, storage, cloud and application monitoring. The firm’s projection that the monitoring market will reach $34.1 billion by 2024 helps explain the proliferation of companies jumping into the fray.
Self-funded Sematext was founded in 2007 by Gospodnetić, a long-time member of the Apache Software Foundation and a member of the Lucene (search engine), Solr (search), Nutch (web crawler) and Mahout (machine learning and data mining) development teams.
Sematext customers include EMC, Bloomberg, BBC, Tumblr, Shutterstock, and Salesforce.
TNS analyst Lawrence Hecht contributed to this story.
CoreOS, Docker, and New Relic are sponsors of The New Stack.