InfluxData sponsored this post.
The instrumentation of everything around us is increasing the need for better monitoring and analytics. In fact, these capabilities are so important that, without them, most of the benefits of instrumentation cannot be realized.
In today’s fast-changing tech landscape, the right monitoring and analysis tools are required to detect every interesting business moment as it occurs. Currently, sensors are being placed on practically every available surface in the material world, from machines to humans. Almost anything capable of giving off a recorded event or measurable metric can be instrumented, in both the virtual and physical worlds. Metrics are measurements of characteristics, such as temperature or pressure, while events are anything that happens, such as a consumer placing an online order.
There are valid business needs to justify constant monitoring of metrics streams and events data. Companies want to become more data-driven and apply data insights to be better situationally aware of business opportunities and threats. A data-driven organization can predict outcomes more efficiently and effectively than simply relying on historical information or gut instinct. When vast amounts of data points are monitored and analyzed, the organization can find interesting business moments in the data. These insights help identify emerging opportunities and competitive advantages.
Monitoring solutions include capabilities to monitor web and cloud applications, infrastructure, networks, platforms, and microservices. The solutions vary greatly in the type of infrastructure they can monitor and the granularity of visibility they can provide. Some solutions claim to do it all, from managing application performance to network speed and even cloud usage, while others are more specialized and track one or more metrics of the infrastructure stack.
5 Monitoring Challenges
Either way, when it comes to monitoring data for analysis with traditional monitoring tools, enterprises typically face a number of challenges, including:
- Cost outpacing return — Cost is a significant hidden disadvantage of most of these monitoring solutions. They may appear inexpensive, often charging per host or by the amount of data stored. For example, a typical data center deployment (primary/backup) can have up to 300 hosts, with 30 physical servers at the primary data center and a dozen at the backup, which could easily cost thousands of dollars per month. And with a solution that charges by storage, teams risk under-collecting metrics to save on storage costs, which can be detrimental to fulfilling SLAs.
- Lack of a cross-domain solution — Every business is different, so one monitoring solution does not fit all. While a single team within an organization can adopt a specialized solution quickly, this will lead to monitoring fragmentation, making it difficult to troubleshoot across domains. It is not uncommon for organizations to have 10+ monitoring solutions, which may work for individual groups, but will fail when the entire system is not performing as required.
- Lack of continuous intelligence — Businesses are challenged to stay ahead of competitors by ensuring their services meet customers’ high availability requirements. Industry analysts have coined the term “AIOps” for combining event stream processing, real-time data analytics, and artificial intelligence (AI) to gain the real-time situational awareness required to keep services available and reduce downtime. This is where today’s monitoring solutions fail: they lack the extensibility to integrate effectively with AI/ML solutions and deliver meaningful insights.
- Data held hostage — Today’s monitoring solutions are able to ingest data for monitoring and dashboarding; however, they often have restrictive retention policies or make it difficult to pull out specific data. This matters to organizations that need their data for trend analysis and forecasting. Restricted access limits what data is available and how many times a user can access it, and sometimes requires paying to get access to an organization’s own data. Even then, there is no guarantee that the monitoring solution will be able to provide all the data that was ingested and stored.
- Inflexibility — Today’s monitoring solutions are not flexible, often requiring users to follow a prescribed path, which can be time-consuming. Some solutions, for example, require users to tag every metric, which is restrictive and tedious. Additionally, many provide only limited capabilities for embedding monitoring information, such as dashboards, inside other applications. Lastly, some solutions offer predefined connectivity to specific apps, making it challenging to monitor systems outside that set.
Using the TICK Stack for Constant Monitoring
Purpose-built monitoring platforms have been created to deal with demanding new requirements for monitoring and analyzing these business-specific metrics and event workloads — called time-series data — and provide situational awareness to the business. Time-series platforms can ingest millions of data points per second, scale both horizontally and vertically, and weave in strong machine learning and anomaly detection functions to deliver “AIOps.” In addition, they are resource-aware, applying compression and downsampling to optimize resource utilization, and are built to support faster time to market with minimal dependencies.
InfluxData developed an open source stack specifically to address the challenges of monitoring and analyzing time-series data. InfluxData’s TICK stack consists of:
- Telegraf, with its more than 200 plugins, collects metrics and events (both time-stamped data).
- InfluxDB is the time-series database that ingests large volumes of data and handles queries, data compression, and more.
- Chronograf provides the user interface for the stack as well as the dashboards for visualizing the queried data.
- Kapacitor is the real-time stream processing engine that detects anomalies and triggers alerts and notifications.
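As a sketch of the kind of rule Kapacitor evaluates, the following TICKscript streams CPU metrics and raises a critical alert when idle CPU drops below a threshold. The measurement, field, threshold, and log path here are illustrative assumptions, not part of any specific deployment:

```
// Hypothetical Kapacitor stream task: alert when idle CPU falls below 10%
stream
    |from()
        .measurement('cpu')
    |alert()
        .crit(lambda: "usage_idle" < 10)
        .message('Host {{ index .Tags "host" }} is under heavy CPU load')
        .log('/tmp/cpu_alerts.log')
```

In practice, the `.log()` handler would typically be swapped for a notification channel such as email or a chat integration.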
First of all, the ability of Telegraf plugins to collect metrics and events from more than 200 systems, databases, and applications makes it extremely easy to gather metrics and events in a uniform way. Telegraf’s small footprint makes it easy to deploy the agent at the edge, where CPU or memory may be constrained. The Telegraf agent can monitor Kubernetes nodes, legacy databases, networking devices, logs, and more with equal ease, which makes it easier to replace legacy monitoring silos, since this single solution can monitor the whole IT landscape.
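A minimal Telegraf configuration illustrates this uniform collection model. The plugin section names below are real Telegraf plugins, but the interval, URLs, and database name are placeholder assumptions:

```toml
# Collection interval (placeholder value)
[agent]
  interval = "10s"

# Host-level metrics
[[inputs.cpu]]
[[inputs.mem]]

# Kubernetes node metrics via the kubelet API (placeholder URL)
[[inputs.kubernetes]]
  url = "https://127.0.0.1:10250"

# Ship everything to InfluxDB (placeholder URL and database)
[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
  database = "telegraf"
```

Adding a new data source is typically a matter of enabling another `[[inputs.*]]` section rather than deploying a separate agent.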
Once the collected metrics and events data make their way into InfluxDB, they can be stored, queried and analyzed. Even better, the stored data can be visualized using dashboards.
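For example, data stored in InfluxDB can be queried with InfluxQL, its SQL-like query language. The measurement, field, and tag names here are hypothetical:

```sql
-- Average idle CPU per host over the last hour, in 5-minute buckets
SELECT mean("usage_idle")
FROM "cpu"
WHERE time > now() - 1h
GROUP BY time(5m), "host"
```

The same query can back a Chronograf dashboard cell, so ad hoc analysis and standing visualizations share one language.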
Additionally, because the core of the stack is open source, it gains new capabilities faster than proprietary systems, as community members continuously add support for new sources of metrics and events.
Today’s application architecture spans multiple clouds and may include cloud native, Kubernetes, Docker, and other virtualization topologies. To make matters more complex, there may be IoT devices deployed at the edge, streaming data into some of these applications.
As these applications are built over time, it is natural to end up with different application and data monitoring technologies. This results in the five disadvantages outlined above.
A purpose-built time-series platform such as the TICK stack overcomes these shortcomings by combining all metrics and events data generated by servers, network devices, databases, containers, clusters, and IoT devices, and by providing a single-pane-of-glass view of the landscape that shortens the mean time to resolution (MTTR) when issues arise.
The following is an example of Kubernetes metrics displayed in a Chronograf dashboard; data from any other infrastructure can be displayed in the same way.
With new technologies, organizations can address today’s most challenging data monitoring issues and have tools in place to take full advantage of what their data is telling them, supporting real-time decision-making and reducing service outages. For organizations focused on instrumentation of any type, whether DevOps toolchain instrumentation or IoT instrumentation, it is critical to have a long-term strategy for monitoring and visibility to deliver both continuous and situational intelligence.