
Predictive Analytics Using a Time Series Database

Because it processes time-stamped data at high speed and volume, a time series database is particularly suited for anomaly detection and predictive maintenance.
Jul 28th, 2023 1:38pm

Predictive analytics harnesses the power of big data, statistical algorithms and machine learning techniques to anticipate future outcomes based on historical data. Various industries use predictive analytics, from finance and healthcare to retail and marketing. Among its many uses, predictive maintenance and anomaly detection are two significant applications. Predictive maintenance uses predictive analytics to forecast machinery breakdowns, enabling timely maintenance and prevention of unexpected downtime.

Similarly, anomaly detection leverages the same predictive power to identify irregularities in data that could indicate a problem, such as fraud in financial transactions or intrusions in network security. Together, these applications of predictive analytics help organizations stay proactive and informed, paving the way for enhanced efficiency, reduced risks and improved decision-making.

A time series database provides key functionality for performing predictive analytics. Built specifically to handle data points indexed in time order, it allows for storing, retrieving and processing time-stamped data at high speed and volume. These capabilities make time series databases particularly suited for tasks such as anomaly detection and predictive maintenance.

InfluxDB 3.0 is a time series database and platform for storing all types of time series data, including metrics, events, logs and traces.

In this post, we’ll explore how to combine InfluxDB Cloud, Quix and Hugging Face for predictive maintenance, including predictive analytics and forecasting. Quix is a platform that allows you to deploy streaming pipelines for analytics and machine learning. Hugging Face is an ML platform that enables users to train, build, host and deploy open source machine learning models and datasets.

The Dataset

The dataset we’ll use for this post comes from this repo and specifically this script. It contains generated machine data with values like temperature, load and vibration for a variety of machine IDs. It’s a fabricated dataset so we can induce anomalies when needed to test the anomaly detection. This is what the data looks like coming out of the influxdb-query service.
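To give a rough sense of what the generator script produces, here is a minimal sketch of fabricating similar machine data with induced anomalies. The column names, value ranges and anomaly mechanism are assumptions for illustration, not the repo’s actual code:

```python
# Sketch: fabricate machine readings (temperature, load, vibration) for
# several machine IDs, with a small fraction of induced vibration spikes.
import numpy as np
import pandas as pd

def generate_machine_data(n=300, anomaly_rate=0.02, seed=42):
    """Generate fabricated machine data with a few induced anomalies."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "time": pd.date_range("2023-07-28", periods=n, freq="1s"),
        "machineID": rng.choice(["machine1", "machine2", "machine3"], size=n),
        "temperature": rng.normal(70, 2, n),    # degrees F
        "load": rng.normal(0.6, 0.05, n),       # fraction of capacity
        "vibration": rng.normal(0.2, 0.02, n),  # arbitrary units
    })
    # Induce anomalies: spike the vibration on a few random rows
    anomalies = rng.random(n) < anomaly_rate
    df.loc[anomalies, "vibration"] *= 10
    df["anomaly"] = anomalies
    return df
```

Having a labeled `anomaly` column makes it easy to verify the detector downstream.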

The Quix Anomaly Detection and Prediction Pipeline

Quix enables us to deploy streaming pipelines for analytics and machine learning. The image below depicts our pipeline:

The Workspace Pipeline contains the following services:

  1. A source: This service project is responsible for querying InfluxDB Cloud with InfluxQL (query language) and converting the output to a Pandas DataFrame so that the Transformation block can consume it.
  2. Two Transformations: These service projects are responsible for finding the anomalies within the data and generating predictions. They operate on the data from the source service in parallel.
    • Event detection finds anomalies.
    • Forecasting transformation generates forecasts.
  3. Two Writes: These service projects are responsible for writing data back to InfluxDB Cloud with the InfluxDB 3.0 Python Client Library. There are two write instances because we’re writing to two separate InfluxDB Cloud instances. However, you could point both write services at the same InfluxDB Cloud instance if you wanted all your data in one place.

What makes Quix so easy to use is that you can add a new service (services run on a user-defined schedule) or a new job (which runs once) by selecting from a variety of common services. These contain all of the boilerplate required to stream the data and route inputs to the correct outputs. Additionally, you can easily stream Pandas DataFrames between projects, which removes any data conversion effort.

Once you select a new transformation service, you can select from a variety of example scripts to add to your pipeline.

Source Project

This block runs every 60 seconds. It queries for the past five minutes of machine data using the InfluxDB 3.0 Python Client Library:

You configure the connection to InfluxDB Cloud with your host, token and database.
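Since the original code appears as a screenshot, the query can be sketched with the InfluxDB 3.0 Python Client Library (`influxdb3-python`). The measurement name, environment variable names and query window below are illustrative assumptions:

```python
# Sketch of the source service: build an InfluxQL query for the last five
# minutes of machine data and return the result as a Pandas DataFrame.
import os

def build_query(measurement="machine_data", window="5m"):
    """Build the InfluxQL query for the most recent window of machine data."""
    return f"SELECT * FROM {measurement} WHERE time > now() - {window}"

def fetch_machine_data():
    """Query InfluxDB Cloud; requires live credentials and influxdb3-python."""
    from influxdb_client_3 import InfluxDBClient3
    client = InfluxDBClient3(
        host=os.environ["INFLUXDB_HOST"],
        token=os.environ["INFLUXDB_TOKEN"],
        database=os.environ["INFLUXDB_DATABASE"],
    )
    # mode="pandas" returns a DataFrame the transformation blocks can consume
    return client.query(query=build_query(), language="influxql", mode="pandas")
```

`fetch_machine_data()` is deliberately not called at import time, since it needs live InfluxDB Cloud credentials.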

Event Detection Project

In this tutorial, we used Keras autoencoders to create and train an anomaly detection model. Autoencoders are a type of artificial neural network used to learn efficient encodings of input data. In anomaly detection, the autoencoder trains on normal data and learns to reproduce it as closely as possible. When presented with new data, it attempts to reconstruct the input using the patterns learned from the normal data. If the reconstruction error (the difference between the original input and the autoencoder’s output) is high, the model classifies the new data point as an anomaly, because it deviates significantly from the normal data.
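The reconstruction-error principle can be sketched without a neural network. Below, PCA stands in for the Keras autoencoder: fit a reconstructor on normal data, set a threshold from the training error, and flag new points that reconstruct poorly. The data, feature names and threshold are illustrative assumptions:

```python
# Sketch: reconstruction-error anomaly detection, with PCA standing in
# for the trained Keras autoencoder.
import numpy as np

def fit_reconstructor(normal, n_components=1):
    """Learn a low-dimensional linear encoding of the normal data."""
    mean = normal.mean(axis=0)
    _, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(x, mean, components):
    """Encode then decode x; return the per-point squared error."""
    encoded = (x - mean) @ components.T
    decoded = encoded @ components + mean
    return ((x - decoded) ** 2).sum(axis=1)

# Train on "normal" (vibration, temperature) readings
rng = np.random.default_rng(0)
normal = rng.normal([0.2, 70.0], [0.02, 2.0], size=(500, 2))
mean, comps = fit_reconstructor(normal)

# Threshold: e.g. the 99th percentile of the training error
threshold = np.percentile(reconstruction_error(normal, mean, comps), 99)

# A vibration spike reconstructs poorly, so it is flagged as an anomaly
new = np.array([[0.21, 70.5], [2.0, 71.0]])
flags = reconstruction_error(new, mean, comps) > threshold
```

The same fit/threshold/flag flow applies when the reconstructor is a trained autoencoder instead of PCA.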

We used this Jupyter notebook to train the model before pushing it to Hugging Face. Next, we imported the model into the Quix Event Detection project. The model is a variable, so you can easily swap models in Hugging Face. This allows you to separate the model tuning and training workflow from your pipeline deployment.

An example of editing the model variable to pull different trained models from Hugging Face. Here, the model jayclifford345/vibration-autoencoder is selected.

Forecasting Transformation Project

We built this project from the example Starter Transformation project in Code Samples. It uses Holt-Winters exponential smoothing from statsmodels to create a fast forecast.

Write Projects

The Write Projects (influxdb-write and influxdb-write-2) write the anomalies and forecasts to two separate InfluxDB instances. This design choice was arbitrary; it merely showcases this architecture as an option. The InfluxDB 3.0 Python Client Library writes DataFrames to both instances.
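A hedged sketch of one write service follows; the measurement name, required columns and environment variable names are assumptions for illustration:

```python
# Sketch: validate a DataFrame of forecasts, then write it to InfluxDB
# Cloud with the InfluxDB 3.0 Python Client Library.
import os

REQUIRED_COLUMNS = {"time", "machineID", "vibration"}

def validate(df):
    """Check that the DataFrame has the columns the measurement expects."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"DataFrame missing columns: {sorted(missing)}")
    return df

def write_forecasts(df):
    """Write a DataFrame of forecasts; requires live credentials."""
    from influxdb_client_3 import InfluxDBClient3
    client = InfluxDBClient3(
        host=os.environ["INFLUXDB_HOST"],
        token=os.environ["INFLUXDB_TOKEN"],
        database=os.environ["INFLUXDB_DATABASE"],
    )
    client.write(
        record=validate(df),
        data_frame_measurement_name="forecast",
        data_frame_tag_columns=["machineID"],
        data_frame_timestamp_column="time",
    )
```

The second write service would be identical except for its credentials, which point at the other InfluxDB Cloud instance.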

Final Thoughts

After writing our anomalies and forecasts into InfluxDB, we can use a tool like Grafana to visualize the data and create alerts that take action on it. For example, if we see too many anomalies, we might decide to diagnose a problem in the system, replace a sensor or machine, schedule predictive maintenance or redesign a manufacturing process. Alerting on vibration forecasts could prevent manufacturing interruptions or optimize operations.

Rely on InfluxDB Cloud to store all of your time series data. To learn more, take a look at this demo. It provides an example of how to generate logs and traces from a sample application, store them in InfluxDB Cloud and use Jaeger and Grafana to visualize them. It leverages Apache Arrow, Apache DataFusion and Apache Parquet to provide users with unlimited cardinality data, best-in-class query engine and interoperability with other visualization, business intelligence and machine learning (ML) tools. Take a look at the following resources to learn more about how to query InfluxDB from Tableau and forecast with Tableau. Additional documentation contains more information on InfluxDB’s interoperability with other tools including Grafana, Pandas, Superset and PyArrow. Get started with InfluxDB Cloud 3.0 here.
