How Veritas Relies on Time-Series Data to Fix Glitches Before They Happen
InfluxData is a sponsor of The New Stack.
Managing the cost of data storage is among the challenges central to Veritas Technologies’ mission as a leader in data center backup and recovery. Veritas Technologies offers a complete family of data-protection and long-term retention appliances to help ensure data availability and reduce cost and complexity while increasing operational efficiency. Among these are NetBackup appliances. NetBackup is leading backup and recovery software for enterprises, commonly used for backing up data centers.
Proactive Support Through Storage Forecasting
Veritas has more than 10,000 NetBackup appliances actively deployed in the field, as well as years of Veritas AutoSupport information and hundreds of millions of telemetry data points from those appliances. Previously, however, it lacked the analytics capabilities to forecast problems and prevent them from happening. In other words, its visibility was limited to the rearview mirror. If an appliance ran out of storage, the backup would fail. A failed backup means that if any type of event then occurs in a company’s infrastructure, there is a risk of data loss.
In examining how it could leverage the vast amount of time-stamped data that it had collected but not yet put to good use, Veritas found that forecasting time-series data would enable it to tackle four use cases:
- Resource planning.
- Workload anomalies.
- Possible SLA violations.
- Sales opportunities.
This article focuses on the resource planning use case, which is ultimately a means to maintain and improve customer satisfaction.
What Is Forecasting?
First, let’s define forecasting. Forecasting is the process of making predictions about the future based on present and past data. The key assumption behind forecasting is that the way in which the environment is changing will continue into the future. Because forecasts are error-prone, a forecast is useful only when its error is small enough for the use case being addressed. The smaller the error, the more accurate the forecast, as discussed below in the “Evaluating the Model’s Accuracy in Production” section of this article.
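To make the definition concrete, here is a minimal Python sketch of a drift forecast, which encodes the key assumption literally: the average change seen so far is assumed to continue. The function names and the sample usage figures are illustrative assumptions, not anything taken from Veritas’ system.

```python
from typing import List


def drift_forecast(history: List[float], horizon: int) -> List[float]:
    """Project the average past change per step `horizon` steps ahead."""
    if len(history) < 2:
        raise ValueError("need at least two observations")
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    avg_change = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + avg_change * step for step in range(1, horizon + 1)]


def absolute_errors(forecast: List[float], actual: List[float]) -> List[float]:
    """Per-step gap between what was predicted and what was later observed."""
    return [abs(f - a) for f, a in zip(forecast, actual)]


# Made-up daily storage usage in TB, for illustration only.
history = [100.0, 104.0, 108.0, 112.0]
forecast = drift_forecast(history, horizon=3)              # [116.0, 120.0, 124.0]
errors = absolute_errors(forecast, [116.0, 121.0, 123.0])  # [0.0, 1.0, 1.0]
```

Whether an error of 1 TB is acceptable depends on the use case, which is the point the definition above makes.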
Forecasting for Resource Planning and Downtime Reduction
Veritas decided to use time series forecasting to proactively reduce downtime with its NetBackup appliances in order to lower risk and save cost for its customers. For that purpose, Veritas built Veritas Predictive Insights: a SaaS platform that uses artificial intelligence (AI) and machine learning (ML) to deliver predictive support services for Veritas appliance customers by detecting potential issues and offering prescriptive remediation before problems occur.
Veritas Predictive Insights is built on years of Veritas AutoSupport information and hundreds of millions of telemetry data points from over 10,000 Veritas appliances.
Storage forecasting runs in Veritas Predictive Insights to track storage consumption of NetBackup appliances and reduce downtime. Predictive analytics generated by Veritas Predictive Insights, which provide forecasts on probable events using past data, enable visibility and preventive action.
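As a rough illustration of how tracking storage consumption can turn into preventive action, the sketch below fits a straight line to daily usage and estimates how many days remain before a partition fills. It assumes linear growth and uses hypothetical names and numbers; the article does not describe the actual model behind Veritas Predictive Insights.

```python
from typing import List, Optional, Tuple


def fit_linear_trend(usage: List[float]) -> Tuple[float, float]:
    """Least-squares line through (0, usage[0]), (1, usage[1]), ...

    Returns (slope, intercept).
    """
    n = len(usage)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(usage) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(usage))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(usage: List[float], capacity: float) -> Optional[float]:
    """Days after the last observation until the fitted line reaches capacity.

    Returns None when usage is flat or shrinking, since the line never
    reaches capacity in that case.
    """
    slope, intercept = fit_linear_trend(usage)
    if slope <= 0:
        return None
    day_full = (capacity - intercept) / slope
    return max(0.0, day_full - (len(usage) - 1))


# Made-up daily usage in TB for one partition with a 28 TB capacity.
remaining = days_until_full([10.0, 12.0, 14.0, 16.0, 18.0], capacity=28.0)
```

With these sample numbers the fitted growth is 2 TB per day, so `remaining` is 5.0 days.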
Solving the Challenge of Storage Forecasting Automation
Once Veritas built the hardware setup in its ML platform, it needed to automate storage forecasting. However, it was impossible to run the forecast manually for the massive volume of data involved. Veritas had more than 10,000 appliances to manage, and for each appliance, it had to forecast the needs of every storage partition.
The challenge was to scale a historically manual process, handcrafted for analyzing a single data series of just dozens of data points, up to the large-scale processing of thousands of time series and millions of data points. The first step to meet these wide-ranging needs was to select a time-series database.
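The fan-out described above, one forecast per storage partition per appliance, can be sketched as a grouping step followed by a loop over the resulting series. The record layout, partition names and placeholder model below are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical flat telemetry record: (appliance_id, partition, day, used_tb)
Record = Tuple[str, str, int, float]
SeriesKey = Tuple[str, str]


def group_series(records: Iterable[Record]) -> Dict[SeriesKey, List[float]]:
    """Split flat telemetry into one time-ordered series per (appliance, partition)."""
    buckets: Dict[SeriesKey, List[Tuple[int, float]]] = defaultdict(list)
    for appliance_id, partition, day, used in records:
        buckets[(appliance_id, partition)].append((day, used))
    return {key: [used for _, used in sorted(points)]
            for key, points in buckets.items()}


def forecast_all(records: Iterable[Record],
                 model: Callable[[List[float]], float]) -> Dict[SeriesKey, float]:
    """Run the same forecasting function over every series."""
    return {key: model(series) for key, series in group_series(records).items()}


def last_change_model(series: List[float]) -> float:
    """Placeholder model: the last value plus the last observed change."""
    if len(series) < 2:
        return series[-1]
    return series[-1] + (series[-1] - series[-2])


records = [
    ("appl-1", "data", 2, 14.0),
    ("appl-1", "data", 1, 12.0),
    ("appl-1", "catalog", 1, 1.0),
    ("appl-2", "data", 1, 30.0),
    ("appl-2", "data", 2, 29.0),
]
next_day = forecast_all(records, last_change_model)
```

Keeping the model as a plain function argument means any per-series forecaster can be swapped in without changing the grouping logic.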
Time Series Forecasting at Scale Using a Time Series Database
Veritas chose the InfluxDB time-series database to implement its solution for time series forecasting at scale, including continuous accuracy evaluation and algorithm hyperparameter optimization. InfluxDB underpins the storage forecasting implementation in Veritas Predictive Insights, which is capable of training, evaluating and forecasting over 70,000 time series daily. Veritas chose InfluxDB because it is purpose-built for time-series data, which made that data easier to work with than it would have been in other types of databases.
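For readers unfamiliar with how InfluxDB ingests data, the sketch below formats a single telemetry reading in InfluxDB line protocol: a measurement name, optional tags, one or more fields and a timestamp. The measurement, tag and field names are hypothetical, and the helper handles only float fields and the basic escaping of commas, equals signs and spaces; it is not a replacement for an official client library.

```python
from typing import Dict

TAG_SPECIALS = ",= "          # escaped in tag keys, tag values and field keys
MEASUREMENT_SPECIALS = ", "   # escaped in measurement names


def _escape(text: str, specials: str) -> str:
    """Backslash-escape each special character in `text`."""
    for ch in specials:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, float], timestamp_ns: int) -> str:
    """Format one point as: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    tag_part = "".join(
        "," + _escape(key, TAG_SPECIALS) + "=" + _escape(value, TAG_SPECIALS)
        for key, value in sorted(tags.items())
    )
    field_part = ",".join(
        _escape(key, TAG_SPECIALS) + "=" + repr(float(value))
        for key, value in sorted(fields.items())
    )
    return (_escape(measurement, MEASUREMENT_SPECIALS) + tag_part
            + " " + field_part + " " + str(timestamp_ns))


line = to_line_protocol(
    "storage_usage",                                # hypothetical measurement
    {"partition": "data", "appliance": "appl-1"},   # hypothetical tags
    {"used_tb": 12.5},
    1700000000000000000,
)
```

Here `line` is `storage_usage,appliance=appl-1,partition=data used_tb=12.5 1700000000000000000`. Tags identify the series (appliance and partition), while fields carry the measured values.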
Autonomous AI and ML-Based Data and Infrastructure Management
For each appliance, the telemetry data generates a system reliability score (SRS), which is a simple health score using an additive machine learning (ML) model. The model aggregates inputs from different ML processes to predict appliance health and displays the results in an easy-to-understand format.
The higher the SRS, the better the appliance is operating and the lower the chances of unplanned downtime.
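The article does not give the SRS formula, but an additive score can be pictured as a weighted sum of component scores. The sketch below assumes each component is already normalized to the range 0 to 1 and that the result is scaled to 0 to 100; the component names and weights are invented for illustration.

```python
from typing import Dict


def reliability_score(components: Dict[str, float],
                      weights: Dict[str, float]) -> float:
    """Weighted additive health score on a 0-100 scale (higher is healthier)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    score = 0.0
    for name, weight in weights.items():
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"component {name!r} must be between 0 and 1")
        score += weight * value
    return 100.0 * score / total_weight


# Hypothetical component scores, each produced by a separate ML process.
weights = {"storage_headroom": 2.0, "hardware_health": 1.0, "backup_success": 1.0}
components = {"storage_headroom": 1.0, "hardware_health": 0.5, "backup_success": 0.5}
srs = reliability_score(components, weights)
```

With these sample inputs `srs` is 75.0. An additive form keeps the score easy to explain, because each component’s contribution can be read off directly.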
Solving the Three Challenges of Forecasting Automation
To automate its storage forecast capabilities for its appliances and for its SaaS offering, Veritas had to overcome three challenges of forecasting automation:
1. Determining Which Model Is the Best
Selecting the best model for a given type of data can be done manually when the data can be assumed to come from similar sources. But numerous issues remain to be handled, such as missing values, outliers, trends and seasonality, trend change points and algorithm parameters. Veritas met these challenges through algorithm adjustment, advanced detection methods and forecasting tools.
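Two of the issues listed above, missing values and outliers, can be illustrated with standard-library Python. The sketch fills gaps by linear interpolation and flags outliers by their distance from the median in units of the median absolute deviation (MAD). These are common generic techniques chosen for illustration; the article does not say which methods Veritas uses, and a series with a strong trend would normally be detrended before an outlier check like this.

```python
from statistics import median
from typing import List, Optional


def fill_missing(series: List[Optional[float]]) -> List[float]:
    """Interpolate linearly between known neighbours; edges reuse the nearest value."""
    known = [i for i, value in enumerate(series) if value is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled: List[float] = []
    for i, value in enumerate(series):
        if value is not None:
            filled.append(value)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(series[right])
        elif right is None:
            filled.append(series[left])
        else:
            fraction = (i - left) / (right - left)
            filled.append(series[left] + fraction * (series[right] - series[left]))
    return filled


def flag_outliers(series: List[float], threshold: float = 3.5) -> List[bool]:
    """Mark points whose distance from the median exceeds `threshold` MADs."""
    center = median(series)
    mad = median([abs(value - center) for value in series])
    if mad == 0:
        return [False] * len(series)
    return [abs(value - center) / mad > threshold for value in series]


cleaned = fill_missing([1.0, None, 3.0, None])
flags = flag_outliers([10.0, 11.0, 10.0, 12.0, 11.0, 95.0])
```

In this example `cleaned` is `[1.0, 2.0, 3.0, 3.0]` and only the final reading of 95.0 is flagged.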
2. Evaluating the Model’s Accuracy in Production
To validate that its model actually performs, Veritas needed to have an accuracy number (a metric that represents error data points) to track the differences between the forecast and historical data. Veritas relied on InfluxDB to compute the accuracy metrics and save this information as time-series data points.
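One way to picture accuracy saved as time-series data is to score each past forecast once the value it predicted has been observed, and to emit one error point per horizon. The sketch below uses absolute percentage error as the metric; the metric choice, record layout and names are assumptions, since the article does not specify them.

```python
from typing import Dict, List, NamedTuple


class ErrorPoint(NamedTuple):
    appliance: str
    storage_type: str
    horizon_days: int
    day: int                 # day the forecast was made for
    abs_pct_error: float


def absolute_percentage_error(forecast: float, actual: float) -> float:
    if actual == 0:
        raise ValueError("percentage error is undefined when actual is zero")
    return abs(forecast - actual) * 100.0 / abs(actual)


def score_forecasts(appliance: str, storage_type: str, issued_on: int,
                    forecasts: Dict[int, float],
                    actuals: Dict[int, float]) -> List[ErrorPoint]:
    """Score every horizon whose target day has already been observed.

    forecasts maps horizon (days ahead) to predicted usage;
    actuals maps day number to observed usage.
    """
    points = []
    for horizon, predicted in sorted(forecasts.items()):
        target_day = issued_on + horizon
        if target_day in actuals:
            error = absolute_percentage_error(predicted, actuals[target_day])
            points.append(ErrorPoint(appliance, storage_type, horizon,
                                     target_day, error))
    return points


points = score_forecasts(
    "appl-1", "data", issued_on=100,
    forecasts={1: 52.0, 7: 60.0, 30: 90.0},
    actuals={101: 50.0, 107: 64.0},
)
```

The 30-day forecast is skipped because day 130 has not been observed yet; the other two horizons produce errors of 4.0 and 6.25 percent, each of which could be written to the database as its own point.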
It accomplished this by using two methods: Expanding Window Validation, which trains the model on successive, continuous partitions of the data, and Sliding Window Backtesting, which compares forecasts with historical data, as described above. The end result was forecast error data available as a time series per appliance, horizon and storage type.
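The two schemes differ in how the training window moves. The sketch below generates index splits using the usual textbook definitions: an expanding window always starts at the first observation and grows, while a sliding window keeps a fixed length and moves forward. Veritas’ exact procedure is not described in the article, so the parameters here are illustrative.

```python
from typing import List, Tuple

Split = Tuple[List[int], List[int]]   # (training indices, test indices)


def expanding_window_splits(n_points: int, initial_train: int,
                            horizon: int) -> List[Split]:
    """Training window starts at index 0 and grows by `horizon` each round."""
    splits = []
    train_end = initial_train
    while train_end + horizon <= n_points:
        splits.append((list(range(0, train_end)),
                       list(range(train_end, train_end + horizon))))
        train_end += horizon
    return splits


def sliding_window_splits(n_points: int, train_size: int,
                          horizon: int) -> List[Split]:
    """Training window keeps a fixed size and slides forward by `horizon`."""
    splits = []
    start = 0
    while start + train_size + horizon <= n_points:
        train_end = start + train_size
        splits.append((list(range(start, train_end)),
                       list(range(train_end, train_end + horizon))))
        start += horizon
    return splits


expanding = expanding_window_splits(n_points=10, initial_train=4, horizon=2)
sliding = sliding_window_splits(n_points=10, train_size=4, horizon=2)
```

For ten observations both calls yield three splits. The last expanding split trains on indices 0 through 7, while the last sliding split trains only on indices 4 through 7; both test on indices 8 and 9.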
3. Continuously Tuning the Model
Veritas had to solve several model update problems with thousands of models to take into account, each of which had to be tuned for a specific time series:
- Running cross-validation: with over 70,000 time-series data sets, the process could be prohibitively expensive and thus not run as often as needed.
- Changes in the underlying process also needed to be tracked to help ensure the model remained accurate.
- Backtesting remained computationally too expensive, since there are thousands of series and the system would have to run the validation procedure for each one.
To solve these forecast model tuning challenges, Veritas began using InfluxDB to compute the error data online and save it as a time series. As each new forecast was generated, its accuracy data was computed from past forecasts and historical data. Veritas’ system then began storing forecast error data as time-series data points per appliance, horizon and storage type.
Today, Veritas also relies on a mathematical tool called sequential model-based optimization (SMBO) to solve model update problems. SMBO iterates between fitting models and using them to make choices about which configurations to explore. SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then choose new hyperparameters to test based on these models.
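The SMBO loop can be sketched in a few lines: evaluate a few configurations, fit a cheap surrogate to the results, use the surrogate to pick the next configuration, evaluate it and repeat. The toy version below tunes a single integer hyperparameter, uses an inverse-distance-weighted average as the surrogate and adds a bonus for candidates far from anything already tried. Real SMBO implementations typically use richer surrogates such as Gaussian processes or random forests; everything here, including the stand-in objective, is an assumption made for illustration.

```python
from typing import Callable, Dict, List, Tuple


def surrogate_prediction(candidate: int, history: Dict[int, float]) -> float:
    """Estimate the loss at an untried candidate from measured neighbours.

    Inverse-distance-weighted mean: closer measurements count for more.
    """
    weights = {x: 1.0 / abs(candidate - x) for x in history}
    total = sum(weights.values())
    return sum(weights[x] * loss for x, loss in history.items()) / total


def smbo_minimize(objective: Callable[[int], float], candidates: List[int],
                  initial: List[int], n_iterations: int,
                  exploration: float = 0.5) -> Tuple[int, float]:
    """Alternate between modelling past results and choosing the next trial."""
    history = {x: objective(x) for x in initial}      # expensive evaluations
    for _ in range(n_iterations):
        untried = [c for c in candidates if c not in history]
        if not untried:
            break

        def acquisition(c: int) -> float:
            # Prefer a low predicted loss, with a bonus for unexplored regions.
            distance = min(abs(c - x) for x in history)
            return surrogate_prediction(c, history) - exploration * distance

        chosen = min(untried, key=acquisition)
        history[chosen] = objective(chosen)           # one new evaluation
    best = min(history, key=history.get)
    return best, history[best]


def backtest_error(window: int) -> float:
    """Stand-in for an expensive backtest; lowest at a window of 7."""
    return float((window - 7) ** 2)


best_window, best_error = smbo_minimize(
    backtest_error, candidates=list(range(1, 31)), initial=[1, 30], n_iterations=5
)
```

The appeal of this approach for thousands of models is that each iteration costs only one expensive evaluation, with the surrogate deciding where that evaluation is spent.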
Predictive Analytics to Provide Proactive Support
By using encrypted data from thousands of Veritas appliances, Veritas Predictive Insights’ cloud-based AI/ML Engine today detects potential issues and monitors system health to create proactive and prescriptive remediation. Veritas Predictive Insights enhances Veritas product and customer satisfaction and helps customers:
- Increase operational availability.
- Resolve potential issues before they occur.
- Reduce TCO by optimizing storage investments and avoiding over-provisioning.
The Technology to Know, the Knowledge to Act
Continuous AI/ML self-learning processes, an integral part of Veritas’ platform, constantly improve insights and accuracy. They also identify patterns, predict trends and optimize resiliency and utilization with intelligent forecasting and predictive maintenance.
Powered by InfluxDB as its time-series database, Veritas Predictive Insights delivers immediate value for both new and existing installations with prescriptive support services that can mitigate problems before they occur.
For Veritas and its NetBackup appliance customers, visibility into the future through predictive analytics gives organizations the ability to act well before major errors and outages occur, which can make all the difference in organizational decision-making, service and security outcomes.
You can learn more about this forecasting automation use case here.