How to Select the Right Database for Time-Series Data

2 Dec 2021 7:38am, by

Jane Fine
Jane is a director at MongoDB, where she oversees the Developer Experience team. A developer by trade, Jane joined MongoDB in 2016 from Teradata Aster, where she ran a big data practice delivering advanced analytics and machine learning projects across e-commerce, gaming and financial services industries.

Time-series databases are the fastest-growing database category today. With the explosive growth of Internet of Things (IoT) sensors, organizations need ways to store and analyze tremendous amounts of time-series data, which can include nearly any type of measurement taken at time intervals.

Stock tickers, water temperature sensors, blood glucose monitors, smartwatches, smart meters and connected cars are among a fast-expanding array of devices and systems that generate time-series data.

As the number of possible IoT, fintech and e-commerce use cases grows, so does the number of application developers working with time-series data. In 2020, Evans Data Corp. forecast that 64% of all developers would be building IoT apps in the next 12 months.

For all these application developers, finding the right database is essential. Ultimately, you need a solution that helps you maximize the value of time-series data for app users without requiring you to spend excessive time integrating data stores or managing the database.

What’s Generating All of This Time-Series Data?

Small, inexpensive IoT sensors and fast, reliable wireless networks have made it easier than ever for organizations to generate and collect time-series data. Across industries, that data supports a wide variety of use cases.

Retail stores: Walmart manages more than 7 million unique data points in its stores. An IoT app producing time-series data notifies technicians if a particular refrigerator is struggling to maintain the proper temperature. Another IoT app enables the retailer to remotely monitor and adjust HVAC and energy usage in stores according to changing demand. Walmart used that app to efficiently modify HVAC usage across numerous stores when schedules shifted during the pandemic.

Vehicles: The average car today might have 100 sensors. But more advanced electric vehicles can have more than 1,000, many generating time-series data. In addition to the sensors that automatically start the wipers when it’s raining or tell drivers when it’s time for an oil change, sensors can enable fleet managers to track their moving trucks and help insurers identify drivers who deserve discounted rates.

Financial markets: The New York Stock Exchange and NASDAQ together can generate between 30 billion and 75 billion market events per day. Analyzing portions of that time-series data — for example, the performance of particular stocks over time — enables investors to predict future changes and make important data-driven financial decisions.

Given the growing use of IoT sensors and applications, it’s not surprising that IoT data volumes are rising rapidly. According to IDC, data from IoT devices will grow from 18.3 zettabytes in 2019 to 73.1 zettabytes in 2025, roughly a fourfold increase. Those large volumes of sensor data, which might arrive in a variety of formats and potentially at high frequency, must be stored in the right kind of database.

Criteria for Selecting the Right Database

Some organizations attempt to optimize their existing database to accommodate time-series data. That can be a complex, expensive and time-consuming undertaking. It might require a fair amount of trial and error to support the scale of time-series data while still delivering adequate performance. Developers might also need to implement analytics functionality in the app. That functionality is essential for spotting trends and anomalies over time, but it can severely degrade performance if it is not built into the database layer.
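To make the app-level analytics burden concrete, here is a minimal Python sketch of the kind of trend and anomaly logic a developer might end up writing by hand. The function names, the sample readings and the three-standard-deviation threshold are illustrative assumptions, not anything prescribed by a particular database.

```python
from statistics import mean, stdev


def rolling_mean(values, window):
    """Trailing moving average, one value per position that has a full window."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


def find_anomalies(values, window, threshold=3.0):
    """Indices whose value sits more than `threshold` standard deviations
    away from the mean of the `window` readings that precede it."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window : i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma > 0 and abs(values[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies


# Hypothetical refrigerator temperatures in degrees Celsius.
readings = [4.0, 4.1, 3.9, 4.0, 4.2, 9.5, 4.1, 4.0]

trend = rolling_mean(readings, window=3)
spikes = find_anomalies(readings, window=5)
print(trend)
print(spikes)
```

Every reading has to be pulled out of the database and into application memory before this code can run, which is where the performance cost at scale comes from.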

Archiving aged data presents another challenge: Time-series data grows rapidly but often has a low signal-to-noise ratio, so you want to extract valuable insight quickly, then move the aged data into lower-cost cold storage, which might require building a custom data pipeline for archiving.
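The core of such a custom archiving pipeline is an age-based split. The Python sketch below shows one way that step might look, assuming each record is a dictionary with a `ts` timestamp field; the field names, the 30-day cutoff and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def split_by_age(records, now, max_age):
    """Partition records into (hot, cold) around the cutoff `now - max_age`.

    Records at or after the cutoff stay in the hot tier; older ones are
    candidates for lower-cost cold storage.
    """
    cutoff = now - max_age
    hot = [r for r in records if r["ts"] >= cutoff]
    cold = [r for r in records if r["ts"] < cutoff]
    return hot, cold


# Fixed reference time so the example is reproducible.
now = datetime(2021, 12, 2, tzinfo=timezone.utc)
records = [
    {"sensor": "fridge-1", "ts": now - timedelta(days=1), "temp": 4.1},
    {"sensor": "fridge-1", "ts": now - timedelta(days=45), "temp": 3.9},
    {"sensor": "fridge-2", "ts": now - timedelta(days=90), "temp": 4.4},
]

hot, cold = split_by_age(records, now, max_age=timedelta(days=30))
print(len(hot), len(cold))
```

A real pipeline would also need to write the cold records to an archive, verify the copy and delete the originals, which is the work a built-in tiering feature is meant to remove.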

Other organizations select niche point solutions designed exclusively for time-series data. These time-series databases don’t suffer from the same issues as retrofitted general-purpose databases. In particular, they can be better at handling massive amounts of data generated at very high frequencies by IoT sensors and usually have built-in analytics. But these databases can also introduce new challenges. For example, developers might need to use a new query language and acquire new skills to operate, secure and integrate the database with enterprise apps. At the same time, organizations using these databases might still need to build custom ETL (extract, transform, load) pipelines and integrate an archive to deal with historical data.

For most organizations, the best solution might be a database that combines built-in support for time-series data with complete data life-cycle management. A database that integrates time-series data with enterprise data and analytic capabilities enables you to produce contextualized insights rapidly without having to undertake complex optimization work. Meanwhile, a database that offers data-tiering across the entire data life cycle — from ingest and initial storage through access, analysis, visualization and archiving — can help you simplify management and reduce storage costs.

An effective data-tiering strategy can make it easier to analyze aged data while simplifying data management and reducing storage costs.

Addressing Real-World Use Cases with MongoDB

Digitread Connect needed a database that could deliver the right combination of simplicity, flexibility and performance. The Norwegian industrial IoT company creates turnkey solutions for customers ranging from local governments to fish farmers.

“We need to harness value out of sensor data, as uploading, storing, and managing time-series data is an essential part of our everyday life,” said Christoffer Lange, CEO of Digitread Connect. “With a large amount of sensor data streaming in from assets around the world, we required a database that fulfilled new requirements traditional databases were not optimized for.”

The Digitread Connect team explored the MongoDB native time-series capabilities, introduced with MongoDB 5.0, in conjunction with applications for monitoring subsea robots. The result?

“We have found what we consider a state-of-the-art database platform for IoT data,” said Lange. “Utilizing out-of-the-box functionality for optimized memory consumption, rapid query performance and excellent aggregation pipeline operators … improves our delivery performance for IoT solutions.”
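For readers curious what the native time-series capability looks like in practice, the Python sketch below builds the `create` command document for a MongoDB 5.0 time-series collection. The collection name, field names and 30-day expiry are hypothetical and are not taken from Digitread Connect's deployment. The sketch only constructs the document, so it runs without a database.

```python
VALID_GRANULARITIES = ("seconds", "minutes", "hours")


def build_create_command(name, time_field, meta_field, granularity, ttl_seconds):
    """Build a `create` command document for a time-series collection.

    `timeField` names the timestamp field, `metaField` names the field that
    identifies the data source, and `expireAfterSeconds` asks the server to
    delete documents automatically once they are older than that age.
    """
    if granularity not in VALID_GRANULARITIES:
        raise ValueError(f"granularity must be one of {VALID_GRANULARITIES}")
    return {
        "create": name,  # the command name must be the first key
        "timeseries": {
            "timeField": time_field,
            "metaField": meta_field,
            "granularity": granularity,
        },
        "expireAfterSeconds": ttl_seconds,
    }


# Hypothetical names; adjust to your own schema.
command = build_create_command(
    name="sensor_readings",
    time_field="ts",
    meta_field="sensor",
    granularity="minutes",
    ttl_seconds=30 * 24 * 3600,
)
print(command)

# With PyMongo and a MongoDB 5.0+ server (both assumed, not shown here),
# the document could be sent with something like:
#   MongoClient(uri)["iot"].command(command)
```

Choosing a granularity close to the interval between readings from a single source is generally what lets the server group measurements efficiently.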

Interested in learning how to model time-series data in MongoDB? Check out our how-to guide or attend our weekly webinar series.

The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Real.

Photo by Ron Lach from Pexels.