Collect, Store and Analyze IoT Data Using AWS and InfluxDB
Consumers want smarter and more efficient services, and businesses want more data to make better decisions. As a result, the Internet of Things (IoT) market is expanding rapidly as billions of devices come online. One of the biggest challenges businesses face is tying their hardware and software together so they can generate business value from these devices.
The first challenge is collecting data from devices in the field and getting it to the cloud. The second is efficiently storing and analyzing the millions of time-series data points those devices generate. Tools from AWS and InfluxData, the company behind InfluxDB, can help simplify both problems. In this article, you will learn about a variety of tools for working with IoT data, from collection to analysis.
Collecting Data at the Edge
Everybody wants to modernize their business and make smarter decisions using data, but this is easier said than done. One of the primary hurdles is the groundwork of reliably collecting data from IoT devices. Common problems at this stage include:
- Intermittent internet connectivity for devices in the field.
- Securing transmitted data with encryption and authentication.
- Filtering and transforming data at the edge to reduce bandwidth consumption.
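Filtering at the edge can be as simple as a deadband: forward a reading only when it has moved far enough from the last value that was sent. The sketch below is plain Python that illustrates the idea, not a Greengrass API; the `(timestamp, value)` tuple format and the threshold are assumptions chosen for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical reading format: (unix timestamp in seconds, sensor value).
Reading = Tuple[int, float]


def deadband_filter(readings: Iterable[Reading], threshold: float) -> Iterator[Reading]:
    """Yield only readings that differ from the last forwarded value by more than threshold."""
    last_sent: Optional[float] = None
    for ts, value in readings:
        # Always forward the first reading; afterwards, drop small changes.
        if last_sent is None or abs(value - last_sent) > threshold:
            last_sent = value
            yield ts, value


raw = [(0, 20.0), (10, 20.1), (20, 20.4), (30, 21.0), (40, 21.05)]
forwarded = list(deadband_filter(raw, threshold=0.5))
# forwarded == [(0, 20.0), (30, 21.0)]: three of the five readings never leave the device.
```

Comparing against the last value that was actually sent, rather than the previous reading, keeps a slow drift made of many small steps from going unreported.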
These problems are implementation details that don't directly create business value. Luckily, AWS provides services that abstract away many of these difficulties, which reduces development time and lets developers focus on what matters for their applications.
The first service is AWS IoT Greengrass, which is deployed directly on your IoT devices. Greengrass acts as a middleman between your devices and the cloud and handles things like:
- Automatically retrying transmission to the cloud if the connection is interrupted.
- Filtering or transforming data at the edge before it is sent to the cloud.
- Managing local network connections between devices.
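The retry behavior described above is usually called store-and-forward: messages are buffered locally while the uplink is down and flushed in order once it returns. Here is a minimal, hypothetical sketch of the pattern in Python; `send` stands in for whatever actually transmits a message and is assumed to return `False` when the connection is unavailable. It is not how Greengrass itself is implemented.

```python
from collections import deque
from typing import Callable, Deque, List


class StoreAndForward:
    """Buffer outgoing messages and deliver them in order when the uplink is available."""

    def __init__(self, send: Callable[[str], bool], max_buffer: int = 10_000) -> None:
        # send() is a hypothetical transport: True on success, False if the link is down.
        self._send = send
        # With maxlen set, appending to a full deque silently drops the oldest message.
        self._buffer: Deque[str] = deque(maxlen=max_buffer)

    def publish(self, message: str) -> None:
        self._buffer.append(message)
        self.flush()

    def flush(self) -> int:
        """Try to drain the buffer; stop at the first failure. Returns the number sent."""
        sent = 0
        while self._buffer:
            if not self._send(self._buffer[0]):
                break  # keep the message at the head and retry on the next flush
            self._buffer.popleft()
            sent += 1
        return sent

    def pending(self) -> int:
        return len(self._buffer)


# Simulated uplink that starts offline.
link_up = False
delivered: List[str] = []


def fake_send(message: str) -> bool:
    if link_up:
        delivered.append(message)
    return link_up


uplink = StoreAndForward(fake_send)
uplink.publish("temp=20.1")
uplink.publish("temp=20.3")  # both messages wait in the buffer while offline
link_up = True
uplink.flush()  # connection restored: messages go out in their original order
```

In a real deployment, `flush()` would be triggered by a timer or a reconnect event rather than called by hand.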
Greengrass also provides connector components for sending data to the cloud over a variety of protocols. The biggest benefit of running Greengrass on your devices is the range of architecture options it gives you for processing and moving your data. I'll cover three examples of how you can get data from your devices to the cloud using AWS IoT Greengrass.
Edge and Hub
Some applications require real-time processing and analysis and can't tolerate the latency or potential downtime involved in offloading work to the cloud. For these cases, one architecture pattern is known as edge and hub. With this setup, InfluxDB is deployed on the same network as the IoT devices, so data can be stored and analyzed locally. Data is often downsampled and eventually sent to cloud storage, where it is available for long-term analysis.
An example of this use case is Equinor, Norway's state-owned oil company. Equinor wants to detect potential problems with its offshore oil rigs in real time, such as finding a potential leak and shutting down the rig as quickly as possible. Running this type of analysis in the cloud would be too risky because of potential latency or a lost connection, so it deploys InfluxDB locally. It also wants to analyze long-term data, so it downsamples its data and exports it to the cloud as well.
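Downsampling typically means replacing raw points with one aggregate per time window before export. In InfluxDB this is commonly done with a Flux task that calls `aggregateWindow()`; the Python below only sketches the same idea so the arithmetic is visible. The window size and the choice of the mean as the aggregate are assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def downsample(points: Iterable[Tuple[int, float]], window_s: int) -> List[Tuple[int, float]]:
    """Aggregate (timestamp, value) points into fixed windows as (window_start, mean)."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of the window that contains it.
        buckets[ts - ts % window_s].append(value)
    # One output point per non-empty window, in time order.
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


raw = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0), (125, 10.0)]
per_minute = downsample(raw, window_s=60)
# per_minute == [(0, 2.0), (60, 6.0), (120, 10.0)]
```

Five raw points become three, and the ratio grows with the sampling rate: a sensor reporting every second shrinks 60-fold with one-minute windows.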
AWS IoT Greengrass supports sending data to InfluxDB for this use case with two components:
- AWS IoT Greengrass InfluxDB — This component runs and manages an InfluxDB instance on the same device.
- AWS IoT Greengrass InfluxDB Publisher — This component exports telemetry data to an InfluxDB instance running on different hardware, either locally or in the cloud.
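Whichever component does the writing, points end up in InfluxDB's line protocol, a text format of `measurement,tags fields timestamp`. A minimal formatter is sketched below to show what a point looks like on the wire; it assumes nanosecond timestamps (InfluxDB's default write precision) and handles only the common escaping rules. In practice, the InfluxDB client libraries build these lines for you.

```python
from typing import Mapping, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape_measurement(name: str) -> str:
    # Measurement names escape commas and spaces.
    return name.replace(",", r"\,").replace(" ", r"\ ")


def _escape_key(text: str) -> str:
    # Tag keys, tag values and field keys also escape equals signs.
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field(value: FieldValue) -> str:
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted, with backslashes and quotes escaped.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(
    measurement: str,
    tags: Mapping[str, str],
    fields: Mapping[str, FieldValue],
    timestamp_ns: Optional[int] = None,
) -> str:
    if not fields:
        raise ValueError("line protocol requires at least one field")
    # Tags are sorted by key, as InfluxDB recommends for write performance.
    head = ",".join(
        [_escape_measurement(measurement)]
        + [f"{_escape_key(k)}={_escape_key(tags[k])}" for k in sorted(tags)]
    )
    body = ",".join(f"{_escape_key(k)}={_format_field(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    return line if timestamp_ns is None else f"{line} {timestamp_ns}"


line = to_line_protocol(
    "temperature",
    {"site": "rig 7", "sensor": "t1"},  # hypothetical tag names
    {"value": 21.5, "ok": True},
    timestamp_ns=1_700_000_000_000_000_000,
)
```

Tags are indexed and meant for metadata you filter on, such as site or sensor ID, while fields hold the measured values.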
AWS IoT Core
AWS IoT Core is essentially a managed MQTT broker that integrates closely with Greengrass. Once IoT Core has ingested your data, you can route it to any number of tightly integrated AWS services, depending on your use case. You can learn more about some of these services by watching this AWS presentation from InfluxDays EMEA 2021.
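Routing in IoT Core starts with MQTT topic filters, where `+` matches exactly one topic level and `#` matches all remaining levels; for example, a rule might select from `'sensors/+/temperature'`. The sketch below implements those standard MQTT matching rules in Python to show how one filter fans in messages from many devices; the topic names are hypothetical.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic matches a topic filter with + and # wildcards."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # Wildcards in the first level never match topics starting with "$" (e.g. $aws/...).
    if topic.startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(f_levels):
        if level == "#":
            # Multi-level wildcard: matches this level and everything below, including
            # the parent itself. A "#" anywhere but the last level is an invalid filter.
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False
        if level != "+" and level != t_levels[i]:
            return False
    # Without "#", the filter must consume every level of the topic.
    return len(f_levels) == len(t_levels)


rule_filter = "sensors/+/temperature"
incoming = ["sensors/rig7/temperature", "sensors/rig8/temperature", "sensors/rig7/pressure"]
routed = [t for t in incoming if topic_matches(rule_filter, t)]
```

Here `routed` holds the two temperature topics, while the pressure topic is left for a different rule.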
Telegraf
Another option for getting data out of your Greengrass devices is to use Telegraf to consume messages from the Greengrass MQTT broker. Once Telegraf is collecting the data, you can take advantage of over 300 plugins for processing and transforming it and then exporting it to the cloud.
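In Telegraf itself, consuming and parsing these messages is configured in TOML with the `mqtt_consumer` input plugin and a JSON data format, not written as code. As a rough, hypothetical Python analogue of that pipeline, the sketch below turns one MQTT message into a measurement, tags and numeric fields; the topic layout `sensors/<site>/<measurement>` is an assumption made for the example.

```python
import json
from typing import Dict, Tuple

Point = Tuple[str, Dict[str, str], Dict[str, float]]


def mqtt_message_to_point(topic: str, payload: bytes) -> Point:
    """Map a message on the hypothetical topic 'sensors/<site>/<measurement>' to a point."""
    levels = topic.split("/")
    if len(levels) != 3 or levels[0] != "sensors":
        raise ValueError(f"unexpected topic: {topic!r}")
    _, site, measurement = levels
    data = json.loads(payload)
    # Keep numeric values as fields and drop everything else (strings, nested objects).
    # bool is excluded explicitly because it is a subclass of int.
    fields = {
        key: float(value)
        for key, value in data.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }
    # Topic levels become tags, which InfluxDB indexes for filtering.
    return measurement, {"site": site}, fields


point = mqtt_message_to_point(
    "sensors/rig7/temperature",
    b'{"value": 21.5, "unit": "C", "battery": 87}',
)
```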
Working with Data in the Cloud
Once your data is collected and stored in the cloud, the real fun begins: using that data for analysis, visualization and optimization. Let's go over some typical use cases for IoT data.
Visualization is probably the most common use case and often the first step in working with IoT data. InfluxDB provides dashboarding tools out of the box, and it also makes it easy to pull your data into third-party visualization tools you are comfortable with, like Grafana. This type of solution is generally used by internal teams.
In some cases, you may want to expose this data to consumers. InfluxDB makes this possible through its client libraries and REST API. For example, a smart thermostat company could let users view their personal data through a mobile app.
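As a rough illustration of the REST route, the sketch below uses only Python's standard library to build a request against the InfluxDB v2 query endpoint (`POST /api/v2/query`), sending a Flux query and asking for CSV back. The server URL, organization, token, bucket, measurement and `device_id` tag are all placeholders. In a consumer app, the token would normally live on your own backend rather than in the mobile client.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_flux_query_request(base_url: str, org: str, token: str, flux: str) -> Request:
    """Build (but do not send) a query request for the InfluxDB v2 HTTP API."""
    url = f"{base_url.rstrip('/')}/api/v2/query?{urlencode({'org': org})}"
    return Request(
        url,
        data=flux.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/vnd.flux",  # raw Flux in the request body
            "Accept": "application/csv",  # results come back as CSV
        },
    )


# Hypothetical query: one thermostat's temperature readings from the last 24 hours.
flux_query = """
from(bucket: "thermostats")
  |> range(start: -24h)
  |> filter(fn: (r) => r._measurement == "temperature" and r.device_id == "abc123")
"""

request = build_flux_query_request("http://localhost:8086", "my-org", "MY_TOKEN", flux_query)
# Sending it would be: urllib.request.urlopen(request).read().decode()
```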
The next step beyond visualizing data is actively monitoring it and taking action based on it. The simplest approach is to set a threshold for certain data points and alert someone when a value crosses that threshold. For example, a temperature reading from a sensor rising to an unsafe level could trigger an email, SMS or Slack notification to a user or on-call employee.
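The threshold logic itself is small. The sketch below classifies readings against hypothetical warning and critical levels and hands each alert to a `notify` callback, which stands in for an email, SMS or Slack integration; the sensor names and limits are made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Optional


@dataclass
class Alert:
    sensor: str
    value: float
    level: str  # "warn" or "crit"


def classify(sensor: str, value: float, warn: float, crit: float) -> Optional[Alert]:
    """Return an Alert if value reaches a threshold, checking the stricter level first."""
    if value >= crit:
        return Alert(sensor, value, "crit")
    if value >= warn:
        return Alert(sensor, value, "warn")
    return None


def check_readings(
    readings: Mapping[str, float],
    warn: float,
    crit: float,
    notify: Callable[[str], None],
) -> List[Alert]:
    alerts = []
    for sensor, value in readings.items():
        alert = classify(sensor, value, warn, crit)
        if alert is not None:
            alerts.append(alert)
            notify(f"[{alert.level.upper()}] {sensor} reading {value:.1f}")
    return alerts


messages: List[str] = []
alerts = check_readings(
    {"boiler-1": 92.0, "boiler-2": 71.0, "boiler-3": 104.5},
    warn=90.0,
    crit=100.0,
    notify=messages.append,  # stand-in for a real email/SMS/Slack sender
)
```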
InfluxDB provides this functionality out of the box with tasks. You can schedule a task to run at a set interval, check the data and then take action based on the result. Eventually, you could fully automate the process and take programmatic action to resolve issues when a threshold is crossed.
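In InfluxDB, a task's schedule is declared inside its Flux script with an `option task = {name: ..., every: ...}` line, and the query usually looks back over one interval. The Python below is only a hypothetical model of a single run of such a task: it derives the time window from the run time and interval, checks the peak value, and calls a `remediate` hook that stands in for the automated fix. Timestamps are plain seconds for readability.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def window_for_run(run_time: int, every: int) -> Tuple[int, int]:
    """Return the [start, stop) window a run covers: the last full interval before run_time."""
    stop = run_time - run_time % every
    return stop - every, stop


def run_task_once(
    run_time: int,
    every: int,
    points: Sequence[Tuple[int, float]],
    threshold: float,
    remediate: Callable[[float], None],
) -> Optional[float]:
    start, stop = window_for_run(run_time, every)
    in_window = [value for ts, value in points if start <= ts < stop]
    if not in_window:
        return None  # nothing to check in this interval
    peak = max(in_window)
    if peak > threshold:
        remediate(peak)  # hypothetical automated action, e.g. throttling a device
    return peak


actions: List[float] = []
points = [(30, 80.0), (65, 95.0), (110, 101.0), (121, 70.0)]
peak = run_task_once(run_time=125, every=60, points=points, threshold=100.0, remediate=actions.append)
```

Aligning the window to whole intervals means consecutive runs check adjacent, non-overlapping slices of data, so no point is checked twice or skipped.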
As your setup becomes more sophisticated, you can start using machine learning to optimize your business. With time-series data, there are essentially two things machine learning can do:
- Attempt to predict the future via forecasting.
- Classify the present status of your system through anomaly detection.
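To make the two categories concrete, the sketch below pairs one of the simplest forecasting methods, simple exponential smoothing, with a simple anomaly detector, the modified z-score based on the median absolute deviation (MAD). These are illustrative stand-ins chosen for brevity, not the models AWS services use; the smoothing factor and the 3.5 cutoff are conventional defaults you would tune.

```python
from statistics import median
from typing import List, Sequence


def ses_forecast(values: Sequence[float], alpha: float = 0.5) -> float:
    """One-step-ahead forecast with simple exponential smoothing."""
    if not values or not 0 < alpha <= 1:
        raise ValueError("need at least one value and 0 < alpha <= 1")
    level = values[0]
    for v in values[1:]:
        # Blend the newest observation with the running level.
        level = alpha * v + (1 - alpha) * level
    return level


def mad_anomalies(values: Sequence[float], cutoff: float = 3.5) -> List[int]:
    """Indices whose modified z-score, 0.6745 * |x - median| / MAD, exceeds cutoff."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # at least half the values equal the median; the score is undefined
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > cutoff]


history = [10.0, 12.0, 11.0, 13.0]
next_value = ses_forecast(history, alpha=0.5)

temps = [20.0, 21.0, 20.0, 19.0, 20.0, 21.0, 20.0, 19.0, 20.0, 35.0]
outliers = mad_anomalies(temps)
```

The median-based score is used instead of a plain mean and standard deviation because a single large outlier inflates the standard deviation and can hide itself; the median and MAD barely move.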
AWS provides a number of services for training, deploying and running your own models, as well as services that give you access to pretrained models for more general use cases. InfluxDB's API and client libraries make your data accessible from your preferred language ecosystem; Python is the most popular choice for data science and machine learning. Deployed models can make predictions, which can then be written back into InfluxDB as forecast projections. These predictions can also be used to replace manual intervention with automated actions.
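Writing predictions back usually means tagging them so they can be told apart from measured data and overlaid on the same dashboard. The sketch below turns a list of predicted values into line protocol with future timestamps; the `type=forecast` and `model` tags are a hypothetical convention, not an InfluxDB requirement. The resulting lines could then be sent with a client library or the `/api/v2/write` endpoint.

```python
from typing import List, Sequence


def forecast_lines(
    measurement: str,
    device: str,
    model: str,
    last_ts_ns: int,
    step_ns: int,
    predictions: Sequence[float],
) -> List[str]:
    """Line protocol for predictions spaced step_ns apart after the last observed point."""
    for text in (measurement, device, model):
        # Keep the sketch simple: no escaping, so reject characters that would need it.
        if any(c in text for c in " ,="):
            raise ValueError(f"needs escaping: {text!r}")
    lines = []
    for i, value in enumerate(predictions, start=1):
        ts = last_ts_ns + i * step_ns
        # Tags sorted by key (device, model, type); the prediction is a float field.
        lines.append(
            f"{measurement},device={device},model={model},type=forecast value={float(value)!r} {ts}"
        )
    return lines


HOUR_NS = 3_600 * 10**9
lines = forecast_lines("temperature", "t1", "ses", 1_700_000_000_000_000_000, HOUR_NS, [21.5, 22.0])
```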
This article was a high-level overview of the tools AWS and InfluxDB provide for working with IoT data. For more information about the IoT ecosystem and more specific use case examples, check out the following resources:
- Getting started with data science for time-series data
- How to use AWS IoT Core and AWS Lambda with InfluxDB
- Time-series data analysis methods
- Time-series data forecasting methods