Case Study: Building a Hybrid Edge Cloud IIoT Platform
Having systems in place that can handle data from multiple sources and provide users with the tools they need to get work done is central to industrial IoT (IIoT) use cases.
Data generated at the edge is often useful both at the edge and in the cloud. That’s why Herrenknecht AG built a hybrid edge-cloud platform for connected devices in the field.
The following case study reveals the challenges the company faced and the solutions it iterated on to create a stable, reliable IIoT platform and ecosystem.
Herrenknecht is a technology leader in mechanized tunneling systems. The company has a global footprint, delivering cutting-edge tunnel boring machines (TBMs) for all ground conditions and in all diameters. TBMs are massive industrial machines, and their cutting surfaces range from 0.10 to 19 meters in diameter.
Herrenknecht tailor-builds machines for applications such as transport tunnels (traffic tunneling) and supply and disposal tunnels (utility tunneling). It also provides innovative solutions for efficiently installing pipelines underground.
These machines have thousands of sensors generating high-velocity data that feeds metrics like advance speed, advance progress, cutting wheel torque, maximum allowed penetration per minute and maximum allowed thrust force.
Having current and accurate information is critical for a TBM operator so they know what to do, how to direct the machine and how to react to sudden events.
To better make use of all the data these machines generate, Herrenknecht engineers set out to build an IIoT platform that provided insight into live and historic data for all their TBMs.
They sought open source solutions in an effort to reduce costs.
They also needed something that was easy for a small team to maintain so they could spend more time developing the platform, not managing infrastructure.
The Technical Challenges
The Herrenknecht team had a number of challenges to address when building its IIoT platform.
They needed to account for:
- Data scale
- Connectivity for remote devices
- Longitudinal data management
- Architecture compatibility
Data Scale
Herrenknecht’s platform needed to support all its machines in the field, which total more than 2,000. At any time, several hundred can be working simultaneously at various places around the globe. A single TBM can have 5,000 sensors. Some use cases, like traffic-tunneling machines, can have even more. To further complicate the equation, each sensor can have a different sample rate. Some sensors have a rate as low as 100 milliseconds.
Connectivity for Remote Devices
Operators deploy TBMs at job sites all over the world, often in remote locations, which sometimes results in connectivity and bandwidth issues. The very nature of tunneling also means that these machines operate 10 to 15 kilometers into the earth and can be completely offline, disconnected from the internet, for days, weeks or months at a time.
Longitudinal Data Management
Accessing real-time data is critical for TBM operators, but the Herrenknecht team also wanted their platform to handle historical data collected from TBMs over the past four decades. This data existed in many different formats, so the platform needed the ability to ingest data from SQL databases, DBX and CSV files, and other varied formats and sources.
Architecture Compatibility
The technology stack on older TBMs runs on Windows. Newer TBMs run primarily on Linux but retain the ability to run on Windows if necessary, so the platform needed to be compatible with both.
Herrenknecht developers chose InfluxDB as the central time-series storage database for their IIoT platform. Several factors played into this decision. For example, InfluxDB runs on multiple architectures, which answered the Windows/Linux question right out of the box. The company also designed its platform with growth in mind, and InfluxDB offered both an open source version and a commercial enterprise edition. This helped to future-proof the platform so that the team wouldn’t have to redo everything in a couple years’ time.
After choosing InfluxDB, how did Herrenknecht developers actually use it?
First, let’s look at how they handle the sensor data they collect. The team was very conscious of series cardinality and sought a way to keep it low. They settled on a unique approach to data storage to keep cardinality down and reduce memory usage.
Instead of collecting measurements for individual sensors or for specific sensor groups, they collect one measurement that has thousands of fields, where every field is a float type. And they only write values if a significant change occurs in the sensor reading from the previous one.
They round those data values to determine if a significant change occurs and only write values that qualify as a significant change. Sometimes the team needs to consider the accuracy of the sensor in question and re-round the value if the detected change is greater than the accuracy threshold. If, after the rounding process, the value doesn’t change, they can simply drop that value.
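This significant-change filter can be sketched in Python. The threshold values and function names below are illustrative, not Herrenknecht's actual code (their pipeline is C#-based), but they capture the idea: round each reading, drop it if the rounded value hasn't changed, and also drop changes smaller than the sensor's known accuracy.

```python
def significant_change(previous, current, decimals=2, accuracy=None):
    """Return the rounded value to write, or None to drop the reading.

    A reading is written only if, after rounding, it differs from the
    previously written value. If a sensor accuracy is known, changes
    smaller than that accuracy are also dropped as noise.
    """
    rounded = round(current, decimals)
    if previous is None:
        return rounded  # first reading is always written
    delta = abs(rounded - previous)
    if delta == 0:
        return None  # no change after rounding: drop
    if accuracy is not None and delta < accuracy:
        return None  # change is within sensor noise: drop
    return rounded

# Example: a torque sensor sampled every 100 milliseconds
readings = [10.001, 10.004, 10.052, 10.068, 11.5]
last_written = None
written = []
for r in readings:
    value = significant_change(last_written, r, decimals=2, accuracy=0.03)
    if value is not None:
        written.append(value)
        last_written = value
print(written)  # [10.0, 10.05, 11.5]
```

Of the five raw readings, only three survive the filter, which is how the team keeps write volume so low despite sub-second sample rates.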
They use tags sparingly, primarily to indicate the operational state of a TBM, and these tag values are simple Booleans. Using this approach, Herrenknecht only ends up writing, on average, between 1 and 5 GB of data per machine, per month.
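One way to picture this schema is through InfluxDB line protocol: a single measurement carrying one float field per sensor, plus a sparse set of Boolean-valued state tags. The measurement, field and tag names below are hypothetical stand-ins, not Herrenknecht's actual schema:

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Serialize one point to InfluxDB line protocol.

    tags:   dict of tag key -> tag value (strings)
    fields: dict of field key -> float value
    """
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"

# One measurement, one float field per sensor (a real TBM has ~5,000)
fields = {
    "advance_speed": 42.5,
    "cutting_wheel_torque": 1375.0,
    "thrust_force": 8120.25,
}
tags = {"advancing": "true"}  # Boolean-style state tag, used sparingly
line = to_line_protocol("tbm", tags, fields, 1609459200000000000)
print(line)
```

Because series cardinality in InfluxDB grows with the combinations of tag values, keeping sensor readings in fields (which are not indexed) rather than in per-sensor tags or measurements is what keeps cardinality, and therefore memory usage, low.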
Let’s turn to the architecture of the platform. It consists of a combination of edge and cloud solutions. The tech stack on the edge is capable of functioning independently of, and in concert with, the cloud. This design gives Herrenknecht customers flexibility with the way they use the platform and control their data.
Initial data collection and storage occurs on the TBMs, at the edge, as each machine runs its own instance of InfluxDB. For data ingest, Herrenknecht uses a C#-based client library and stores it in an open source instance of InfluxDB running directly on the TBMs. Because everything starts at the edge, the platform takes advantage of InfluxDB’s data processing and storage capabilities.
Currently, they use InfluxQL for data processing, although they plan to start using Flux soon, and custom software to derive aggregations from their granular data. Herrenknecht’s other services run on top of those aggregations. For customers that do not want to copy their data to the cloud, the edge instances of InfluxDB power dashboards locally.
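As a rough illustration of the kind of downsampling such aggregations perform, the sketch below mirrors in plain Python what an InfluxQL `GROUP BY time()` clause does; the query, window size and data are hypothetical, not Herrenknecht's actual aggregation logic:

```python
from collections import defaultdict

# A hypothetical InfluxQL downsampling query of the kind described above:
#   SELECT MEAN("advance_speed") FROM "tbm" GROUP BY time(1m)

def mean_per_window(points, window_ns):
    """Group (timestamp_ns, value) points into fixed time windows and
    average each window, mirroring a GROUP BY time() aggregation."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window_ns].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

MINUTE_NS = 60_000_000_000  # one minute in nanoseconds
points = [(0, 40.0), (30 * 10**9, 44.0), (70 * 10**9, 50.0)]
print(mean_per_window(points, MINUTE_NS))  # {0: 42.0, 60000000000: 50.0}
```

Services built on aggregations like this read far fewer points than the raw sensor stream contains, which is what makes the local dashboards responsive.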
InfluxDB OSS in the Cloud
Herrenknecht’s cloud solution evolved over the years. Initially, they built a cloud solution using InfluxDB OSS. For this, the Herrenknecht team used Microsoft Azure Cloud for storage and ran one InfluxDB container per TBM inside Kubernetes. The on-premises instance of InfluxDB at the edge would write to a unique instance of InfluxDB OSS in the cloud. The structure of the cloud solution was almost identical to that running on the TBMs, so deploying it required little additional development.
With this setup, the edge-to-cloud data transfer was minimal. All the data processing and cleaning took place at the edge, so only clean data was sent to the cloud. The visualizations in the cloud and on the machine need to be identical, so Herrenknecht developers built a custom synchronization layer on top of a custom REST API so the data in the cloud exactly mirrors the data on the TBM.
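A minimal sketch of that mirror-style synchronization, assuming the edge side tracks the cloud's latest timestamp and pushes only newer points; all names here are hypothetical, and the real implementation is a custom C#/REST service:

```python
def sync_to_cloud(edge_points, cloud_points):
    """Push edge points newer than the cloud's high-water mark.

    edge_points / cloud_points: lists of (timestamp_ns, value) tuples,
    sorted by timestamp. Returns the points transferred; repeated calls
    are idempotent, so the cloud converges to a mirror of the edge.
    """
    high_water = cloud_points[-1][0] if cloud_points else -1
    delta = [p for p in edge_points if p[0] > high_water]
    cloud_points.extend(delta)
    return delta

edge = [(1, 10.0), (2, 10.5), (3, 11.0)]
cloud = [(1, 10.0)]
sent = sync_to_cloud(edge, cloud)
print(sent)           # [(2, 10.5), (3, 11.0)]
print(cloud == edge)  # True
```

Because the filter is based on a timestamp high-water mark, a machine that has been offline for weeks simply ships its backlog the next time it connects, without re-sending data the cloud already holds.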
The cloud system worked well until Herrenknecht reached approximately 100 instances of InfluxDB. At that point, it became unreliable because InfluxDB response times increased dramatically. A typical response time of 100 or 200 milliseconds for a simple query would suddenly increase to one or two minutes. This was a purely random phenomenon and there was no way to predict which queries would slow or when it would happen.
Investigating the issue further revealed that the problem was not with InfluxDB but with Azure at the storage layer. They encountered a bug where the file storage sometimes failed to unlock files; when InfluxDB tried to access a still-locked file, the access failed with an error. The team would then have to manually unlock the files.
Even though this was a rare bug, because they were dealing with hundreds of containers, the errors happened frequently. These storage reliability issues, coupled with the rising costs of maintaining containers and Kubernetes nodes, drove Herrenknecht to explore more reliable solutions.
Migrating to Enterprise
To improve system reliability, Herrenknecht migrated from running open source in the cloud to InfluxDB Enterprise. Because they already had a large store of historical data, the team needed to migrate that data to the Enterprise version with minimal downtime, at most a few minutes per machine.
In January 2020, they completed the migration process over the course of about two weeks. They wrote some simple shell scripts, which robustly migrated the data from the InfluxDB containers in Kubernetes into a new InfluxDB Enterprise cluster. During the migration, no “read” downtime occurred, and “write” downtime totaled less than 60 minutes.
“We have the open source version of InfluxDB on every tunnel boring machine that we deliver, and we have InfluxDB Enterprise in the cloud, and it just works in the background. We don’t really have to take a lot of care with it. It’s just there, and it’s reliable, so we don’t have a lot of DevOps efforts … and our small team can concentrate on feature development.”
— Tobias Braun, software architect, Herrenknecht AG
Instead of having multiple Kubernetes nodes for all the InfluxDB OSS containers, they switched to a smaller, more cost-effective cluster of InfluxDB Enterprise. Following the migration process, the total cost of ownership of Herrenknecht’s InfluxDB system fell by one-third, driven by the reduction of virtual machines.
This transition also delivered the reliability and stability that Herrenknecht needed to power its IIoT platform. The switch eliminated the issues the company encountered with slow queries and increased response times.
InfluxDB allowed Herrenknecht to future-proof their IIoT platform as much as possible. It gave Herrenknecht’s platform room to grow and supplied a robust, flexible architecture that lets them exchange components at any time without significant effort or outages.