A Way to Replicate Data at the Edge
Edge data is exploding, and one of the challenges it poses is that organizations struggle to pool that data where and when their developers, data scientists and other DevOps team members need it. As a potential solution, InfluxData’s Edge Data Replication enables developers to collect, store and analyze high-precision time series data in InfluxDB at the edge while replicating all or subsets of this data into InfluxDB Cloud, the company says.
InfluxData is touting Edge Data Replication as a major release that complements its bread-and-butter offerings as a time series data platform and monitoring provider. Described by the company as a “critical first step in InfluxData’s journey,” Edge Data Replication was designed to help solve time series data integration and orchestration challenges for distributed industrial and Internet of Things (IoT) applications in energy, manufacturing, aerospace and other data-intensive industries.
“Edge Data Replication is our first offering built specifically to orient our offerings into a vertically integrated stack, a solution where it’s not edge or datacenter or cloud, but one where you capture, store and analyze your time series data where it makes the most sense for you and your use cases,” Brian Gilmore, InfluxData’s director of IoT and Emerging Technologies, told The New Stack. “Edge Data Replication is our solution for the current InfluxDB 2 generation, but as we bring IOx to the market, first in the cloud, and then at the edge, customers will be able to leverage even more conveniences and capabilities for truly hybrid and distributed time series applications.”
The Hard Way
Without Edge Data Replication, DevOps teams must invest in queues, brokers, observability-pipeline systems and other tools, or build them themselves, to orchestrate their time series data pipelines, Gilmore said. Edge Data Replication instead consolidates both the mechanics and the observability of notoriously complex edge-cloud data-management and orchestration processes back into the database “where it probably always belonged,” he said. “What took potentially thousands of lines of code and configurations before can now be handled with a few simple InfluxDB API calls.”
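To give a sense of what “a few simple InfluxDB API calls” might look like, the sketch below builds, but does not send, two HTTP requests: one registering a remote InfluxDB Cloud instance and one creating a replication stream from a local bucket. The endpoint paths (`/api/v2/remotes`, `/api/v2/replications`) and field names reflect this writer’s reading of the InfluxDB 2.x OSS API and should be treated as assumptions to verify against the API reference for your version. Every URL, token and ID is a hypothetical placeholder.

```python
import json
import urllib.request


def build_request(base_url, path, token, body):
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical placeholders: substitute real values for your deployment.
EDGE_URL = "http://localhost:8086"
EDGE_TOKEN = "EDGE_API_TOKEN"

# Step 1 (assumed endpoint): tell the edge instance about the cloud instance.
remote_req = build_request(EDGE_URL, "/api/v2/remotes", EDGE_TOKEN, {
    "name": "cloud-remote",
    "orgID": "EDGE_ORG_ID",
    "remoteURL": "https://cloud.example.invalid",
    "remoteAPIToken": "CLOUD_API_TOKEN",
    "remoteOrgID": "CLOUD_ORG_ID",
})

# Step 2 (assumed endpoint): replicate one local bucket to one remote bucket.
replication_req = build_request(EDGE_URL, "/api/v2/replications", EDGE_TOKEN, {
    "name": "edge-to-cloud",
    "orgID": "EDGE_ORG_ID",
    "remoteID": "REMOTE_ID_FROM_STEP_1",
    "localBucketID": "LOCAL_BUCKET_ID",
    "remoteBucketID": "CLOUD_BUCKET_ID",
})

for req in (remote_req, replication_req):
    print(req.get_method(), req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it. The sketch stops short of that so it can be read and run without a live InfluxDB instance.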
The ability to pull data through low-latency connections from distributed and disparate sources in the cloud and at the edge into a single data layer accessible with a single API resembles, in this writer’s mind, a Kafka-based data-streaming platform. However, Edge Data Replication is not that, Gilmore said, because it relies on a disk-backed queue (and not Kafka) to maintain the replication streams. “The main difference between a disk-backed queue and other solutions using external streaming pipelines like Kafka is that it is fully integrated with InfluxData’s Flux processing (InfluxDB’s data scripting language) and bucket-storage model,” Gilmore said.
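For readers unfamiliar with the term, a disk-backed queue can be sketched in a few dozen lines. The Python below is a generic illustration, not InfluxDB’s implementation: the class name, file layout and `drain` helper are all hypothetical. It shows the core idea of persisting each record before acknowledging it and advancing a read offset only after the remote side confirms receipt, so queued data survives restarts and network outages.

```python
import json
import os


class DiskBackedQueue:
    """Append-only, file-backed FIFO queue (illustrative only)."""

    def __init__(self, directory):
        os.makedirs(directory, exist_ok=True)
        self.log_path = os.path.join(directory, "queue.log")
        self.offset_path = os.path.join(directory, "queue.offset")
        # Create the log up front so reads never fail on a fresh queue.
        open(self.log_path, "ab").close()

    def _read_offset(self):
        try:
            with open(self.offset_path) as f:
                return int(f.read().strip() or 0)
        except FileNotFoundError:
            return 0

    def enqueue(self, record):
        # Persist the record to disk before treating the write as accepted.
        with open(self.log_path, "ab") as f:
            f.write((json.dumps(record) + "\n").encode("utf-8"))
            f.flush()
            os.fsync(f.fileno())

    def peek(self):
        """Return (record, next_offset) for the oldest pending record, or None."""
        with open(self.log_path, "rb") as f:
            f.seek(self._read_offset())
            line = f.readline()
            if not line:
                return None
            return json.loads(line), f.tell()

    def ack(self, next_offset):
        # Advance the read position atomically, only after a successful send.
        tmp = self.offset_path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(next_offset))
        os.replace(tmp, self.offset_path)


def drain(queue, send):
    """Forward queued records in order; stop at the first failed send."""
    sent = 0
    while True:
        item = queue.peek()
        if item is None:
            return sent
        record, next_offset = item
        if not send(record):
            return sent
        queue.ack(next_offset)
        sent += 1
```

If `send` fails partway through, for example because the uplink drops, the unsent records stay on disk and a later call to `drain`, even from a new process, resumes from the last acknowledged offset.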
The simplest example would be to use Telegraf (InfluxData’s open source collection agent) to ingest streaming IIoT data into InfluxDB at the edge, run Flux processing on that data in near-real time to process, aggregate, enrich or analyze it, and output the results into a replication bucket, Gilmore said. “The processed data will ‘appear’ in InfluxDB Cloud in as near-real-time a manner as possible,” Gilmore said.
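The aggregation step in that example can be illustrated without any InfluxDB tooling. The Python sketch below stands in for the Flux processing stage: it averages raw sensor readings into fixed time windows, producing the smaller, lower-precision series that would be written to a replication bucket. The function name, sample readings and 60-second window are illustrative assumptions, not part of InfluxData’s product.

```python
from collections import defaultdict
from statistics import mean


def aggregate_windows(points, window_seconds):
    """Group (timestamp, value) points into fixed windows and average each one."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its window.
        window_start = ts - (ts % window_seconds)
        buckets[window_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical raw readings kept at full precision at the edge:
# (seconds since start, temperature).
raw = [(0, 20.0), (15, 22.0), (45, 24.0), (60, 30.0), (90, 32.0)]

# Downsampled series destined for the replicated bucket.
replication_bucket = aggregate_windows(raw, window_seconds=60)
print(replication_bucket)  # [(0, 22.0), (60, 31.0)]
```

Five raw points collapse to two replicated points here. The full-resolution data stays at the edge for real-time use, and only the summary crosses the network.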
“So, at a high level, if you’re using Kafka, the model is ‘build persistence into the streaming,’” Gilmore said. “With Edge Data Replication, you are ‘building streaming into the persistence.’”
Why Not Keep It on the Edge?
However, the question remains why it is optimal to pool time series data from the edge into a cloud environment. That concern misses the point, Gilmore said. Edge Data Replication is “less about shipping to the cloud than it is about intelligently distributing storage and analytics workloads from edge to cloud in the most logical way possible. Sending sensor data straight to the cloud, analyzing it there, and returning insights to the edge is impractical for the real-time demands of IIoT applications,” Gilmore said. “On the other hand, loading up infrastructure at the edge with the horsepower needed to enable cloud-like machine learning, global visibility and other resource-intensive processing is also impractical due to the current constraints of hardware and network availability and reliability at the edge.”
A key takeaway is that Edge Data Replication enables “a best of both worlds solution,” Gilmore said: localized, highly precise data is collected, stored and analyzed at the edge for real-time use cases, and can be strategically mirrored to the cloud for compute and storage or for global visibility, where reduced precision is acceptable and often desirable due to cloud-provider ingress and egress costs.
In a demo, Sam Dillard, senior product manager for IoT/edge at InfluxData, described how Edge Data Replication can serve as a central hub for visibility “into your entire infrastructure, your whole fleet.” A car manufacturer, for example, can make use of this central data hub since it might have facilities around the world, with operators local to each site running equipment and creating data there. “Edge Data Replication provides all of the data in the cloud where you have your data scientists, business analysts, other engineers, and they’re getting a holistic picture of the entire fleet,” Dillard said.
Edge data replication will also become more important in the future since edge environments are expected to continue to see major growth in the near and long term, according to Gartner’s “Market Guide for Hybrid Cloud Storage” by analysts Julia Palmer and Raj Bala. Edge data will also increasingly need to be accessible in a centralized way, usually through a single API, so that DevOps teams can access it on an as-needed basis, which Edge Data Replication was designed to do.
“Enterprises are now employing machine learning models and computing at the edge in order to preprocess data rather than just migrating data to the cloud. Rather than one-way archiving, early adopters are doing multidirectional synchronization of data between edge, core data center and public cloud,” Palmer and Bala wrote. “Two-way synchronization allows enterprises to make use of the elastic nature of the compute infrastructure found among cloud service providers. Furthermore, with the upcoming expansion of edge computing, there will be a growing need for solutions that enable data workflow among edge storage and multiple public cloud providers.”