How GE Digital Tackled the Stateful / Stateless Problem for Industrial IoT
For stateless architectures to work efficiently, as the IETF has recently declared in its study of RESTful architecture for the Internet of Things, the messages shared between components must be complete and self-contained. Anything a receiving function needs to know about the work it has to do must be included within the API call that contacts it.
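A minimal sketch of what "self-contained" means in practice: the handler below computes its answer using only the fields carried in the message, with no session store or server-side memory. The function name `handle_reading` and every field name are hypothetical, chosen for illustration; they do not come from the IETF study or from Predix.

```python
import json


def handle_reading(message):
    """Process one request using only what the message itself carries."""
    request = json.loads(message)
    # Everything the handler needs arrives in the payload, so any
    # instance of this function can serve any request.
    readings = request["readings"]
    threshold = request["alert_threshold"]
    average = sum(readings) / len(readings)
    return {
        "sensor_id": request["sensor_id"],
        "average": average,
        "alert": average > threshold,
    }


# Hypothetical message: identity, data, and the rule to apply all travel together.
message = json.dumps({
    "sensor_id": "turbine-7",
    "unit": "celsius",
    "readings": [70.0, 72.0, 74.0],
    "alert_threshold": 71.0,
})
result = handle_reading(message)
```

Because the threshold travels with the readings, the same message produces the same result no matter which instance receives it, which is the property that lets stateless components scale out freely.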
As engineers (including those at the IETF) have come to realize, persistent data will always be a necessary resource for both server-side applications and client-side apps. And because Internet of Things (IoT) applications must maintain a reliable, persistent network of connected sensors and devices, a stable and self-contained platform for network state data is vital.
So if the first edition of your industrial IoT platform was built on Cloud Foundry, which prides itself on its stateless architecture, how do you reconcile that original vision with, if you will, the real state of things?
The Machine Data Machine
“We are in the business of dealing with information and we are building the business of dealing with data, whether we like it or not,” said Balajee Nagarajan, the director of software engineering for the Predix industrial IoT platform service at GE Digital, speaking at the recent MesosCon conference in Los Angeles.
Nagarajan leads the team developing Predix as a PaaS for industrial IoT, specifically for managing and utilizing the huge volumes of data generated by machines and connected devices. By deploying custom applications on the Predix platform, industrial customers can manage, monitor, and perform preventive maintenance on their own assets, and in so doing reduce unplanned downtime.
“Given the fact that GE has such a huge footprint in many business verticals,” Nagarajan continued, “it was such a natural, organic growth for us to become the leader in building out this industrial IoT platform, especially because of our inherent domain experience.”
In August 2015, General Electric announced its intention to enter the cloud services market. Its original vision involved a product called Predix Cloud, a data gathering and processing platform for its own industrial operations as well as for industrial customers, especially in the fields of healthcare, petroleum, and power generation. Unlike typical public cloud services, Predix Cloud would ingest, interpret, analyze, and report on machine data in real time.
Last November, GE expanded this product into the “Predix System” — a network of industrially-targeted, distributed services, complete with applications geared more directly towards its key customer verticals. But even though Predix had thoroughly penetrated the IoT service space, its parent company refrained from using the phrase “Internet of Things” in Predix promotions.
Now, whatever doubt there may have been among managers in the parent company has vanished, as the engineers plot a way forward.
“We are trying to become a cloud-agnostic IoT platform,” declared Nagarajan. He described Predix’ current architecture as relying on HashiCorp Terraform to configure and provision infrastructure on Amazon Web Services (AWS) and Microsoft Azure, with support for other public cloud providers planned. From there, Bosh is used to provision Cloud Foundry, which serves as the sole developer-facing component. Customers deploy applications through Cloud Foundry using its familiar model.
“Cloud Foundry has this robust marketplace,” he said, “where you can have asset-building models, analytics-building models. Then Cloud Foundry also provides a very secure interface to Predix itself.”
The way the Predix platform works now, some services are deployed at “the edge,” either on customer premises or in locations where customers’ sensors are installed. These services act as local aggregation points, pre-processing sensor data to the extent that they can before it is ingested by the central cloud. In locations where GE’s own light sensors are installed, Predix may use GE’s data connection tools to perform analytics functions closer to the sensors themselves.
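To illustrate the kind of pre-processing an edge aggregation point might perform, here is a small sketch that reduces raw sensor samples to per-window summaries before anything is sent upstream. The function names, the window size, and the choice of summary statistics are assumptions made for illustration; the article does not describe how Predix edge services actually aggregate data.

```python
from statistics import mean


def summarize_window(readings):
    """Reduce one window of raw samples to a compact summary record."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
    }


def aggregate_at_edge(samples, window_size):
    """Split the sample stream into fixed-size windows and summarize each."""
    summaries = []
    for start in range(0, len(samples), window_size):
        window = samples[start:start + window_size]
        summaries.append(summarize_window(window))
    return summaries


# Six hypothetical raw samples become two summary records for the cloud.
raw_samples = [20.0, 22.0, 24.0, 30.0, 32.0, 34.0]
summaries = aggregate_at_edge(raw_samples, window_size=3)
```

The trade-off in this sketch is the usual one for edge processing: less data crosses the network, at the cost of the cloud never seeing the individual samples.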
Elsewhere, however, customers deploy their own custom-developed tools to Predix’ Cloud Foundry base platform, for conducting their own predictive analytics on incoming data. “Cloud Foundry abstracts the developer experience in a way that’s actually sane for them,” explained Venkatesh Sivasubramanian, Predix’ data platform lead. “They don’t have to manage individual machines, infrastructure, and whatnot. It’s actually great for stateless applications. It gives you a nice marketplace you can interact with to create the service instances that you need and bind them to their applications.”
Schrödinger’s Storage Interface
As the colossal quantity of data being ingested became an issue, the Predix team began investigating new options for extending its data storage capabilities. At first, they investigated the Container Storage Interface (CSI) initiative under construction at Cloud Foundry. But as Sivasubramanian told The New Stack’s Alex Williams for an upcoming podcast, CSI had yet to demonstrate signs of developmental maturity — of settling down into a prescribed form.
“We do need to have a single infrastructure that allows us to be able to run both stateless and stateful,” he remarked.
Here is where persistent data layer provider Portworx entered the picture. With Predix using Mesos as its scheduler, whatever data layer the team chose needed to accommodate both stateful and stateless applications, just as Mesos does. Predix had been instantiating its Cloud Foundry-facing services using Bosh, along with Terraform to provision infrastructure and Chef to stage the process. But since Cloud Foundry doesn’t natively support persistent data clusters, they had to be instantiated outside of Cloud Foundry and then attached using its Service Broker API.
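The provision-then-bind flow through a service broker can be sketched as follows. The endpoint paths, the `X-Broker-API-Version` header, and the `service_id`, `plan_id`, and `bind_resource` body fields follow the Open Service Broker API specification. The broker URL, the identifiers, the GUIDs, and the `volume_size_gb` parameter are hypothetical placeholders, not values used by Predix or Portworx. The code only builds the requests; it does not send them.

```python
import json
from urllib.request import Request

BROKER_URL = "https://broker.example.com"  # hypothetical broker address
API_VERSION = "2.13"  # assumed broker API version

HEADERS = {
    "X-Broker-API-Version": API_VERSION,
    "Content-Type": "application/json",
}


def build_provision_request(instance_id, service_id, plan_id, parameters):
    """Build the PUT request that asks a broker to create a service instance."""
    body = {
        "service_id": service_id,
        "plan_id": plan_id,
        "organization_guid": "org-guid-placeholder",  # hypothetical
        "space_guid": "space-guid-placeholder",  # hypothetical
        "parameters": parameters,
    }
    return Request(
        url=f"{BROKER_URL}/v2/service_instances/{instance_id}",
        data=json.dumps(body).encode("utf-8"),
        headers=HEADERS,
        method="PUT",
    )


def build_bind_request(instance_id, binding_id, service_id, plan_id, app_guid):
    """Build the PUT request that attaches an existing instance to an app."""
    body = {
        "service_id": service_id,
        "plan_id": plan_id,
        "bind_resource": {"app_guid": app_guid},
    }
    return Request(
        url=(f"{BROKER_URL}/v2/service_instances/{instance_id}"
             f"/service_bindings/{binding_id}"),
        data=json.dumps(body).encode("utf-8"),
        headers=HEADERS,
        method="PUT",
    )


# Hypothetical usage: provision a Cassandra-backed instance, then bind an app.
provision = build_provision_request(
    "inst-1", "cassandra-service", "small-plan", {"volume_size_gb": 100})
bind = build_bind_request(
    "inst-1", "bind-1", "cassandra-service", "small-plan", "app-guid-1")
```

The two-step shape is the point: the stateful cluster is created outside the application platform, and the bind step is what hands a stateless application the credentials it needs to reach that cluster.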
Portworx enabled a new provisioning model that leveraged Mesos’ two-level scheduler, formally introduced in August 2016. This way, Predix could introduce a dynamic volume provisioning process that maintains this colossal data store separately from Cloud Foundry. Predix can then request those services from Portworx using Open Service Broker (OSB).
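For readers unfamiliar with two-level scheduling, the toy simulation below shows the general idea: a master decides which free resources to offer (level one), and each framework's own scheduler decides which of its tasks to place on those offers (level two). This is a simplified, in-memory sketch and not the Mesos API; the class names, the first-fit placement rule, and the resource figures are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


class FrameworkScheduler:
    """Level two: the framework chooses which offered resources to use."""

    def __init__(self, pending):
        self.pending = list(pending)

    def resource_offers(self, offers):
        launched = []
        for offer in offers:
            cpus, mem_mb = offer.cpus, offer.mem_mb
            # First-fit: place every pending task that still fits this offer.
            for task in list(self.pending):
                if task.cpus <= cpus and task.mem_mb <= mem_mb:
                    launched.append((task.name, offer.agent))
                    cpus -= task.cpus
                    mem_mb -= task.mem_mb
                    self.pending.remove(task)
        return launched


def master_offer_cycle(free_resources, scheduler):
    """Level one: the master turns free agent resources into offers."""
    offers = [Offer(agent, cpus, mem_mb)
              for agent, (cpus, mem_mb) in free_resources.items()]
    return scheduler.resource_offers(offers)


# Hypothetical cluster: one small agent, one large agent.
free_resources = {"agent-1": (2.0, 4096), "agent-2": (4.0, 8192)}
scheduler = FrameworkScheduler([
    Task("cassandra-node", cpus=3.0, mem_mb=6144),
    Task("web-worker", cpus=1.0, mem_mb=1024),
])
placements = master_offer_cycle(free_resources, scheduler)
```

The split matters for stateful workloads because the framework, not the master, makes the placement decision, so a storage-aware framework can apply its own rules about where a data-bearing task should land.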
“At the core [of Portworx] is a distributed-block, software-defined storage solution,” explained Portworx CTO Goutham Rou, also speaking with Alex Williams. Over the past decade-and-a-half, Rou remarked, adding 10 to 20 terabytes of storage to a network has evolved from architecting a SAN or a NAS over months or even years, to buying a server with 20TB already installed.
“In parallel, modern applications like Cassandra, MongoDB, or HDFS inherently like to scale out, and handle the scale-out and aggregation themselves,” Rou continued. “If you take a look at Cassandra, you would want to give it the illusion of direct-attached storage, or this notion of hyperconvergence. Our software architecture fundamentally enables people to run their applications hyperconverged — though it’s more conducive to applications like Cassandra, because you get that low-latency access. Your applications are running closer to where their storage lives.”
What’s more, the CTO said, Portworx is capable of working directly with Mesos, or whatever the scheduler happens to be, to ensure that data-driven applications remain orchestrated sensibly. Predix’ stack does include Cassandra; in fact, the Predix team created an OSB package for Mesosphere DC/OS, for provisioning a Service Broker-compatible version of Cassandra.
“Today, lots and lots of companies are running data-rich applications on DC/OS,” stated Mesosphere Chief Technology Officer Tobias Knaup during a MesosCon keynote session. Many of those applications, Knaup said, especially in the IoT space, are running on what the company calls the “SMACK Stack”: Spark, Mesos, Akka, Cassandra, and Kafka. “We pulled some numbers: More than 50 percent of all DC/OS clusters are running some of these frameworks. So it truly is a platform for running data-driven applications.”
Predix’ stack also includes Redis, Elasticsearch, an open source anomaly detection package, and RabbitMQ as a message broker separate from OSB.
“The biggest challenge that we as platform operators and platform architects [face],” Nagarajan told The New Stack, “is, when we provide a platform for our end users, we want to make sure that platform is highly available. The way Mesos gives you a unified fabric of compute, Portworx provides a unified fabric of storage. We can then have end users not worry about how to consume an HA application.”
If it’s not exactly a “marriage” of statefulness and statelessness, it is, for now, a workable cohabitation. Before too long, the company that first turned on the light bulb may recognize the full scope of the idea it has set in motion.
Feature image: Pictured above, from left to right: The New Stack’s Alex Williams; Venkatesh Sivasubramanian, Data Platform Lead, GE Predix; Balajee Nagarajan, Director of Software Engineering, GE Predix; Goutham Rou, CTO and co-founder, Portworx