Sternum Adds Observability to the Internet of Things
Once an IoT device is shipped, its manufacturer typically has very little information about what is going on with it in the field. This creates a black hole. A huge one, if you consider that there are more than 42 billion IoT devices, sensors and actuators installed and deployed across the globe.
Not a One-Size-Fits-All
Observability tools on the market today offer statistical fleet-level information that is of little use to a manufacturer trying to resolve in-field issues or reduce cyber risks for a specific device. On the device level, you can usually see where a device is (geo-IP), what it is connecting to (ports), and some data about firmware and resource utilization, but little else.
What’s missing is in-depth information about usage. For instance, the trend of user interactions with the device, loop times of critical functionalities, drastic changes in function (e.g., rising temperature) that would hint at a developing issue, and more.
Also, while IoT devices are often lumped together, they are far from being uniform in design or function. For instance, take a moment to consider just how different a heart monitor is from an HVAC unit, a PLC, or any other device.
Different needs, different operational routines, different resource requirements… different everything. And understanding these nuances is crucial, not only from a business and innovation point of view but also because a malfunction in a single device could bring operations to a halt, lead to costly recalls or even threaten a life.
And so, for an observability solution to be universally effective, it needs to be flexible enough to support all of the different device-specific metrics and have the smarts to understand the day-to-day “lives” of the devices with respect to those metrics.
Taking the Temperature with Anomaly Detection
Sternum is a security and observability platform purpose-built for IoT solutions. On the security side, we offer agentless runtime protection. In short, it is a RASP (runtime application self-protection) built specifically for embedded systems.
This patented technology, which we named EIV (Embedded Integrity Verification), blocks over 96% of threats, with less than 3% overhead, and — most important for this conversation — is universally compatible with any RTOS or Linux device.
The same design principle of universality is also at the core of our observability solution. Here the integration relies on a lightweight observability SDK that acts as an abstraction layer, which allows it to collect logs, metrics, events and traces from any device.
By using the SDK, users can granularly define their own traces and collect any type of data, as you can see below:
For example, with our SDK, an infusion pump manufacturer could choose to monitor the temperature of the device to ensure it never reaches dangerous levels, while an industrial controller could track pressure readings to ensure hydraulics are maintained.
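To make the idea of user-defined traces concrete, here is a minimal sketch in Python. It is not Sternum's SDK: the `TraceEvent` and `TraceBuffer` names, the JSON payload shape, and the device and trace names are all hypothetical, chosen only to illustrate a device recording its own metrics under names it picks.

```python
import json
import time
from dataclasses import dataclass, field, asdict
from typing import Any, List


@dataclass
class TraceEvent:
    """One user-defined trace record (hypothetical schema, for illustration)."""
    device_id: str
    name: str    # trace name chosen by the manufacturer
    value: Any   # any JSON-serializable reading
    timestamp: float = field(default_factory=time.time)


class TraceBuffer:
    """Collects trace events in memory until they are flushed upstream."""

    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self._events: List[TraceEvent] = []

    def trace(self, name: str, value: Any) -> TraceEvent:
        """Record one reading under a user-chosen trace name."""
        event = TraceEvent(self.device_id, name, value)
        self._events.append(event)
        return event

    def flush(self) -> str:
        """Serialize buffered events to a JSON string and clear the buffer."""
        payload = json.dumps([asdict(e) for e in self._events])
        self._events.clear()
        return payload


# Two very different devices, each tracing the metric that matters to it.
pump = TraceBuffer("infusion-pump-01")
pump.trace("temperature_c", 36.9)

controller = TraceBuffer("plc-07")
controller.trace("hydraulic_pressure_kpa", 1480)
```

The point of the sketch is that the trace name and value are free-form, so the same collection path can serve an infusion pump and an industrial controller.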
Having this level of customization is great for flexibility. However, creating alerts for these traces still requires operators to manually set thresholds. This is complicated, time-consuming and rarely precise, since it’s hard to plan in advance for every possible scenario. Moreover, such thresholds are never helpful for more complex multivariable issues.
This is why we invested a lot of resources into developing an AI-based learning engine that uses data from user-defined traces to create a profile of desired device behavior and to highlight important and unusual patterns.
The system starts collecting data as soon as a device is connected and, after a short learning period, it starts acting as an “extra set of eyes,” providing alerts about abnormal activities that would take human operators hours, or maybe days, to uncover — if at all.
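The "learn first, then alert" flow can be illustrated with a deliberately simple statistical stand-in. This is not the AI engine described here: the `BaselineDetector` class, the 20-sample learning period and the z-score limit of 3 are all assumptions made for the sketch. It only shows how a baseline learned from a device's own data can replace a hand-set threshold.

```python
import statistics


class BaselineDetector:
    """Learns a mean/stdev baseline, then flags values far from it."""

    def __init__(self, learning_samples: int = 20, z_limit: float = 3.0) -> None:
        self.learning_samples = learning_samples  # assumed learning period
        self.z_limit = z_limit                    # assumed alert sensitivity
        self._history = []
        self._mean = None
        self._stdev = None

    def observe(self, value: float) -> bool:
        """Return True if the value is anomalous; always False while learning."""
        if self._mean is None:
            # Learning period: collect samples, raise no alerts.
            self._history.append(value)
            if len(self._history) >= self.learning_samples:
                self._mean = statistics.fmean(self._history)
                self._stdev = statistics.stdev(self._history)
            return False
        if self._stdev == 0:
            # Perfectly constant baseline: any change is unusual.
            return value != self._mean
        z_score = abs(value - self._mean) / self._stdev
        return z_score > self.z_limit


detector = BaselineDetector()
# Learning period: readings that hover between 36.5 and 36.9.
for i in range(20):
    detector.observe(36.5 + 0.1 * (i % 5))

normal_alert = detector.observe(36.8)  # within the learned range
spike_alert = detector.observe(39.0)   # far outside the learned range
```

A fixed limit of, say, 40 degrees would stay silent on the 39.0 reading, while the learned baseline treats it as a sharp departure from this particular device's normal behavior. A single-variable z-score still cannot capture multivariable issues, which is where a richer learned profile is needed.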
Below you can see how this looks in our dashboard. The anomaly alerts (in yellow) are added to the event timeline to provide additional information and context for other fleet events.
As to what could trigger the alert, the engine detects:
- Communication pattern violations
- Abnormal presence of an event or several events together
- Absence of an event (e.g., an unfulfilled update request)
- Abnormal number of events (e.g., an abnormal number of update requests)
- Unusual value of a variable (e.g., an unfamiliar entity connecting to the internal IPC)
- Atypical combination of values of several variables
- Sequence violation (e.g., command execution without authentication)
Each of these can become an anomaly event and, for each, we also provide a drill-down investigation view.
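Three of the categories above (absence of an event, abnormal number of events, and sequence violation) can be sketched as plain checks over a window of event names. In this illustration the expectations are written by hand and every event name is hypothetical; in the engine described here, the expected behavior is learned from the device's traces rather than configured.

```python
from collections import Counter


def missing_events(events, expected):
    """Absence of an event: expected names that never appear in the window."""
    return sorted(set(expected) - set(events))


def abnormal_counts(events, max_per_window):
    """Abnormal number of events: names seen more often than their limit."""
    counts = Counter(events)
    return {
        name: seen
        for name, seen in counts.items()
        if seen > max_per_window.get(name, float("inf"))
    }


def sequence_violations(events, required_before):
    """Sequence violation: an event that occurs before its prerequisite."""
    seen = set()
    violations = []
    for index, name in enumerate(events):
        prerequisite = required_before.get(name)
        if prerequisite is not None and prerequisite not in seen:
            violations.append((index, name))
        seen.add(name)
    return violations


# One observation window of event names from a single device.
window = [
    "boot",
    "update_request",
    "update_request",
    "update_request",
    "command_exec",
    "auth_ok",
]

unfulfilled = missing_events(window, expected=["update_complete"])
too_many = abnormal_counts(window, max_per_window={"update_request": 2})
out_of_order = sequence_violations(window, required_before={"command_exec": "auth_ok"})
```

Here the window contains an update request that was never fulfilled, more update requests than the assumed limit of two, and a command executed before authentication.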
For instance, below you can see one of our investigations of a failed update event that was auto-detected by the AI engine. This is an actual event, spotted in the first few days of the feature rollout. Had it been missed, it could have left the device exposed to security and operational issues.
And here is another real-life example that shows an automatic alert about a communication issue — a critical error in the making that would otherwise go unnoticed.
Learn to Understand
For the IoT market to continue to flourish, it must undergo standardization, and for that we need universal solutions that will work for every device type. And by work, I don’t mean only “integrate,” but also — and much more important — “understand.”
The ability to know how every device is supposed to operate is key to creating a solution that every IoT manufacturer could use.
The sheer scale of the problem calls for an automated solution. Although not without its challenges, I believe that anomaly detection is how we close the gap and help manufacturers understand how their devices work (and when they don’t), giving them not only granular control but also the usage insights and other information they need to create product differentiation, boost business and drive innovation.