One of the primary benefits of building an application to run on Kubernetes is that when a process crashes, Kubernetes automatically spins up a new one in its place, barely skipping a beat. The same resilience means that problems can go unnoticed. To address this, Instana, a provider of application performance management (APM) for microservices, has introduced a new feature that detects and reports abnormal process terminations. The feature surfaces crashes that may be causing issues you were completely unaware of.
“It’s not like a monolith system where if something crashes, you’ve got to reboot some machines. This is about constant improvement of your systems over time as opposed to avoiding a critical breakdown,” said Chris Farrell, technical director at Instana, in an interview with The New Stack. “One of the problems of an abnormally terminating process is it’s masked in the overall operations, but it doesn’t necessarily get masked in your customer experiences. While, yes, the system is designed to bring up a new process when one crashes and to fill in the gap, that will possibly create a spike in latency, maybe a load issue, and other types of problems.”
The new feature works on any Linux machine or container running Linux kernel 4.8 or later. It uses the extended Berkeley Packet Filter (eBPF) to gather additional information that helps determine the root cause of these crashes. Although Instana already hooks into the kernel, Farrell explained, eBPF supplies the extra detail needed to determine whether a process terminated abnormally.
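The 4.8 floor is easy to check before rolling out a sensor. The helper below is a minimal sketch of such a pre-flight check. It is hypothetical and not part of Instana's agent, and the function names are illustrative. Containers share the host's kernel, so running this inside a container reports the host's kernel release. The check looks only at the version number. It does not confirm that eBPF is actually enabled in the kernel's build configuration.

```python
import os
import re

# Minimum kernel version cited for the eBPF crash sensor.
MIN_KERNEL = (4, 8)


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_supported(release: str, minimum: tuple[int, int] = MIN_KERNEL) -> bool:
    """Return True if the release is at or above the minimum (major, minor)."""
    # Tuple comparison is numeric per component, so (4, 19) >= (4, 8) holds.
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    release = os.uname().release  # release string of the running (host) kernel
    print(release, "supported" if kernel_supported(release) else "too old")
```

The version is compared as a tuple of integers rather than as a string. A string comparison would wrongly rank "4.19" below "4.8".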
“We treat eBPF like any of the other tech pieces of the stack, so we actually put a sensor on it. When the crashes occur, we’re actually using eBPF to get information out like termination codes. The idea is to first look at any termination error code to determine if it was an expected termination or an abnormal termination,” said Farrell. “Using the error code, we then look at things like what actually was the cause of the crash? Was there a kill directive given? Or did it just happen? Did it run out of memory? Did it have an invalid opcode?”
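The first-pass triage Farrell describes mirrors how a POSIX parent process sees a child's exit. The child either exits normally with a status code or is terminated by a signal. The sketch below classifies a return code in Python's `subprocess` convention, where a negative value `-N` means the process was killed by signal `N`.

This is not Instana's implementation, which uses an eBPF sensor written in Rust. The function name `classify`, the signal groupings, and the labels are all assumptions made for illustration. An invalid opcode shows up as `SIGILL`. An out-of-memory kill, however, arrives as a plain `SIGKILL`, so it cannot be distinguished from a deliberate kill using the exit status alone. Telling the two apart needs kernel-side context of the kind eBPF can supply.

```python
import signal
import subprocess
import sys

# Assumed grouping: signals that usually reflect a deliberate kill directive.
KILL_SIGNALS = {signal.SIGTERM, signal.SIGKILL, signal.SIGINT}

# Assumed grouping: signals that usually indicate a crash inside the process.
CRASH_SIGNALS = {
    signal.SIGILL: "invalid opcode",
    signal.SIGSEGV: "segmentation fault",
    signal.SIGBUS: "bus error",
    signal.SIGFPE: "arithmetic error",
    signal.SIGABRT: "aborted",
}


def classify(returncode: int) -> str:
    """Classify a subprocess-style return code (negative means killed by signal N)."""
    if returncode == 0:
        return "expected: clean exit"
    if returncode > 0:
        return f"abnormal: exited with error code {returncode}"
    try:
        sig = signal.Signals(-returncode)
    except ValueError:
        # Signal numbers without a named member (e.g. some real-time signals).
        return f"abnormal: terminated by signal {-returncode}"
    if sig in CRASH_SIGNALS:
        return f"abnormal: {CRASH_SIGNALS[sig]} ({sig.name})"
    if sig in KILL_SIGNALS:
        # The OOM killer also sends SIGKILL; the exit status cannot tell them apart.
        return f"kill directive ({sig.name})"
    return f"abnormal: terminated by {sig.name}"


if __name__ == "__main__":
    # Spawn a child Python process that sends itself SIGKILL, then classify it.
    child = subprocess.run(
        [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    )
    print(child.returncode, classify(child.returncode))
```

A production sensor would work from kernel events rather than from a parent's view of `wait()`. The decision tree is still the same, though. It asks whether the exit was clean, whether a signal was sent deliberately, or whether the process faulted on its own.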
According to a blog post, the eBPF sensor reports abnormal process terminations in “near real-time.” It takes “one or two seconds between the crash of an important process, and the accurate diagnostics of its demise to be visible in Instana.” The sensor is the first part of Instana written in Rust, which the company notes “truly lives up to the awesome reputation it is rightfully accruing.” It works out of the box on any system where eBPF is present. Farrell emphasizes, however, that eBPF is just one part of a larger picture.
“It’s not just the fact that it’s data from eBPF, but rather we take the data from eBPF, the data we have from Linux, and the data we have from the other sensors, which could include Kubernetes, Docker, Rancher or whatever the other pieces you have that are part of the infrastructure stack, and then the stack itself. If it’s an app server, we’re going to be in at the app server level,” said Farrell. “We have this nuance of tying all those pieces of information together to start to understand what actually caused this crash and where do you need to go to start looking to solve it.”
Once an abnormal process termination is identified, you can be alerted and the issue can be handled like any other, said Farrell. Any remediation is carried out by whatever system you have connected Instana to for that purpose. On the user end, nothing is needed to get the new feature running other than a compatible version of Linux.
KubeCon + CloudNativeCon is a sponsor of The New Stack.
Feature image via Pixabay.