
Pixie Brings In-Cluster Kubernetes Debugging to CNCF

6 May 2021 9:32am

In December, observability services provider New Relic acquired Pixie Labs, which offered a Kubernetes in-cluster debugging platform. Now the company has open sourced Pixie under an Apache 2.0 license and has started the process of donating the technology to the Cloud Native Computing Foundation. New Relic is also signing on to the CNCF as a member.

The company’s ambition for the project is for it to become “the default observability platform, similar to what Prometheus is for metrics and Kubernetes is for container orchestration,” Ishan Mukherjee, Pixie co-founder and chief product officer, told The New Stack.

In a keynote talk with Priyanka Sharma, general manager of the CNCF, during this week’s KubeCon + CloudNativeCon EU, Pixie co-founder and CEO Zain Asgar noted how the project will extend observability, helping to process massive amounts of data with machine learning and in other ways “that hasn’t really happened yet.”

The project has a lot of momentum.

Pixie’s board includes Kelsey Hightower, principal developer advocate at Google Cloud; Pixie’s Mukherjee; and Jaana Dogan, a principal engineer for Amazon Web Services.

Pixie began as a project in 2018 after Mukherjee and Asgar left their roles at Apple and Google. “The core energy for New Relic behind the project has been this idea of empowering data nerds — that is how it really came about,” Mukherjee said.

A major goal of Pixie’s development team has been to “provide instant, baseline visibility into any application running on Kubernetes,” Asgar said.

“We wanted our users to get an accurate picture of their system in seconds, not months. By running a single CLI command, Pixie will automatically start collecting things like full-body requests, metrics, network data,” Asgar said.

With Pixie, users can view their entire cluster, notice HTTP latency in a particular pod “and drill down into the stack trace for that pod to see what’s causing the delay. All of this is collected and made accessible to the user without any code changes,” Asgar said.

The ability to skip code changes comes from auto-instrumentation powered by eBPF, which lets sandboxed programs run inside the Linux kernel without changing kernel source code or loading kernel modules.

CNCF Boost

The CNCF is expected to lend more visibility to the platform as an open source project, while offering cloud native project support, which is critical for an observability platform.

In addition to the well-established benefits that open sourcing projects offer, Asgar described how the decision to join the CNCF represented a “step further.”

“Developers and organizations today are savvy enough to understand that not all open source is created equal. The most impactful open source tools maintain the integrity and spirit of open source software, even when they are maintained by a company,” Asgar told The New Stack. “We didn’t want to be a ‘cosmetically’ open source software tool deeply tied to a paid product, so we started the process to contribute Pixie to CNCF.”

New Relic is also “going all-in on” the CNCF OpenTelemetry observability framework, Asgar said, adding that Pixie would be OpenTelemetry compliant.

New Relic “is standardizing around OpenTelemetry in order to make New Relic products more interoperable with the other tools that our users love,” Asgar said. To this end, New Relic has also open sourced the agents, integrations, SDKs, CLIs and custom visualizations on its New Relic One platform, “making it easier for engineers to access and build custom instrumentation,” Asgar said.

Pixie’s main capabilities include:

  • Auto-Instrumentation: Pixie will automatically begin collecting requests, metrics and network data. eBPF powers the auto-instrumentation, as mentioned above, which Asgar noted was “popularized by the pioneering work of Brendan Gregg, a kernel and performance engineer at Netflix.”
  • Fully Scriptable Control: Inspired by the team’s experience as developers, Pixie aims to make observability and troubleshooting much more code-based. “This has been a big win for those who want to automate more of their workflows. Our query language and API makes it easy to run analytics on Pixie data and export the results,” Asgar said. “We’ve seen really clever Slackbots running queries on Pixie, and we’re excited to see what else users come up with.”
  • In-Cluster Edge Compute: “The purpose of this platform is to identify and troubleshoot problems in production applications,” Asgar said. “This sounds great on paper, and a lot of other projects share this goal, but there are a few barriers here historically.”

Asgar described, for example, the difficulties associated with collecting the right data, which he said is “highly non-trivial.” While system metrics are fairly accessible, accessing pertinent data from the application layer, such as database calls or HTTP requests, has typically required code-level changes to instrument the system. “This can be a major pain when you are trying to troubleshoot a problem in production, only to discover there was a gap in your instrumentation. Data analysis is also a challenge,” Asgar said.
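
To make the idea of code-based analysis of request data concrete, here is a minimal sketch in plain Python. It does not use Pixie's query language or API. The record fields (`pod`, `latency_ms`), the sample values and the function names are all hypothetical, chosen only to illustrate grouping HTTP request latencies by pod and picking out the slowest one.

```python
from collections import defaultdict

# Hypothetical sample of already-collected HTTP request records.
# The field names and values are illustrative assumptions, not Pixie's schema.
SAMPLE_REQUESTS = [
    {"pod": "default/cart", "latency_ms": 12.0},
    {"pod": "default/cart", "latency_ms": 18.0},
    {"pod": "default/checkout", "latency_ms": 250.0},
    {"pod": "default/checkout", "latency_ms": 350.0},
    {"pod": "default/frontend", "latency_ms": 30.0},
]


def latency_by_pod(records):
    """Group request latencies by pod and summarize each group."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["pod"]].append(record["latency_ms"])

    summary = {}
    for pod, values in grouped.items():
        summary[pod] = {
            "count": len(values),
            "mean_ms": sum(values) / len(values),
            "max_ms": max(values),
        }
    return summary


def slowest_pod(summary):
    """Return the name of the pod with the highest mean latency."""
    return max(summary, key=lambda pod: summary[pod]["mean_ms"])


if __name__ == "__main__":
    stats = latency_by_pod(SAMPLE_REQUESTS)
    for pod, row in sorted(stats.items()):
        print(pod, row)
    print("slowest:", slowest_pod(stats))
```

On the sample data, `slowest_pod(latency_by_pod(SAMPLE_REQUESTS))` picks out the checkout pod, the kind of starting point a user might then drill into further.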

As a solution, one of the Pixie developers’ main goals has been “to provide instant, baseline visibility into any application running on Kubernetes,” Asgar said.
