As the keepers of monitoring and debugging software start using these new kernel calls, some of which have been added to the Linux kernel over the last two years, they will be able to offer much more nuanced, and easier to deploy, system performance tools, noted Brendan Gregg, a Netflix performance systems engineer and author of DTrace Tools, in a presentation at the USENIX LISA 2016 conference, taking place this week in Boston.
With this work, even the much-heralded DTrace, the dynamic tracing tool that is one of the major features of Sun Microsystems’ (and now Oracle’s) Unix variant Solaris, can be fully ported to Linux, something long dreamed of by Linux kernel hackers.
In years past, the original Unix performance system software vendors, often proprietary, offered incomplete system metrics. “Those of us doing performance analysis got very good at reading tea leaves,” Gregg said.
“With dynamic tracing, we get crystal ball observability,” Gregg said. “I can fill in all the gaps and missing pieces.”
Linux already offers a number of tracing tools. Static tracepoints provide a way for kernel developers to add probes into their code to help with debugging. But these can’t be used everywhere. Performance monitoring counters (PMCs) are good for doing low-level traces for CPUs. There have also been a number of dynamic tracing tools — kprobes, ftrace, SystemTap being the most notable — though their potential applications have been limited, and most were designed to operate externally from the kernel, which brings performance and usability penalties.
Dynamic tracing may also prove to be a boon for monitoring visualization and GUI work. Here, Gregg himself has pioneered a new visualization technique for charting latencies, called heat maps, and Netflix plans to use this tracing capability to build a self-service heat map GUI that engineers can use to quickly assess latency issues.
These tools will provide many ways to understand why a thread gets blocked. “We can officially trace the scheduler, whereas before there was a lot of overhead to that,” Gregg said.
Go to the Source: BPF
These new capabilities largely derive from technology built in the early 1990s: the Berkeley Packet Filter (BPF), which was designed to let tcpdump filter large numbers of packets efficiently in the kernel.
It is BPF that “provides that final piece of programmatic traceability,” he said. “The kernel can now do everything we want it to do.”
BPF actually runs in a tiny virtual machine in the kernel and has been refined for more general duties over the past decade. “If you are running Linux you will be getting BPF,” Gregg said. Bits of the functionality have been added in over the 4.x series of Linux kernels, and BPF will be fully present in Linux 4.9.
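To make the idea of an in-kernel filter machine concrete, here is a toy sketch, loosely modeled on the classic BPF design: a small program of load/compare/return instructions is run over each packet, and the return value says how many bytes of the packet to keep. This is illustrative only; the instruction names and semantics here are simplifications, and the real kernel VM adds registers, maps, a verifier, and much more.

```python
# Toy interpreter in the spirit of the classic BPF virtual machine.
# Illustrative sketch only -- not the real in-kernel implementation.

def run_filter(program, packet):
    """Run a tiny filter program over a packet (bytes).

    Simplified instructions (op, arg):
      ("ldh", off)  - load the 16-bit big-endian halfword at offset off
      ("jeq", val)  - skip the next instruction unless acc == val
      ("ret", n)    - accept n bytes of the packet (0 = drop)
    """
    acc = 0
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "ldh":
            acc = int.from_bytes(packet[arg:arg + 2], "big")
        elif op == "jeq":
            if acc != arg:
                pc += 1          # condition false: skip next instruction
        elif op == "ret":
            return arg
    return 0                     # fell off the end: drop

# A tcpdump-style filter: accept a packet only if the Ethernet
# EtherType field (offset 12) says IPv4 (0x0800).
ipv4_only = [
    ("ldh", 12),
    ("jeq", 0x0800),
    ("ret", 65535),   # IPv4: accept up to 65535 bytes
    ("ret", 0),       # anything else: drop
]

ipv4_frame = bytes(12) + b"\x08\x00" + bytes(20)
arp_frame  = bytes(12) + b"\x08\x06" + bytes(20)
print(run_filter(ipv4_only, ipv4_frame))  # accepted
print(run_filter(ipv4_only, arp_frame))   # dropped
```

The point of pushing such a program into the kernel is that uninteresting packets are rejected before they are ever copied out to user space, which is exactly the efficiency argument that made BPF attractive for tracing as well.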
The BPF work, which has been led by Facebook’s Alexei Starovoitov, could also be used for intrusion detection, virtual networking, and programmatic tracing.
There are a number of ways programmers can interface with BPF. One is raw BPF assembly, which is not recommended, given how hard that language is to read. Fortunately, there is also a front-end for the C programming language (C/BPF). Another option is the BPF Compiler Collection (BCC), which provides Python and Lua front-ends.
Given its complexity, BPF wouldn’t be the first tool someone reaches for to analyze performance, but it provides much functionality not found elsewhere, or found only at much higher overhead. Commands include opensnoop (which traces files as they are opened across the system), biolatency (for tracing block device I/O latency), and runqlat (for measuring run queue latency).
Gregg offered an example of tracing TCP retransmits with the tcpretrans command. Before dynamic tracing, packets had to be captured to a file with tcpdump and analyzed after the fact. With tcpretrans, a tracer can be attached directly to the tcp_retransmit_skb kernel function.
“I get the feeling a lot of people will be using BPF from the command line,” Gregg said. Gregg’s presentation slides are here: