The Linux Kernel as a Case Study on Rapid Development for Complex Software
Application development is a process, and the bigger and more complex the application, the more complicated the process. In the world of open source, a number of development projects operate at a scale well beyond most: Kubernetes, Nextcloud, OpenShift, SUSE Manager, and (of course) the Linux kernel.
These projects fall well outside the scope of what a handful of developers could code and manage. Take the Linux kernel: since 2005, over 15,000 developers have worked on it. Why so many? The kernel is a complicated ecosystem with far more moving parts than most people realize.
But the development process is not dictated only by the size of the project; it is also shaped by the need for rapid development. The Linux kernel powers a vast number of devices across the globe, and thousands upon thousands of businesses depend on it. So when something goes wrong with the kernel, such as the discovery of a vulnerability, development has only one speed: rapid.
The Challenges Facing Kernel Developers
Developing for the Linux kernel presents a number of serious challenges. Not only are you dealing with developers across the globe (working in different time zones, speaking different languages, and working at different paces), but you can also end up with a sense of isolation.
Consider this: the Linux kernel is one of the largest software projects on the planet, with over 21 million lines of code. That code is not developed under a single banner; instead, it spans numerous areas of specialty.
“The kernel covers a large number of areas, from hardware enablement to scheduling algorithms,” Laura Abbott, principal kernel engineer for Red Hat, said. Each of these areas could easily be considered a project of its own. “When you work in a single area, you get familiar with not just the technical details but the process quirks for that area. If you end up needing to work in a different area of the kernel, it can feel like you are working in an entirely different project.”
That “area isolation,” according to Abbott, makes kernel development both challenging and interesting. “There’s always something else to learn and nobody can know every single aspect of the kernel,” Abbott said.
The Nextcloud development team takes a completely different approach. Jos Poortvliet, co-creator of Nextcloud, asked his team about rapid development. “Half the team told me we don’t do that,” he said. “Just like we don’t do agile, etc. You know, for developers it is black and white. If you do it 90% you don’t do it because it is all or nothing.”
Poortvliet also challenged his team to list out the challenges that complex projects face. He received the following answers:
- Being uncertain about the solution you are pursuing, about your own code. Hard problems make you feel dumb.
- Conflicting interests. The CEO wants X, marketing wants Y, sales wants Z. Engineers just want to clean up and not drown in technical debt.
- Understanding client/customer requirements.
Tools of the Trade
Every developer has their favorite tool for a project. The Linux kernel is no different. In fact, having the right tool for such a complex project can mean the difference between getting work done and falling behind.
For Abbott, the most important tools are the ones that can find issues faster and help make sure what is being developed is correct.
“The kernel has a number of in-tree features to catch problems at runtime. lockdep is a feature that came out of the Real-Time kernel effort a number of years ago to catch locking problems such as deadlocks,” Abbott said. Another such feature is KASAN (Kernel Address Sanitizer), which was developed to find incorrect memory usage at runtime. These features “help expose bugs immediately that might otherwise have taken thousands of hours of testing to expose.”
One tool of the trade that cannot be overlooked is documentation. Without solid, well-written documentation, development on complex projects would frequently stall.
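To make the lockdep idea more concrete, here is a toy Python model of lock-order validation. It is a sketch under heavy simplifying assumptions, not the kernel's implementation: it is single-threaded, tracks individual named locks rather than lock classes as lockdep does, and ignores interrupt contexts entirely. `LockOrderChecker` and its methods are hypothetical names, not a kernel API. What it shows is the property Abbott describes: the checker records the order in which locks are taken and reports an inversion the first time it sees one, even if the run that exposes it never actually deadlocks.

```python
from collections import defaultdict


class LockOrderChecker:
    """Toy lock-order validator in the spirit of lockdep (hypothetical, not kernel code).

    Records "lock B was acquired while lock A was held" as an edge A -> B and
    refuses any acquisition that would create a cycle in that graph.
    """

    def __init__(self) -> None:
        # edges[a] = locks that have been acquired while holding lock a
        self.edges: defaultdict[str, set[str]] = defaultdict(set)
        # locks currently held, in acquisition order (single-threaded model)
        self.held: list[str] = []

    def _reachable(self, start: str, target: str) -> bool:
        """Depth-first search: is there an ordering path start -> ... -> target?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.edges.get(node, ()))
        return False

    def acquire(self, lock: str) -> None:
        if lock in self.held:
            raise RuntimeError(f"recursive locking: {lock} is already held")
        for holder in self.held:
            # Adding holder -> lock closes a cycle if lock already
            # (directly or transitively) comes before holder.
            if self._reachable(lock, holder):
                raise RuntimeError(
                    f"possible deadlock: {holder} -> {lock} inverts an existing order"
                )
        # Only record new edges once the acquisition is known to be safe.
        for holder in self.held:
            self.edges[holder].add(lock)
        self.held.append(lock)

    def release(self, lock: str) -> None:
        self.held.remove(lock)


checker = LockOrderChecker()
# Code path 1 takes A, then B.
checker.acquire("A")
checker.acquire("B")
checker.release("B")
checker.release("A")
# Code path 2 takes B, then A. Nothing deadlocks in this run, but the
# inverted order is reported anyway.
checker.acquire("B")
try:
    checker.acquire("A")
except RuntimeError as err:
    print(err)  # possible deadlock: B -> A inverts an existing order
```

Because the report comes from the recorded ordering rather than from an actual hang, the rare interleaving that would deadlock a real system does not have to happen during testing for the bug to surface.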
“I rely on the documentation contained in the kernel to explain parts of the code I might not be familiar with,” Abbott said. “Having the process documented, and having everyone follow the same process, helps to make things smoother for everyone.”
Patricia Gaughen, engineering director at Canonical, says that for open source projects, “the most important development tool is git, which has become the de facto version control system for open source.” Why does she feel Git has become so important? “The biggest slowdowns in collaborative software development center around working with others, and git makes it much easier to iterate, share, and integrate changes in a collaborative environment.”
The Nextcloud developers listed the tools they depend on for their own complex project. The tools of their trade include:
- GitHub and its workflow, and many bots from there (alternatives such as Bitbucket and GitLab offer similar tools and workflows).
- Automated testing tools, e.g. Jenkins, which integrates with GitHub and its workflow.
- Occasionally meeting in person.
“I think, while it isn’t a tool, we benefit a lot from being so distributed,” Poortvliet added. “We have literally five engineers in offices, the rest work from home, all over Europe and even beyond. This forces discipline in communication. [You] can’t leave people who are off-site out, as everybody is off-site.”
Patch Submission Process
According to Abbott, one outcome of the last Kernel Maintainers Summit was a push to improve the overall kernel workflow. Regarding her own workflow, Abbott says, “I’m significantly less process-oriented in my personal workflow than many developers I know. As a distribution maintainer, I end up touching a large number of areas across the kernel which may not be familiar to me.”
Abbott’s process starts with making sure a patch works for whatever test or use case she is looking at. She turns on the relevant debugging options and runs the relevant unit tests. As for the nuts and bolts of submission, Abbott says, “I still use git-send-email to send out patches and always make sure to give the patch one last manual review before I finally send it out.”
For Canonical, the patch submission process, according to Gaughen, goes something like this: “We export the patches from git, use the get_maintainer.pl script to identify the maintainers and mailing lists to which the patch should be sent, and then use git-send-email to send the patches.”
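The three steps Gaughen describes can be strung together in a short Python sketch. It is an illustration under stated assumptions, not Canonical's actual tooling: it assumes it is run from the top of a kernel git tree, that the series is every commit in `base..HEAD`, and, as a simplification, that every address get_maintainer.pl reports goes on the To line (in practice, mailing lists usually go on Cc). The function names and the `outgoing` directory are hypothetical; the commands being wrapped (`git format-patch -o`, `scripts/get_maintainer.pl --norolestats`, and `git send-email --dry-run`) are real.

```python
import subprocess


def export_patches(base: str, outdir: str) -> list[str]:
    """Export each commit in base..HEAD as a patch file with git format-patch."""
    result = subprocess.run(
        ["git", "format-patch", "-o", outdir, f"{base}..HEAD"],
        check=True, capture_output=True, text=True,
    )
    # format-patch prints the path of every file it writes, one per line.
    return [line for line in result.stdout.splitlines() if line.strip()]


def maintainer_output(patch: str) -> str:
    """Run the kernel's get_maintainer.pl on one patch (one recipient per line)."""
    result = subprocess.run(
        ["./scripts/get_maintainer.pl", "--norolestats", patch],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def parse_recipients(output: str) -> list[str]:
    """Collect non-empty recipient lines, dropping duplicates but keeping order."""
    recipients: list[str] = []
    for line in output.splitlines():
        line = line.strip()
        if line and line not in recipients:
            recipients.append(line)
    return recipients


def send_email_cmd(patches: list[str], recipients: list[str], dry_run: bool = True) -> list[str]:
    """Build the git send-email command; --dry-run stays on unless explicitly disabled."""
    cmd = ["git", "send-email"]
    if dry_run:
        cmd.append("--dry-run")
    cmd += [f"--to={r}" for r in recipients]
    return cmd + patches


def submit_series(base: str, outdir: str = "outgoing", dry_run: bool = True) -> None:
    """Export the series, look up recipients for every patch, and hand off to git send-email."""
    patches = export_patches(base, outdir)
    if not patches:
        return  # nothing in base..HEAD to send
    combined = "\n".join(maintainer_output(p) for p in patches)
    subprocess.run(send_email_cmd(patches, parse_recipients(combined), dry_run), check=True)
```

For example, `submit_series("origin/master")` (the base ref is a placeholder) shows what would be sent without sending anything. Defaulting to `--dry-run` leaves room for the kind of last manual review Abbott describes before calling it again with `dry_run=False`.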
Open Source vs. Closed Source
Which type of development is best suited to the rapid development of complex software? To many developers, the answer seems obvious. In case there was any doubt, Abbott makes it clear: “I think at this point it’s safe to say open source has won.”
Open source development has a number of advantages over comparable closed source projects. For instance, with open source development, many people can collaborate on a single problem.
“No one person has the answer to every problem so by collaborating it’s easier to make progress on hard problems,” Abbott said. “Working in the open also lets others point out mistakes you might not have even thought of.”
From the Canonical point of view, Gaughen says, “There’s not necessarily anything inherent to open source which makes it better suited towards rapid development.” For Gaughen, it’s more about the development processes.
“While, I think, open source has been forced to refine these processes in order to scale to large numbers of developers working on a single project, there is no reason why the same processes could not be adopted for closed source projects,” she said.
The Nextcloud developers offered up a list of things the open source model offers:
- A far lower barrier to ‘fixing the problem at the source’ rather than working around it. According to Poortvliet, “We at Nc [Nextcloud] often reach beyond our codebase, discussing problems and submitting fixes or reviewing code from projects whose work we depend on. This obviously happens even more within our own codebase.” Poortvliet acknowledges this could happen in a closed source company, but says, “…due to the more ‘closed’ culture, it tends not to happen, e.g. the MS Office team I believe is known for having rewritten huge swaths of the Windows APIs. And that’s not even talking of code OUTSIDE their organization, but code within it.”
- Because you know your code is visible to everyone, there’s more social pressure on quality and readability. Poortvliet points to an example from Linux creator Linus Torvalds: “Linus famously said about comments in code that they should always describe the goal of the code, not what it does, because ‘many kernel developers speak better C than English and thus the C should be readable by itself and comments optional’.”
- Because of the relatively high rate of newcomers and turnover, open source projects are far more accustomed and attuned to maintainability and to getting people up to speed. Maintainability, in turn, means splitting the code into more independent pieces. On this issue, Poortvliet says, “There’s this famous dependency graph of Linux vs. the Windows kernel: both are complex, but Linux is FAR more organized. This is in no small part because it is important that teams can work relatively independently as there are changes happening all over the code base all the time.” Poortvliet ties this to an idea that could be at the very heart of closed vs. open source in large projects. “In a large, complex project, interdependencies are hell and probably one of the major reasons why large software projects in the closed source world tend to go wrong. We have the same, we are very modular; ‘everything is an app’ is certainly an ecosystem play, but also a technical advantage, keeping our codebase much more modular.” He adds, “New people rarely if ever begin by contributing to ‘the core’ but almost always to apps, and in time, start to get familiar with the core. So the core should be small, giving us more meaningful contributions.”
- Tools and testing. According to Poortvliet, the constant influx of newcomers means “more automated testing and checks, and strict, collaborative and often enforced processes are in place with new stuff getting implemented swiftly because ‘it is cool’ (and we have to be cool to attract new people :D) and because somebody finds it fun to do and we don’t tell all ppl what to do.”
Breakdowns in the Development Pipeline
It is inevitable that a development pipeline will, at some point, break down; it happens in every project. In a project the size of the Linux kernel, a number of experienced maintainers are responsible for ensuring that the workflow keeps moving forward.
“Maintainers are only human and sometimes miss seeing a patch or a bug during code review, which can slow things down,” Abbott said.
One area of growing focus is bringing more automation into the workflow, which, according to Abbott, frees maintainers to spend their time on other valuable work. She says, “I’m very excited to see the increased focus on continuous integration and automation in the kernel. The kernel has had automation in various forms for a while. Intel has had their 0-day bot project which builds and tests patches that come in. This has been a key part of the kernel development process for a while now.”
But because we’re dealing with humans, development can become stalled if there’s a disagreement over a patch series. The solution, Abbott says, is communication. “Can the submitter articulate why they want the series to be included? Can the maintainer explain why they disagree with the approach taken? The development process stops completely when communication stops.”
From Gaughen’s perspective, “Regarding the kernel specifically, the breakdowns often stem from the fact that maintainers can be a single point of failure for getting patches into the subsystems they maintain.”
Given how large the kernel codebase is, this should come as no surprise. But it is not just about the scope of the kernel; maintainers also have lives and other commitments that can make them unavailable. As Gaughen puts it, “When something happens that makes them unavailable for a period of time, progress in merging patches for that subsystem slows considerably. Sometimes other maintainers will help by merging some patches when this happens.”
But what about DevOps? This approach to development has been adopted by numerous enterprise-level companies around the globe. For those unfamiliar with the term, DevOps is a set of practices that combines software development and IT operations, with the goal of shortening the systems development lifecycle and providing continuous integration and continuous delivery (CI/CD).
But can DevOps be applied to the Linux kernel? Gaughen says a full-on DevOps process might not make sense for such a project. She does say, however, that “…elements of DevOps can be adopted for kernel development.” Gaughen offers up one possible element. “One example is the linux-next tree, which is a continuous integration environment for changes that will be going into the next development version of Linux. A lot of integration and testing happens there, which has helped the community find and fix bugs and integration issues more quickly.”
The Linux kernel is a massive project that requires rapid development from many developers and development teams. Over the years, this project has been a standard-bearer for how a massive project can function like a well-oiled machine, as well as a study in how to solve problems when they occur.