Meet Wolfi: the Linux Distro Designed to Shrink Your Supply Chain
It’s been more than 30 years since Linus Torvalds created the Linux kernel and distributed its first version. When we reflect on the early days of free software adoption, massive credit should of course be given to Debian, FreeBSD and other free and open source software (FOSS) distributions, which provided stability guarantees, came pre-packaged with common utilities, and spared users from having to manually install everything.
But the world is wildly different than it was in the 90s. While there are certainly a number of distros that have done great work around security, modern software consumption patterns, such as the use of Docker to build software and the use of curl-pipe-bash commands to install it, have created software supply chain security challenges. Through these workarounds, the world has largely moved on from the traditional FOSS distribution model, while losing the advantages of acquiring software through a curated distribution, such as the vulnerability management that distributions provide.
Let’s take a look at the evolution of software distributions, the areas where modern developer needs have outgrown some of the conventional wisdom at the cost of security, and take a closer look at Wolfi — a rolling-release Linux distro built around modularity and re-targetability, which provides primitives useful to meet the supply chain security requirements of modern users, while also providing the stability of multiple application version streams.
Software Consumption Looks a Lot Different Today
In the traditional software consumption model, systems administrators chose distros, installed them across their virtual machines, and got as much software as they could directly from those distros. Developers that needed extra stuff outside of those distros had to file tickets and wait for IT to service that request.
Containers and microservices immediately inverted this software consumption model and gave developers full control over how software was built, distributed and acquired.
Simultaneously, the explosion of FOSS meant that most of the software developers were reaching for was no longer supported by the world’s most reputable distros. From NPM to Maven to PyPI, the ecosystems around the world’s most popular languages and libraries were releasing orders of magnitude more software packages and versions than the distros could keep up with.
The result was an effect where developers were now installing programs outside of their distro. The benefit was being able to acquire the latest and greatest software. But the downside was that this was all happening outside of the context of the curation and trust that are built-in when installing software directly from a distro.
There is a huge difference in who you are getting software from when you install outside of distros. When installing software manually, developers could get the latest software packages (and versions) without having to wait out the glacial pace of distros formally supporting them. However, with the popularity and ubiquity of FOSS, people got used to acquiring software without any of the old-world guarantees of trust built in by the distro.
Mounting Security Pressure on Software Artifacts
Active exploitation of high-profile vulnerabilities like Log4j shined a new light on the importance of software supply chain security.
Where most organizations are obsessive about network security, Log4j illustrated this whole new class of exploits made possible because developer build systems and the artifacts they use were never given a trust mechanism. The modern hack was to come in through these doorways that were left open — find a dependency, an insecure library or other components — then once inside, pivot to all of the other transitive dependencies.
For the organizations now paying attention to software supply chain security, Log4j inspired a demand for better knowledge of the provenance of software artifacts. Companies started asking questions like: Where did these artifacts come from? Have they been tampered with along that chain of custody? Is this open source project still being maintained?
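One small piece of the chain-of-custody question, whether an artifact was altered after it was published, can be sketched as a digest comparison. The following is a minimal illustration in Python, assuming the publisher distributes a SHA-256 digest over a trusted channel. The function name and the sample bytes are hypothetical, and a real provenance check would also involve signatures and build attestations rather than a bare hash.

```python
import hashlib
import hmac


def matches_published_digest(artifact: bytes, published_sha256_hex: str) -> bool:
    """Return True if the artifact's SHA-256 digest equals the published one."""
    actual = hashlib.sha256(artifact).hexdigest()
    expected = published_sha256_hex.strip().lower()
    # compare_digest runs in constant time, so it does not leak
    # where the two strings first differ.
    return hmac.compare_digest(actual, expected)


# Hypothetical artifact contents, standing in for a downloaded package.
original = b"example artifact contents"
published = hashlib.sha256(original).hexdigest()

print(matches_published_digest(original, published))         # unmodified copy
print(matches_published_digest(original + b"!", published))  # altered copy
```

A matching digest only shows that the bytes are the ones the publisher hashed. It says nothing about who built them or from what source, which is why the other two questions still need their own answers.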
Many were dismayed to realize that their security scanners completely miss software installed outside of containers, and outside of the underlying distribution. When scanners can’t find these packages, they also can’t find their vulnerabilities.
And beyond not wanting to spend the next holiday season remediating the next Log4j, organizations are also feeling the heat from the looming regulatory changes. CISA’s Self-Attestation Common Form — combined with the ongoing White House cybersecurity decrees — makes it clear that the responsibility for insecure software is going to be applied to vendors in the future. The language is still murky, but the evolution of this legislative/regulatory effort is obvious. Already, FedRAMP (which dictates terms of compliance for selling software to the U.S. federal government) has very specific requirements for establishing the trust of software components, and the remediation of known vulnerabilities.
Most companies today realize that playing ostrich is not a strategy for managing the software support lifecycle. We all need to know all of the software artifacts that we are running in our environments, and we need to have confidence in the security posture of the software artifacts we are deploying.
What’s All This Cruft in My Distro?
There are some nice signals a company can look at today to understand the security posture of open source projects before they are brought into the software supply chain. The OpenSSF Scorecard project is one great resource, for example.
But the distro itself is often one of the biggest barriers you will face in understanding what security vulnerabilities you are already running.
On the one hand, there is a false positive problem presented by many of the world’s most popular distros: it’s the security vulnerability equivalent of the “noisy pager” syndrome. Take, for example, a scan of Red Hat’s Universal Base Image 9. If I run syft on this image, I see 203 packages in this distro, with 211 recorded CVEs, which is greater than the number of packages. Among them is the dmidecode component, which has a medium severity score.
Why are all of these monolithic distributions installing so many components in their base image that you don’t even need in the first place? For instance, dmidecode is a component for retrieving information from a server’s BIOS — it is not relevant inside a containerized environment.
The original purpose of including these tools in distributions was to provide a uniform experience for admins, and to support the most commonly requested utilities “out of the box.” Accordingly, the presence of all of these “kitchen sink” packages is a form of the accumulated technical debt resulting from the need of the distribution to serve every possible use case by default.
This technical debt is a huge cost to organizations using these distributions today. If we are constantly confronted by CVEs on components that we do not use and that the maintainer of the distribution has no intention of fixing, we encounter the “false negative” problem: real issues get lost in the noise, which impacts our ability to triage the actual vulnerabilities affecting our infrastructure.
These reported vulnerabilities not only create additional noise you don’t want for CVE remediation, but they also potentially create gadgets that can be used in other exploitation chains, a class of attacks known as “living off the land.” For example, an attacker could exploit sudo, gain root inside the container, and then use these other tools to break out of the container. Now the attacker is on the host and has fully compromised the system. By consuming packages you don’t actually use, you are assuming unknown liabilities.
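To make the triage cost concrete, here is a minimal sketch in Python that separates scanner findings for packages a workload actually needs from findings for packages that are only along for the ride. The record layout, the placeholder finding IDs, the openssl entry and the required_packages set are all assumptions made for illustration. Real scanner output is richer and varies by tool.

```python
from typing import Dict, Iterable, List, Set, Tuple

Finding = Dict[str, str]


def split_findings(
    findings: Iterable[Finding], required_packages: Set[str]
) -> Tuple[List[Finding], List[Finding]]:
    """Partition findings into (needed by the workload, not needed)."""
    needed: List[Finding] = []
    unneeded: List[Finding] = []
    for finding in findings:
        if finding["package"] in required_packages:
            needed.append(finding)
        else:
            unneeded.append(finding)
    return needed, unneeded


# Hypothetical findings; the IDs are placeholders, not real CVE numbers.
findings = [
    {"id": "EXAMPLE-0001", "package": "dmidecode", "severity": "medium"},
    {"id": "EXAMPLE-0002", "package": "sudo", "severity": "high"},
    {"id": "EXAMPLE-0003", "package": "openssl", "severity": "high"},
]
# Hypothetical: the only package this workload actually uses.
required_packages = {"openssl"}

needed, unneeded = split_findings(findings, required_packages)
print(len(needed), "finding(s) in packages the workload uses")
print(len(unneeded), "finding(s) in packages it does not use")
```

The second bucket is not safe to ignore, since unused packages can still serve as gadgets for an attacker. The more durable fix is not shipping those packages at all, so the bucket is empty to begin with.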
Wolfi: Right-Sizing the Distro so Security Signals Make Sense
Wolfi is a distro whose name was inspired by the world’s smallest octopus. Launched one year ago, it emphasizes a modular design and more granular, minimal packages so that developers and security teams can reason about what they are running. Its design principles were created to address these basic disconnects between distros designed for an earlier era and the realities of today’s workloads running in the cloud and at the edge:
- Container images tend to lag behind upstream updates, resulting in users running images with known vulnerabilities.
- The common distributions used in container images also lag behind upstream versions, resulting in users installing packages manually or outside of package managers.
- Container images typically contain more software than they need to, resulting in an unnecessarily increased attack surface.
- Embedded scenarios, such as IoT devices running at the edge, have small storage capacities and need smaller distributions as a result.
Where most distros are optimized for stability, broad compatibility and slow, purposeful change, Wolfi prioritizes fast updates and minimalism, using version streams to allow users to choose which updates they wish to consume and when.
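The idea behind a version stream can be sketched as follows: a user pins to a stream such as a major.minor line, and within that stream always takes the newest available release. This Python sketch is an illustration of the concept only, not how Wolfi or apk resolve versions. The function names and version numbers are hypothetical, and it assumes purely numeric dotted versions.

```python
from typing import List, Optional, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '1.20.10' into (1, 20, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_in_stream(available: List[str], stream: str) -> Optional[str]:
    """Return the newest available version whose leading components match the stream."""
    prefix = version_key(stream)
    candidates = [
        v for v in available if version_key(v)[: len(prefix)] == prefix
    ]
    return max(candidates, key=version_key) if candidates else None


# Hypothetical versions of a single application.
available = ["1.20.9", "1.20.10", "1.21.0", "1.21.3"]

print(latest_in_stream(available, "1.20"))
print(latest_in_stream(available, "1.21"))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.20.9" would sort after "1.20.10", and the older release would be picked.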
Wolfi packages are built using melange, a flexible and secure build tool. It’s configured using YAML (the world runs on YAML), and runs build steps in containers (using bwrap, Docker or Kubernetes). It produces .apks, which are signed with a private key. Wolfi hosts those packages on a CDN, and a CLI (apk) manages the installation. Additionally, Chainguard uses apko, a tool that combines APK packages into an OCI base image, to declaratively build its Chainguard Images product from those packages.
Wolfi emphasizes frequent updates and automation. Where open source used to mean that you get a free copy of software forever, today’s reality is that you have to have a plan for constantly updating every piece of software, and that’s what Wolfi’s primitives optimize for. In Wolfi terms, the life of a package update looks like:
- Wolfibot monitors GitHub and release-monitoring for updates, and files them as PRs.
- Maintainers review and approve these updates, merging them into the distribution.
- Time between upstream release and availability in Wolfi can be measured in minutes, not days or weeks.
- Availability doesn’t mean you should roll it to production; you should still test things out.
- Including less by default, enabled by the minimalist design of Wolfi, means you have fewer components to worry about when testing.
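The first two steps of that lifecycle, detecting a newer upstream release and turning it into a proposal for maintainers to review, can be sketched roughly as below. This is not Wolfibot’s implementation. It is a minimal Python illustration in which the UpdateProposal type, the function names, the package names and the versions are all hypothetical, and versions are assumed to be purely numeric and dot-separated.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class UpdateProposal:
    """A pending change a maintainer would review, analogous to a PR."""
    package: str
    current: str
    proposed: str


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '1.4.2' into (1, 4, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def propose_updates(
    packaged: Dict[str, str], upstream: Dict[str, str]
) -> List[UpdateProposal]:
    """Propose an update for every package whose upstream release is newer."""
    proposals = []
    for name, current in sorted(packaged.items()):
        latest = upstream.get(name)
        if latest is not None and version_key(latest) > version_key(current):
            proposals.append(UpdateProposal(name, current, latest))
    return proposals


# Hypothetical state: what the distro ships vs. what upstream has released.
packaged = {"examplelib": "1.4.2", "exampletool": "0.9.0"}
upstream = {"examplelib": "1.5.0", "exampletool": "0.9.0"}

for proposal in propose_updates(packaged, upstream):
    print(f"{proposal.package}: {proposal.current} -> {proposal.proposed}")
```

Keeping the output as a proposal rather than applying it directly mirrors the review step in the list above: automation finds the update, and a maintainer still approves it.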
As Wolfi hits the one-year anniversary of its release, it has attracted some great contributions, including 1,300 package configurations. In addition, we have built out partnerships with some of the most widely used container scanners, such as Docker Scout, Grype, Snyk, Trivy, Wiz and Prisma Cloud.
Different distros have different priorities. Debian’s priority has been stability, broad compatibility, and slow and purposeful change, and these priorities have made it successful for 30+ years. Alpine’s priorities are… faster. Alpine is security-focused and minimal, with more frequent releases and patches, and in many ways it is a spiritual predecessor of Wolfi. All of these distros are fantastic public goods that inspired much of the design rationale behind Wolfi.