This story is the first in a series examining the role of security in the open source software supply chain. Check back throughout the month for updates.
What happens when the maintainer of a popular open source framework or component dies, goes to prison or just gets fed up? Developers whose software depends on that repo might have time to prepare; there might be an official repo with a formal succession process, or a fast but informal community fork. Or the code might just disappear, which could affect commercial tools using it too. And even if there is a warning and time to plan, it’s only helpful if developers are aware of the dependencies in their software and are monitoring their status.
Death, prison terms and abrupt departures might be rare; software vulnerabilities aren’t. As Microsoft Azure Chief Technology Officer Mark Russinovich pointed out at the RSA Conference earlier this year, hugely popular open source packages have been compromised multiple times, meaning the organizations using them are compromised too.
The build script for the Webmin web portal admin toolkit was tampered with, leaving it with a credential-stealing backdoor for a year, during which it was downloaded a million times. When the Bootstrap-sass web styling framework was compromised on npm and the attackers removed the older version, it took eight days until a safe version was available. A crypto miner was added to the Node.js Event-Stream package after a new maintainer volunteered to take over an old but still popular repo.
Even large, experienced technology organizations can make mistakes in securing their repos (Canonical’s GitHub account was compromised in 2019) or miss the update that fixes a newly discovered vulnerability in a component. It was only through its bug bounty program that GitHub discovered it hadn’t updated a dependency in its Mercurial import code that sanitized branch names, allowing attackers to craft a branch name that could run code on GitHub’s servers (albeit in a sandbox on an isolated network).
These problems certainly aren’t unique to open source but as Russinovich noted, “open source is such a massive ecosystem that we need to go after it specifically and there are some specific implementation points in the supply chain that need to be addressed for open source.”
Viewing the open source that developers and operations teams consume as a supply chain makes it easier to think about where problems occur. “How do we make sure that only packages and source code we’ve got some assurance are trustworthy enter our supply chain? How do we ensure that what we know about that supply chain is reliable: when somebody says ‘this is trustworthy, I’ve done code reviews on it, I’ve got MFA in place for checking in source code’ that that is really, truly the case. And then what do we do when there’s a vulnerability, how to identify what’s affected by it and how do we roll back to a good version?”
There are multiple steps the open source community can take, starting with good software development and package management hygiene, all the way up to defining a “bill of materials” for software, the way we do for physical products, to make dependency tracking more effective.
Embold is also free for open source use. Google’s OSS-Fuzz service, run in conjunction with the Linux Foundation’s Core Infrastructure Initiative, uses multiple fuzzing engines, checks open source projects written in C/C++, Rust and Go for free, and has already found 17,000 bugs in 250 projects.
Rather than leaving every maintainer to check one project at a time, GitHub is hoping its Security Lab (free for open source projects) and CodeQL will help remove vulnerabilities at scale across thousands of projects.
The GitHub Security Lab bounty program has a new Bug Slayer category to reward researchers who write code queries that maintainers can run to find not just a specific vulnerability but a whole class of vulnerabilities. That way, a CodeQL query doesn’t just surface current vulnerabilities at scale rather than one by one; it can also stop new, similar vulnerabilities from being released in the future.
Also free for open source projects is Snyk, which will scan your source code repo and tell you if you have dependencies with known vulnerabilities. Now that GitHub owns npm, it’s going to be easier to check those dependencies; as GitHub CEO Nat Friedman pointed out when announcing the acquisition, “Looking further ahead, we’ll integrate GitHub and npm to improve the security of the open source software supply chain, and enable you to trace a change from a GitHub pull request to the npm package version that fixed it.”
But useful as automated dependency tools are for understanding what code a project depends on so developers can update and patch (and for automating that patching as part of source code and build management), the longer-term approach needs to be more systematic, because dependency chains are so deep in the open source world. Importing one package doesn’t add just one dependency; it also brings in the upstream dependencies that package imports. Because many Node packages are snippets, installing one Node package means trusting, on average, 80 packages, and that number is going up over time.
“One interesting trend we’re seeing with this in these ecosystems is that once something gets popular, it gets even more popular,” Russinovich pointed out. For the five most referenced packages in the Node supply chain like inherits and Lodash, that goes up to 150,000 dependencies. You can check that for Node packages at npm.broofa.com, which also shows the relevant licenses, and how many maintainers are involved with those dependencies. Developers are taking a dependency not just on the quality of their code but also on how well they’re protecting their own build processes and release management from compromise.
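The transitive pull described above can be made concrete with a short sketch. The dependency graph below is invented for illustration; real data would come from a registry such as npm.

```python
# Hypothetical sketch: why importing one package pulls in many more.
# The graph below is made up; a real one would come from registry metadata.
from collections import deque

DIRECT_DEPS = {
    "my-app": ["framework", "http-client"],
    "framework": ["util-a", "util-b"],
    "http-client": ["util-a", "parser"],
    "util-a": [],
    "util-b": ["tiny-snippet"],
    "parser": ["tiny-snippet"],
    "tiny-snippet": [],
}

def transitive_deps(package, graph):
    """Breadth-first walk returning every package reachable from `package`."""
    seen, queue = set(), deque(graph.get(package, []))
    while queue:
        dep = queue.popleft()
        if dep not in seen:
            seen.add(dep)
            queue.extend(graph.get(dep, []))
    return seen

# Two direct imports expand into six trusted packages.
print(sorted(transitive_deps("my-app", DIRECT_DEPS)))
# → ['framework', 'http-client', 'parser', 'tiny-snippet', 'util-a', 'util-b']
```

On a real registry the same walk is what makes two direct imports balloon into dozens of trusted maintainers.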
To make it easier to detect when build servers are compromised, Microsoft is pushing the adoption of reproducible builds; builds of source code should be not just versioned but deterministic, with a record of the tools used and the steps needed to either reproduce or verify the build. “What reproducible builds give you is the ability to take some source code and somebody can rebuild it and know that given the compiler, given the artifacts that are pulled into it, that when they do a build the artifacts are going to have specific hashes,” he explained. “Then when somebody says ‘here’s a legitimate build, here’s the hashes for it’ they can verify that nothing’s been tampered with, that nothing got into the middle of that supply chain from build to release management, by rebuilding from the source.”
Most of Windows is now built with reproducible builds, and Linux is moving towards them. The approach can have some odd side effects, though: the timestamps in signed Windows binaries are no longer actual times, because otherwise they’d be different every time the build was run, so moving to reproducible builds can mean a lot of changes.
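The verification step Russinovich describes can be sketched in a few lines. The “build” here is a deterministic stand-in (concatenating sources in sorted order, embedding no timestamps); the point is only that an independent rebuild must hash to the published value.

```python
# Minimal sketch of the reproducible-build check: if a build is
# deterministic, rebuilding from the same sources must yield an
# artifact with the same hash. The "build" is a toy stand-in.
import hashlib

def build(sources: dict) -> bytes:
    # Sort inputs so the artifact doesn't depend on iteration order,
    # and embed no timestamps or absolute paths.
    return b"".join(sources[name] for name in sorted(sources))

def artifact_hash(sources: dict) -> str:
    return hashlib.sha256(build(sources)).hexdigest()

sources = {"main.c": b"int main(){return 0;}", "util.c": b"/* util */"}

published = artifact_hash(sources)        # hash the maintainer publishes
rebuilt = artifact_hash(dict(sources))    # independent rebuild
assert published == rebuilt, "build is not reproducible or was tampered with"

# Anything injected between build and release changes the hash:
tampered = dict(sources, **{"util.c": b"/* backdoor */"})
assert artifact_hash(tampered) != published
```

This is why non-deterministic inputs like build timestamps have to be removed first: they would make every honest rebuild hash differently too.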
Developers also need to ensure they will still have access to the packages they depend on if those packages go away suddenly, the way leftpad and sugar did, or even if the repo is unavailable because of networking issues. That also protects you if a package is compromised and the attacker removes the previous version to stop developers reverting to it.
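One common defense is a local, hash-pinned mirror: a vetted copy of each dependency that can still be installed if upstream disappears or is replaced. The sketch below is hypothetical; the lockfile shape and function names are invented, and real tools would store tarballs on disk or in an artifact registry.

```python
# Hedged sketch of a local, hash-pinned dependency mirror. A package
# that vanishes upstream (or is swapped out by an attacker) can still
# be installed, and only if it matches the hash recorded when vetted.
import hashlib

LOCKFILE = {}  # package -> (version, expected sha256), recorded at vetting time
MIRROR = {}    # local copies: (package, version) -> tarball bytes

def vet_and_mirror(pkg: str, version: str, tarball: bytes) -> None:
    digest = hashlib.sha256(tarball).hexdigest()
    LOCKFILE[pkg] = (version, digest)
    MIRROR[(pkg, version)] = tarball

def install(pkg: str) -> bytes:
    version, expected = LOCKFILE[pkg]
    tarball = MIRROR[(pkg, version)]  # served locally even if upstream is gone
    if hashlib.sha256(tarball).hexdigest() != expected:
        raise RuntimeError(f"{pkg}@{version} does not match pinned hash")
    return tarball

vet_and_mirror("left-pad", "1.3.0", b"fake tarball bytes")
assert install("left-pad") == b"fake tarball bytes"
```

The hash pin is what closes the second gap in the paragraph above: even if an attacker republishes the same name and version upstream, the mirrored copy is the one that installs.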
For developers to protect their own code by understanding the source quality and reputation of the projects they depend on, and to take steps to ensure availability, they need more than just a list of dependencies. The idea of the software supply chain and the software Bill of Materials is to have a flow of information like whether a project insists maintainers use MFA and offers reproducible builds with build artifacts. A build system can look at that signed Software Bill of Materials (SBOM) and block releases that don’t match the build artifacts because they’ve been tampered with, or releases where static analysis or the dependency map reveals vulnerabilities.
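A policy gate of that kind might look like the following sketch. The SBOM schema here is invented for illustration (real formats include SPDX), and signature checking is omitted; the gate simply compares artifact hashes against the SBOM and requires the attestations policy demands.

```python
# Hypothetical sketch of an SBOM policy gate in a build system: block a
# release if any artifact's hash disagrees with the SBOM, or if a
# required attestation (MFA, reproducible builds, ...) is missing.
# The schema is invented; real SBOMs use formats like SPDX.
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def release_allowed(sbom, artifacts, policy):
    """Return (ok, reasons). `sbom` maps component name -> metadata dict."""
    reasons = []
    for name, meta in sbom.items():
        if sha256(artifacts[name]) != meta["sha256"]:
            reasons.append(f"{name}: artifact hash mismatch (tampering?)")
        for requirement in policy:
            if not meta.get(requirement, False):
                reasons.append(f"{name}: missing attestation {requirement!r}")
    return (not reasons, reasons)

artifacts = {"libfoo": b"libfoo-bits"}
sbom = {"libfoo": {"sha256": sha256(b"libfoo-bits"),
                   "maintainer_mfa": True,
                   "reproducible_build": False}}
ok, reasons = release_allowed(sbom, artifacts,
                              policy=["maintainer_mfa", "reproducible_build"])
assert not ok
assert reasons == ["libfoo: missing attestation 'reproducible_build'"]
```

In practice the SBOM would be signed, so a consumer could also verify who made each attestation, not just that the hashes line up.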
Moving to a system where packages and builds include this much information, so checks and blocks can be automated by policy, will require massive changes to infrastructure, he cautioned, but large software vendors like Microsoft and Google are already committed to this.
The Consortium for Information and Software Quality (CISQ) and the Object Management Group (OMG) have set up a working group to define an SBOM model for a comprehensive, hierarchical inventory of software projects that can be exchanged between development and orchestration systems. And there are two projects already underway.
The existing Software Package Data Exchange (SPDX) format is currently used to track licensing requirements in a software supply chain, but it could grow into an open standard for communicating SBOM information, he suggested. “They’re now looking at extending this to support the requirements for software bill of materials, including strong signatures, identity and policies that can go along with it.”
The in-toto project, from the Secure Systems Lab at New York University and the New Jersey Institute of Technology’s Cybersecurity Research Center, was created specifically to help secure the software supply chain with SBOMs by applying policies in build and release systems. In-toto monitors the commands issued inside an IDE like Visual Studio Code and the artifacts they create, including SHA hashes, whether that’s cloning a repo or running a linter. Those artifacts can be verified against the hashes and used to check whether a package meets policies like having been run through static analysis or fuzzing, or having dependencies patched before release.
Tools like this can give developers building software and organizations consuming open source confidence that they know what they’re getting, which is vital for keeping the open source ecosystem healthy.
“We need to understand the impact and severity of a vulnerability and what parts of the software supply chain are impacted by it. With an SBOM with strong naming, with hashes, with source attribution. In a world where everything is flowing through SBOMs, downstream in your code, you can look at the SBOMs that come with all the things you’ve pulled dependencies on and go all the way back upstream with automated tooling, right back up to the source files,” he suggested.
That will also help you see if other code is threatened by the same vulnerability, something that’s harder than it should be today. “When there’s a vulnerability in a package and that came from this piece of source code, what other package is depending on that source code? That is something that SBOMs ultimately can answer, just through automated walking back through the chains of dependencies in this metadata.”
The Linux Foundation and Snyk are sponsors of The New Stack.