The nonprofit Linux Foundation in conjunction with Harvard’s Lab for Innovation Science recently published Census II of Free and Open Source Software (FOSS) — Application Libraries. Compiled from anonymized usage data based on scans of codebases at thousands of companies contributed by the Synopsys Cybersecurity Research Center (CyRC), Snyk and FOSSA, this report identifies more than 1,000 of the most widely deployed open source application libraries.
The report’s authors noted that because FOSS is produced in a decentralized and distributed manner, determining its health, economic value and security can be a challenge. Because software components are packaged, and versions are identified and cataloged in so many unstandardized ways, the report has organized them into eight Top 500 lists.
As Mike McGuire, security solutions manager with the Synopsys Software Integrity Group, says, packages and versions are a bit like the model, year and trim of a car.
“If I told you I drive a Toyota Camry, you still don’t know exactly what I drive. Is it the 1999 version or the 2022 version? It’s important to know this when ordering parts, getting service, tracking recalls, etc.,” he said.
While the Census II report aims to “inform actions to sustain the long-term security and health of FOSS,” and represents “our best estimate of which FOSS packages are the most widely used by different applications,” the authors caution that the report does not reflect the risk profiles of the software. “There are many indicators that could be used to suggest risk and different organizations may weight factors differently,” the authors wrote.
McGuire agrees with the caveat. Widely used is not the same as critical.
“Tons of apps can be using a specific Java GUI framework, making it very popular, but it may not serve as a critical part of the software should something happen to it.”
He added that what counts as critical will be unique to each organization, depending on how its apps are built. Still, measuring risk profiles is easier once the most widely used software is identified, the Census II authors wrote.
As the industry moves toward widespread standardization and adoption of software bills of materials (SBOMs), the report identifies several roadblocks to improving how software is identified, cataloged and maintained. These challenges include:
- The need for a standardized naming schema for software components so that application libraries can be uniquely identified. Without this “organizations will remain categorically unable to communicate with each other on the large scale — particularly, the global scale — necessary to share such information.”
- The complexities associated with package versioning. The team encountered an unexpected problem — companies were “maintaining internal versions of a package and were not contributing their changes back to the official repository. In one instance, they observed version 2.87 of a package multiple times, but the official repository only went up to version 2.26.” That means that if an SBOM “can’t distinguish between a ‘main’ version and a variant … it will be difficult for the purchasers of such software to know if they are vulnerable to newly discovered vulnerabilities.”
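The versioning problem described above can be illustrated with a small check. This is a minimal sketch, not the method used in the Census II report: it assumes versions are simple dotted numbers, and the function names `parse_version` and `flag_unofficial_versions` are hypothetical. It compares versions observed in codebases against those published in the official repository and flags any that upstream never released.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '2.26' into a comparable tuple (2, 26)."""
    return tuple(int(part) for part in text.split("."))


def flag_unofficial_versions(observed, official):
    """Return (version, reason) pairs for observed versions missing upstream.

    Both arguments are iterables of dotted numeric version strings.
    Assumption: anything not published upstream is an internal variant.
    """
    official_set = {parse_version(v) for v in official}
    latest = max(official_set)
    flagged = []
    for version in observed:
        parsed = parse_version(version)
        if parsed not in official_set:
            reason = (
                "newer than upstream"
                if parsed > latest
                else "not published upstream"
            )
            flagged.append((version, reason))
    return flagged


# Hypothetical data echoing the report's example of 2.87 vs. 2.26.
official_versions = ["2.24", "2.25", "2.26"]
observed_versions = ["2.26", "2.87", "2.25.1"]
result = flag_unofficial_versions(observed_versions, official_versions)
```

Real-world version strings are often messier (pre-release tags, build metadata), so a production check would need a parser suited to the ecosystem in question.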
The report also identified several issues that affect the long-term security of FOSS. These include:
- Much of the most widely used FOSS is developed by only a handful of contributors — results in one dataset show that 136 developers were responsible for more than 80% of the lines of code added to the top 50 packages. The danger here, as reported in the 2021 OSSRA report is that “As an open source project grows in popularity — with no corresponding growth in people maintaining the project — the consequence is often developer burnout, and many open source projects are abandoned.” And if projects are abandoned, that means bugs don’t get fixed.
- The increasing importance of individual developer account security. Individual accounts generally aren’t as well-protected as organizational ones. That, the authors wrote, means “changes to code under the control of these individual developer accounts are significantly easier to make and to make without detection. Further, a related issue could occur if the individual developer went on a long hiatus, or was hit by the proverbial bus, preventing updates to the code from occurring.” That’s not the only risk. Another is a solo developer removing or deleting a project, breaking the hundreds or even millions of packages that depend on it.
- The persistence of legacy software in the open source space. We’ve all heard of companies declaring that they are ending support for older versions of operating systems or applications. But, for any number of reasons, that doesn’t mean everybody stops using those older versions. “Many organizations will find it difficult to justify switching to different packages since there are financial and time-related costs for switching to new software when there is no guarantee of an added benefit,” the authors wrote. Indeed, the OSSRA report found that 85% of the codebases examined in 2020 had open source dependencies that were more than four years out of date, even though newer versions were available, sometimes many of them. That can be dangerous. One reason newer versions exist is to fix bugs in the older ones, and you can be sure that attackers are looking for those still running the older versions.
While it is not prescriptive, Census II does point to the need for organizations and users to be more actively involved in FOSS development and not leave it solely to the small group of developers who have led the way thus far. The report also shows how important software composition analysis is for detecting the persistence of legacy software in open source, and it underscores the ongoing need for standardization in the SBOM space.
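To make the "more than four years out of date" figure concrete, here is a rough sketch of how such a check might look. It is an illustration only, not how the OSSRA report computed its number: it assumes "out of date" means the version in use was released more than four years before the audit date, and the names `cutoff_date` and `stale_dependencies`, along with the sample data, are hypothetical.

```python
from datetime import date


def cutoff_date(as_of, years):
    """Return the date `years` years before `as_of`."""
    try:
        return as_of.replace(year=as_of.year - years)
    except ValueError:
        # as_of is Feb. 29 and the target year is not a leap year.
        return as_of.replace(year=as_of.year - years, day=28)


def stale_dependencies(release_dates, as_of, max_age_years=4):
    """Return sorted names of dependencies released before the cutoff.

    `release_dates` maps a dependency name to the release date of the
    version in use (an assumed input; a real audit would pull this
    from a package registry or a software composition analysis tool).
    """
    cutoff = cutoff_date(as_of, max_age_years)
    return sorted(
        name for name, released in release_dates.items() if released < cutoff
    )


# Hypothetical dependencies audited on Jan. 1, 2021.
in_use = {
    "lib-a": date(2015, 6, 1),
    "lib-b": date(2019, 3, 10),
    "lib-c": date(2017, 1, 1),  # exactly four years old, so not flagged
}
stale = stale_dependencies(in_use, as_of=date(2021, 1, 1))
```

Age alone says nothing about whether a newer release exists or whether the old one has known vulnerabilities, so a check like this is only a first filter.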
The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Census.