SPDX Could Help Organizations Better Manage Their Thickets of Open Source Licenses
As open source becomes more pervasive, companies are increasingly consuming products that contain open source components. Today it is hard to find any piece of software that doesn’t include some open source code, which makes it complicated for companies to keep tabs on what they are consuming and stay compliant with open source licenses.
A new Linux Foundation project called Software Package Data Exchange (SPDX) aims to simplify matters. The Foundation hosts the project and owns the copyright on the specification and trademark assets. SPDX is an open community of volunteers and, as such, has people participating from a broad spectrum of companies, academia and other foundations.
“With over 2,000 different software licenses for software freely available on the internet, license proliferation is a major headache for software development organizations as well as for companies redistributing software packages in their products,” noted a paper written for the SPDX Working Group by Phil Odence, vice president of Black Duck Software, and Kate Stewart, former Canonical Ubuntu release manager and now Director of Strategic Programs at the Linux Foundation.
Open source licenses can be a major headache even for companies with massive resources at their disposal, given that so many different licenses can be incompatible with each other. Just getting accurate and complete license information can be time-consuming, especially at a large scale. At the same time, companies also need to better understand the issues around complying with open source licenses.
The SPDX governance model is a meritocratic one, like that used by the Apache Software Foundation. There are three teams, each with its own chairs and charter: Technical, Legal and Outreach. In addition, a Core team provides overall guidance and coordination.
The Technical committee maintains and publishes the SPDX Specification and Tools. It is co-chaired by Kate Stewart and Gary O’Neall. The team consists of volunteers from the SPDX community and is open to participation by anyone. It relies on a number of resources in its day-to-day work, including a wiki, a bug/feature tracking system and GitHub repositories, and it holds weekly conference calls. Contributions to the specification are made under the Linux Foundation guidelines.
SPDX provides the license information that companies need, in a standard format, to make those types of compliance decisions. SPDX uses a common machine- and human-readable language for conveying license compliance information, with the aim of reducing redundant effort and clarifying developers’ licensing intentions.
Engaging with SPDX
SPDX offers a number of outputs that can be used by outside parties. Each one is valuable in its own way for helping with license identification and ultimately compliance:
- License List: The license list uses short identifiers to represent specific open source licenses. SPDX maintains a published page at a nondeprecating URL that has the short identifier and license text for open source licenses that have been reviewed and published by the legal team.
- License Identifiers in Source: The use of license identifiers in source documents greatly aids in the reliability and efficiency of automatically parsing source files to determine a license.
- SPDX Documents: SPDX Documents are a summary of the relevant licensing and copyright information in a standard format that permits sharing between a project and its users, supply chain members, etc. At a minimum, a document contains the copyright and license information for all the files in a Package (something you are conveying).
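To illustrate why license identifiers in source make automated parsing more reliable, here is a minimal Python sketch that looks for an `SPDX-License-Identifier:` comment tag in a file's text. The function name `find_license_expression` and the sample input are hypothetical, and the handling of comment terminators is a simplifying assumption rather than anything defined by the SPDX project.

```python
import re

# Matches the conventional "SPDX-License-Identifier: <expression>" comment tag.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def find_license_expression(source_text):
    """Return the first license expression tagged in the text, or None."""
    for line in source_text.splitlines():
        match = SPDX_TAG.search(line)
        if match:
            expr = match.group(1).strip()
            # Assumption: drop a trailing comment terminator such as "*/" or "-->".
            for closer in ("*/", "-->"):
                if expr.endswith(closer):
                    expr = expr[: -len(closer)]
            return expr.strip()
    return None


# Hypothetical sample input: a C file carrying a license tag in its first line.
example = "/* SPDX-License-Identifier: GPL-2.0 */\nint main(void) { return 0; }\n"
print(find_license_expression(example))
```

A scanner like this only reports what the tag says; it does not check the identifier against the SPDX License List or judge whether the license is compatible with anything else.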
There is a standard format for an SPDX document, and many different tools, both open source and commercial, can work with it and automate routine steps in the license compliance process. Depending on the level of analysis that goes into the SPDX document (and thus the content it was created from), there may be some manual steps as well, e.g. annotating the SPDX document with reviewer comments.
SPDX Documents are composed of one or more sections. Some of these sections are required, while others are optional. Version 2.1 of the SPDX Specification lists the following sections:
- Document Creation Information: Denotes who created the document, how it was created and other useful information related to its creation.
- Package Information: This section provides information about the “package.” A package can be one or more files of any type, including but not limited to source, documents and binaries. The package information contains the originator, where the package was sourced from, a download URL, a checksum and so forth. It also contains summary licensing for the package.
- File Information: This is information about a specific file. It can contain the copyrights found in the file (if any), the license of the file, a checksum for the file (to make sure the file hasn’t changed), file contributors and so forth.
- Snippet Information: Snippet information can be used when specific bytes or lines within a file are under a different license from the rest of the file. This covers a scenario such as someone copying code from a GPL-2.0 file into an Apache-2.0 licensed file.
- Other Licensing Information: Other licensing information provides a way to describe licenses that are not on the SPDX License List. You can create a local (to the SPDX document) identifier for the license, place the license text itself in the document as well, and then reference it for files just as you would a license from the license list.
- Relationships: Relationships were introduced in the 2.0 specification and are a very powerful way of expressing how SPDX documents and the elements they describe relate to one another. For instance, a binary file was GENERATED_FROM a source package.
- Annotations: Annotations are comments made by people on various entities and elements within the document. For example, someone reviewing the document may make an annotation about a specific file and its license. Annotations are useful for reviews of SPDX documents and for conveying specific information about the package, files, creation details, licenses, etc.
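As a rough illustration of how these sections look to a tool, the Python sketch below reads a short fragment written in the style of the SPDX tag-value format and pulls out its Relationship entries. The fragment is abbreviated and hypothetical (it omits required fields and is not a valid SPDX document), the helper names `parse_tag_values` and `relationships` are invented for this example, and the parser assumes single-line values only.

```python
# Abbreviated, illustrative fragment; not a complete or validated SPDX document.
FRAGMENT = """\
SPDXVersion: SPDX-2.1
SPDXID: SPDXRef-DOCUMENT
PackageName: example-package
SPDXID: SPDXRef-Package
FileName: ./bin/tool
SPDXID: SPDXRef-Binary
LicenseConcluded: Apache-2.0
Relationship: SPDXRef-Binary GENERATED_FROM SPDXRef-Package
"""


def parse_tag_values(text):
    """Split 'Tag: value' lines into (tag, value) pairs, skipping blanks and comments."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values may themselves contain colons.
        tag, _, value = line.partition(":")
        pairs.append((tag.strip(), value.strip()))
    return pairs


def relationships(pairs):
    """Return (source, relationship_type, target) triples from Relationship lines."""
    triples = []
    for tag, value in pairs:
        if tag == "Relationship":
            source, rel_type, target = value.split()
            triples.append((source, rel_type, target))
    return triples


pairs = parse_tag_values(FRAGMENT)
print(relationships(pairs))
```

Real documents carry many more fields per section, and some values span multiple lines, so production tooling would normally rely on the SPDX project's own tools or libraries rather than a hand-rolled parser like this one.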
Who is SPDX for?
SPDX is not restricted to either individuals or organizations; it caters to both. The project is encouraging open source developers to adopt its License Identifiers. The Das U-Boot project has been an early adopter of this approach.
There is a clear benefit even for lone-wolf developers. If people can unambiguously determine the license of a file, compliance (and thus following the developer’s wishes) becomes less error-prone and more reliable. While looking through four or five files is no one’s idea of a “big deal,” try looking through 10,000 of them when you are trying to actually build something.
Many big companies work with SPDX, including ARM, Qualcomm, Samsung, Siemens, TI, Wind River and others. Different companies use SPDX differently: some use just the license identifiers, some use the structure of information for their own programs, and some use the material to create documents.
Keep in mind, however, that using SPDX does not offer complete protection from license violations. SPDX is just meant to summarize and report the facts about licensing in a package. The project makes no judgment on whether something is a violation or not.