IBM sponsored this post.
The security and integrity of the software supply chain is one of the fundamental requirements in any overall cybersecurity assessment. The first step in securing the software supply chain is being able to provide a complete, accurate and auditable record of every dependency that goes into building a deliverable software product, generally referred to as a software bill of materials (SBOM).
As noted in a report from the National Telecommunications and Information Administration (NTIA), “Without SBOM, a lack of transparency into the contributors, composition and functionality of software systems contributes substantially to cybersecurity risks and increases costs of development, procurement and maintenance.”
A number of open source and commercial tools are already available to address this need, and well-defined specification standards are being established. Some software vendors and developers have started incorporating SBOM generation and distribution into their software delivery pipelines. So what’s new in this article?
SBOM-generation techniques today are largely limited to discovering dependencies that are managed by package managers and can be queried through them (for instance, with pip list or dpkg -l), or dependencies that are explicitly recorded in package manifests.
Developers, on the other hand, are not limited to bringing in dependencies through package managers, and in some cases the dependencies they need are not available through package managers at all. For instance, some software is distributed as pre-compiled binaries that developers can simply fetch with wget, while other software is distributed as raw source code in a tar.gz archive that developers build and install with make && make install.
This is a critical gap in ensuring completeness of our SBOM. And we couldn’t find any reliable open source tool to address this, so we decided to build one — and that’s orion.
First things first: with orion, we are not trying to duplicate existing SBOM-generation tools. Orion does not discover any package manager dependencies; it discovers only dependencies installed through modalities outside package managers, which makes it complementary to existing SBOM-generation tools. Also, at the moment, orion is focused only on microservice application build patterns.
With that said, now let’s look more closely into what orion does.
Orion is available as a CLI that can be executed locally or in any CI pipeline, as follows:
$ sudo orion discover -d orion-test-app/Dockerfile -f orion-result.spdx -i icr.io/gitsecure/orion-test:1.0 -n orion-test-app
For microservices, a deliverable software product is commonly a container image built through a recipe defined in a Dockerfile. The Dockerfile allows developers to express different patterns and strategies for building their applications.
Therefore, orion starts by scanning the Dockerfile. During the scan, it parses commands such as tar that indicate the inclusion of third-party dependencies, and it creates an intermediary “trace” object.
The trace essentially holds the provenance information for each dependency, for instance, the mapping from its download URL to its untar location in the image. At this point, we have a record of all the software dependencies. But we are still missing another critical detail that an SBOM requires for each of these dependencies: a unique identifying key.
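The post does not show orion’s parser, so the following is only a minimal Python sketch of what a trace could look like, assuming the simplest pattern: a RUN instruction that downloads an archive with wget and then extracts it with tar -C. The Trace class and parse_traces function are hypothetical names, and real Dockerfiles (curl, git clone, ADD, quoted separators, multistage builds) need far more handling.

```python
import re
import shlex
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trace:
    """Hypothetical provenance record for one non-package-manager dependency."""
    url: str      # where the dependency was downloaded from
    archive: str  # local file name the download was saved as
    dest: str     # directory it was extracted into inside the image


def parse_traces(dockerfile_text: str) -> List[Trace]:
    """Find `wget URL` + `tar ... -C DEST` pairs in RUN instructions (sketch only)."""
    # Join backslash line continuations so each instruction sits on one line.
    text = re.sub(r"\\\n", " ", dockerfile_text)
    traces: List[Trace] = []
    for line in text.splitlines():
        line = line.strip()
        if not line.upper().startswith("RUN "):
            continue
        downloads: Dict[str, str] = {}  # saved file name -> source URL
        # Naively split the shell command chain on && and ; (ignores quoting).
        for cmd in re.split(r"&&|;", line[4:]):
            tokens = shlex.split(cmd)
            if not tokens:
                continue
            if tokens[0] == "wget":
                url = next((t for t in tokens[1:] if "://" in t), None)
                if url is None:
                    continue
                if "-O" in tokens:
                    saved = tokens[tokens.index("-O") + 1]
                else:
                    saved = url.rstrip("/").rsplit("/", 1)[-1]
                downloads[saved] = url
            elif tokens[0] == "tar":
                archive = next((t for t in tokens[1:] if t in downloads), None)
                if archive is not None:
                    # Without -C, tar extracts relative to the current WORKDIR.
                    dest = tokens[tokens.index("-C") + 1] if "-C" in tokens else "."
                    traces.append(Trace(downloads[archive], archive, dest))
    return traces
```

For instance, the line RUN wget https://example.com/tool-1.2.tar.gz && tar -xzf tool-1.2.tar.gz -C /opt/tool yields Trace(url='https://example.com/tool-1.2.tar.gz', archive='tool-1.2.tar.gz', dest='/opt/tool'): the URL-to-untar-location mapping described above.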
In some cases, as shown in Fig. 1, these dependencies include release versions in their download URL or file name. But this is not a reliable or consistent convention. Therefore, for every dependency, we hash its file contents and use the hash as the key.
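The post does not specify the hash algorithm or how a dependency made of many files is reduced to a single key. As an assumption, this sketch uses SHA-256 and, for a directory, hashes a sorted manifest of (relative path, per-file digest) pairs so that the key does not depend on filesystem traversal order. file_sha256 and content_key are hypothetical helpers, not orion functions.

```python
import hashlib
import os


def file_sha256(path: str) -> str:
    """SHA-256 of a single file, read in chunks so large binaries don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def content_key(path: str) -> str:
    """Hypothetical key for a dependency: a file's SHA-256, or for a directory the
    SHA-256 of a sorted manifest of (relative path, file digest) pairs."""
    if os.path.isfile(path):
        return file_sha256(path)
    entries = []
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # this sketch ignores symlinks and special files
            entries.append((os.path.relpath(full, path), full))
    tree = hashlib.sha256()
    for rel, full in sorted(entries):  # sorting makes the key independent of walk order
        # NUL cannot occur in POSIX file names, so it safely separates path from digest.
        tree.update(os.fsencode(rel) + b"\0" + file_sha256(full).encode("ascii") + b"\n")
    return tree.hexdigest()
```

Because the key depends only on file contents and their relative layout, two downloads with the same version string but different bytes get different keys, which is exactly what a file-name-based version cannot guarantee.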
Next, when should we compute the key? One option would be to download these dependencies before the build, during trace computation, and compute the key from the resulting file(s).
However, in some cases developers download dependencies by release tags such as stable, which could resolve to a different artifact during the actual build than during a pre-build download. Therefore, orion requires a reference to the final built image, which it uses to discover each dependency’s final unique key. Finally, orion emits the report in SPDX format.
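Because the keys are resolved against the final built image, each trace destination has to be looked up inside that image’s filesystem. One way to obtain a single flattened filesystem, regardless of multistage builds or squashed layers, is to run docker create on the image and then docker export -o rootfs.tar on the resulting container. The sketch below assumes such an export tar and that every destination is a directory; it reduces the files under each destination to one SHA-256 key by hashing a sorted manifest of (relative path, per-file digest) pairs. keys_from_image_export is an illustrative name, not orion’s API.

```python
import hashlib
import tarfile
from typing import Dict, List, Optional, Tuple


def _normalize(member_name: str) -> str:
    """Turn a tar member name such as './opt/x' or 'opt/x' into '/opt/x'."""
    if member_name.startswith("./"):
        member_name = member_name[2:]
    return "/" + member_name.lstrip("/")


def keys_from_image_export(export_tar: str, destinations: List[str]) -> Dict[str, Optional[str]]:
    """Compute one SHA-256 key per trace destination (assumed to be a directory)
    inside a flattened image filesystem tar, e.g. the output of `docker export`."""
    entries: Dict[str, List[Tuple[str, str]]] = {d: [] for d in destinations}
    with tarfile.open(export_tar) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, symlinks and special files in this sketch
            path = _normalize(member.name)
            matches = [d for d in destinations if path.startswith(d.rstrip("/") + "/")]
            if not matches:
                continue
            fh = tar.extractfile(member)
            digest = hashlib.sha256(fh.read()).hexdigest()
            for dest in matches:
                prefix = dest.rstrip("/")
                entries[dest].append((path[len(prefix) + 1:], digest))
    keys: Dict[str, Optional[str]] = {}
    for dest, files in entries.items():
        if not files:
            keys[dest] = None  # nothing from the trace was found in the built image
            continue
        tree = hashlib.sha256()
        for rel, digest in sorted(files):  # sorted so member order doesn't matter
            tree.update(f"{rel}\0{digest}\n".encode("utf-8"))
        keys[dest] = tree.hexdigest()
    return keys
```

Reading the key from the built image, rather than from a separate pre-build download, means a tag such as stable is measured as whatever it actually resolved to when the image was built.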
Currently, one other important detail is not yet fully supported: discovering the licenses of these dependencies. Again, this is because there is no standard discovery technique that covers the varied ways in which these dependencies can be hosted, although we are slowly adding support for a few hosting platforms.
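For reference, here is a sketch of what an SPDX 2.2 tag-value document for such dependencies could look like. It is not orion’s actual output: the Dependency class, the to_spdx_tag_value function, the Creator tool name and the example.com namespace are all placeholders. Since license discovery is not yet generally supported, the license and copyright fields are set to NOASSERTION, the SPDX value for making no assertion about a field.

```python
import re
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class Dependency:
    """Hypothetical SBOM entry built from a trace plus its content key."""
    name: str
    download_url: str
    sha256: str


def to_spdx_tag_value(doc_name: str, deps: List[Dependency]) -> str:
    """Render a minimal SPDX 2.2 tag-value document (sketch, placeholder metadata)."""
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [
        "SPDXVersion: SPDX-2.2",
        "DataLicense: CC0-1.0",
        "SPDXID: SPDXRef-DOCUMENT",
        f"DocumentName: {doc_name}",
        # Placeholder namespace: must be a unique URI per document.
        f"DocumentNamespace: https://example.com/spdx/{doc_name}-{uuid.uuid4()}",
        "Creator: Tool: orion-sketch",  # hypothetical tool name
        f"Created: {created}",
    ]
    for i, dep in enumerate(deps, start=1):
        # SPDX identifiers may only contain letters, digits, '.' and '-'.
        spdx_id = "SPDXRef-Package-" + re.sub(r"[^A-Za-z0-9.-]", "-", dep.name) + f"-{i}"
        lines += [
            "",
            f"PackageName: {dep.name}",
            f"SPDXID: {spdx_id}",
            f"PackageDownloadLocation: {dep.download_url}",
            "FilesAnalyzed: false",
            f"PackageChecksum: SHA256: {dep.sha256}",
            "PackageLicenseConcluded: NOASSERTION",  # license discovery not supported yet
            "PackageLicenseDeclared: NOASSERTION",
            "PackageCopyrightText: NOASSERTION",
            f"Relationship: SPDXRef-DOCUMENT DESCRIBES {spdx_id}",
        ]
    return "\n".join(lines) + "\n"
```

Keeping license fields explicit as NOASSERTION, rather than omitting them, makes the gap visible to anyone consuming the SBOM.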
It’s worth mentioning one approach we were considering at the beginning: “Can we just build this SBOM from an image, without Dockerfile?” It seemed feasible, since images are typically layered with a new independent layer for every operation from the Dockerfile.
But as we surveyed different build patterns, we realized the limitations of this approach, especially when developers follow multistage builds or squash all layers in the image for space efficiency.
Another disadvantage of such an approach is that it loses the source information for the software in each layer, information that is readily available in the Dockerfile. So we decided to take a multipronged approach: Dockerfile scanning for trace collection and image scanning for the final artifact ID mapping.
Our mission with this project is to enable complete and accurate SBOM generation for microservice applications. Orion, in its current state, is just our first step in this direction. So we welcome everyone’s feedback and comments to make our software builds more transparent and accountable.