Orion: Go Beyond Package Manager Discovery for Your SBOM

Orion discovers dependencies installed through modalities outside package managers. Today it is focused on microservice application build patterns only.
Dec 10th, 2021 6:50am by Shripad Nadgowda and Laura Luan

Shripad Nadgowda
Shripad is a senior technical staff member at IBM Research. He is passionate about driving research innovations that bring differentiating capabilities to the cloud. His current area of research includes DevSecOps, developer tools and basically everything related to container security.

The security and integrity of the software supply chain are among the fundamental requirements in any overall assessment of cybersecurity. The first step in securing the software supply chain is the ability to provide a complete, accurate and auditable record of every dependency that goes into building a deliverable software product, generally referred to as a software bill of materials (SBOM).

As noted in a report from the National Telecommunications and Information Administration (NTIA), “Without SBOM, a lack of transparency into the contributors, composition and functionality of software systems contributes substantially to cybersecurity risks and increases costs of development, procurement and maintenance.”

A number of open source and commercial tools are already available to address this need, and well-defined specification standards are being established. Some software vendors and developers have started incorporating SBOM generation and distribution into their software delivery pipelines. So what is new in this article?

SBOM-generation techniques today are largely limited to discovering dependencies that are managed by package managers and can be queried through them (for instance, with pip list or dpkg -l), or that are explicitly recorded in package manifests such as package-lock.json.

Developers, on the other hand, are not limited to bringing in dependencies through package managers, and in some cases the required dependencies are not available that way at all. For instance, some software is distributed as precompiled binaries that developers can simply wget, while other software is distributed as source code in a tar.gz archive that developers build with make && make install.

Laura Luan
Laura is a software engineer at IBM Research on computing resource management and automation. Her current focus area is cloud security compliance, specifically the tooling for automated software inventory, enforcement of best practices and integration in DevSecOps process.

This is a critical gap in ensuring completeness of our SBOM. And we couldn’t find any reliable open source tool to address this, so we decided to build one — and that’s orion.

First things first: orion does not try to duplicate existing SBOM-generation tools, so it does not discover any package manager dependencies. Instead, it discovers dependencies installed through modalities outside package managers, which makes it complementary to those tools. Also, at the moment, orion is focused on microservice application build patterns only.

With that said, now let’s look more closely into what orion does.

Orion is available as a CLI that can be executed locally or in any CI pipeline.

For microservices, a deliverable software product is commonly a container image built through a recipe defined in a Dockerfile. The Dockerfile allows developers to express different patterns and strategies for building their applications.

Therefore, orion starts by scanning the Dockerfile. During the scan, it parses commands like wget, curl, git clone, tar, etc., that indicate inclusion of some third-party dependencies and creates an intermediary “trace” object.
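To make the scanning step concrete, here is a minimal Python sketch of the idea. It is not orion's implementation: the `TraceEntry` record, the function names and the sample Dockerfile are all hypothetical, and the sketch only recognizes URLs passed to wget, curl and git inside RUN instructions. It does not follow tar extraction or variable substitution.

```python
import re
import shlex
from dataclasses import dataclass

# Commands treated as signs of a non-package-manager dependency (illustrative list).
FETCH_COMMANDS = {"wget", "curl", "git"}
URL_PATTERN = re.compile(r"^(?:https?|ftp|git)://\S+$")

# Hypothetical sample input, used only to exercise the sketch.
SAMPLE_DOCKERFILE = """FROM ubuntu:20.04
RUN apt-get update && apt-get install -y wget
RUN wget https://example.com/tool-1.2.3.tar.gz \\
    && tar -xzf tool-1.2.3.tar.gz -C /opt
RUN git clone https://example.com/repo.git /src
"""


@dataclass
class TraceEntry:
    """Hypothetical trace record: which command fetched which URL."""
    command: str
    url: str


def join_continuations(dockerfile_text):
    """Merge lines that end in a backslash into one logical line."""
    return re.sub(r"\\\s*\n", " ", dockerfile_text)


def scan_dockerfile(dockerfile_text):
    """Return a TraceEntry for every URL fetched inside a RUN instruction."""
    entries = []
    for line in join_continuations(dockerfile_text).splitlines():
        line = line.strip()
        if not line.upper().startswith("RUN "):
            continue
        # A RUN line may chain several shell commands with &&, ; or |.
        for segment in re.split(r"&&|;|\|", line[4:]):
            try:
                tokens = shlex.split(segment)
            except ValueError:
                # Unbalanced quotes: fall back to plain whitespace splitting.
                tokens = segment.split()
            if not tokens or tokens[0] not in FETCH_COMMANDS:
                continue
            for token in tokens[1:]:
                if URL_PATTERN.match(token):
                    entries.append(TraceEntry(tokens[0], token))
    return entries


if __name__ == "__main__":
    for entry in scan_dockerfile(SAMPLE_DOCKERFILE):
        print(entry.command, entry.url)
```

A real scanner would also have to track where each download is unpacked, which is the provenance mapping the trace is meant to hold.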

The trace essentially holds the provenance information for each dependency, for instance the mapping from its download URL to its untar location in the image. At this point, we have a record of all the software dependencies, but we are still missing one critical detail that an SBOM requires for each of them: a uniquely identifying key.

Fig. 1: Sample Dockerfile Pattern

In some cases, as shown in Fig. 1, these dependencies list their release version in the download URL or file name. But that is not a reliable or consistent technique. Therefore, for every dependency we hash its file contents, and the resulting hash serves as the key.
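As a rough illustration of content hashing, the Python sketch below derives a key from a file's bytes alone, so the result does not depend on the file's name or download URL. The choice of SHA-256 and the function name `content_key` are assumptions made for this example; the article does not say which hash function orion uses.

```python
import hashlib


def content_key(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's contents.

    SHA-256 is an assumption for this sketch. The file is read in
    chunks so that large archives do not have to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical contents get the same key even if one is named tool-latest.tar.gz and the other tool-1.2.3.tar.gz, which is what makes the hash more dependable than a version string parsed from a URL.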

Next, when should we compute the key? One option would be to download these dependencies before the build, during trace computation, and compute the key from the resulting file(s).

Then again, in some cases developers download dependencies using latest or stable release tags, which could resolve differently during the actual build. Therefore, orion requires a reference to the final built image, which it uses to discover the final unique key. Finally, orion emits its report in the SPDX output format.

Currently, there’s one other important detail yet to be fully supported: the discovery of licenses for these dependencies. This is again because there is no standard discovery technique that can be employed to cover the varied ways in which these dependencies can be hosted, although we are slowly adding support for a few hosting platforms.
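For readers unfamiliar with SPDX, the Python sketch below shows roughly what a single package entry could look like in the SPDX 2.x tag-value format, with the license fields set to NOASSERTION because no license discovery is attempted. The field names come from the SPDX specification, but the function `spdx_package`, its parameters and the way the SPDXID is derived are hypothetical. This is not orion's actual report layout.

```python
import re


def spdx_package(name, download_url, sha256_hex):
    """Render one package as SPDX 2.x tag-value text (illustrative only)."""
    # SPDX identifiers may contain only letters, digits, "." and "-".
    spdx_id = "SPDXRef-Package-" + re.sub(r"[^A-Za-z0-9.-]", "-", name)
    lines = [
        f"PackageName: {name}",
        f"SPDXID: {spdx_id}",
        f"PackageDownloadLocation: {download_url}",
        "FilesAnalyzed: false",
        f"PackageChecksum: SHA256: {sha256_hex}",
        # License discovery is out of scope for this sketch.
        "PackageLicenseConcluded: NOASSERTION",
        "PackageLicenseDeclared: NOASSERTION",
        "PackageCopyrightText: NOASSERTION",
    ]
    return "\n".join(lines)
```

A complete SPDX document also needs a document-level header (SPDX version, data license, creation information), which is omitted here.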

It’s worth mentioning one approach we considered at the beginning: “Can we just build this SBOM from an image, without the Dockerfile?” It seemed feasible, since images are typically layered, with a new independent layer for every operation in the Dockerfile.

But as we surveyed different build patterns, we realized the limitations of this approach, especially when developers follow multistage builds or squash all layers in the image for space efficiency.

Another disadvantage of such an approach is the missing source information of the software composing each layer. The source information can be found easily from the Dockerfile. So we decided to take a multipronged approach, with Dockerfile scanning for trace collection and image scanning for final artifact ID mapping.

Our mission with this project is to enable complete and accurate SBOM generation for microservice applications. Orion, in its current state, is just our first step in this direction. So we welcome everyone’s feedback and comments to make our software build more transparent and accountable.
