Simplify CI/CD with a General-Purpose Software Catalog
To automate deployment processes, CI/CD needs context: deployment configurations, build configurations, artifacts, version numbers, dependencies, environment variables, test results and more. This data doesn’t exist in one place — it is usually scattered across multiple systems and tools.
For example, deployment configurations might be stored in a separate YAML file, environment variables might be defined in a script or in the deployment manifest and version numbers might be tracked manually in a spreadsheet.
Too many sources of truth can lead to several problems, including increased complexity, metadata inconsistency, difficulties updating data and, most of all, an inability to apply automation. Software catalogs, the core of internal developer portals, can provide a solution.
First Step: A Software Catalog That Can Store CI/CD Data
The first step is creating a software catalog with the right data inside. It should be a general-purpose software catalog that allows adding data types with different properties and relationships, providing flexibility to enable everyone to bring their own data model to the catalog.
An internal developer portal is at the core of platform engineering. It presents developers with the self-service actions built as part of the platform and also with a software catalog.
This is where it gets interesting. From the developer experience point of view, the software catalog can be explained as a redacted, whitelisted data store that is curated to help developers overcome cognitive load (as an example, see how K8s data can be presented to developers).
But that isn’t the entire story. The software catalog is also a powerful tool for CI/CD itself. It can store data about builds, environments, cloud resources and a lot more, which lets it serve as a single source of truth for CI/CD context.
Platform engineering teams we’re talking to are actively realizing these benefits, especially with regard to CI/CD metadata. They use the software catalog as a single source of truth for CI/CD, and they also use the CI/CD data in the software catalog as part of their automated workflows.
By including relevant data about the clusters, environments, cloud regions and vendors in the software catalog, the CI/CD process can be more intelligent and automated, leading to better engineering. It decouples CI/CD from the contextual data it needs, separates controls and makes it easier to troubleshoot failures and broken pipelines.
Through the developer portal, these capabilities also help platform engineering teams provide developers with better visibility into the deployment process, as they can see the deployment status and any errors that occur in real time.
Next Step: Version Control and Security
Once the software catalog is set up, the benefits of one source of truth for CI/CD data can be taken even further when it’s also used for version control and security.
Tracking all the changes made to the metadata and configuration files improves the traceability of metadata changes over time. This can be useful for auditing purposes and for understanding the evolution of the deployment process.
Additionally, it drives better collaboration (through version and change tracking), faster issue resolution, the ability to quickly revert to a previous version and improved compliance. When CI/CD data is fragmented (think of version history scattered across git repositories), this is difficult to do, but it’s much easier with the software catalog.
A software catalog usually ensures that only authorized users can access and modify the metadata, reducing the risk of unauthorized access, data breaches and other security incidents, such as a misconfiguration that makes an S3 bucket public or exposes a service holding personally identifiable information to the internet.
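To make change tracking and reverting concrete, here is a minimal sketch of a metadata item that keeps an append-only history. The class and field names (`VersionedItem`, `Revision`) are hypothetical and not taken from any particular catalog product; a real catalog would persist this history and expose it through its own API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    """One recorded change to a metadata item."""
    value: str
    author: str
    changed_at: datetime


class VersionedItem:
    """A metadata item with an append-only change history (illustrative only)."""

    def __init__(self, value: str, author: str) -> None:
        self._history = [Revision(value, author, datetime.now(timezone.utc))]

    @property
    def current(self) -> str:
        return self._history[-1].value

    def update(self, value: str, author: str) -> None:
        # Never overwrite: every change becomes a new revision for auditing.
        self._history.append(Revision(value, author, datetime.now(timezone.utc)))

    def revert(self, author: str) -> None:
        """Restore the previous value by recording it as a new revision."""
        if len(self._history) < 2:
            raise ValueError("no earlier revision to revert to")
        self.update(self._history[-2].value, author)

    def history(self) -> list[Revision]:
        return list(self._history)


deployment_version = VersionedItem("1.4.1", author="alice")
deployment_version.update("1.4.2", author="bob")
deployment_version.revert(author="carol")  # back to 1.4.1, with the revert itself recorded
```

Recording the revert as a new revision, rather than deleting the latest one, keeps the audit trail complete: the history still shows who changed what and who rolled it back.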
How It Works
The software catalog is essentially a centralized database that stores all the metadata related to the CI/CD process. It can be accessed and modified through a REST API, which enables CI/CD pipelines to interact with the metadata store programmatically. Data types, properties and relationships can be easily added when needed, since different organizations do DevOps differently.
What data should be accessed and stored? This depends on what we call your data model, meaning the properties and categories that are important within your pipelines. For instance:
- You can organize the catalog by different categories, each containing metadata related to a specific aspect of the CI/CD process. For example, there might be a category for deployment configurations, a category for environment variables and a category for version control.
- Within each category, there would be different metadata items or keys. For example, within the deployment configurations category, there might be metadata items for the deployment target, the deployment strategy and the deployment version.
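As a rough illustration of the category and metadata-item structure described above, the following sketch models a catalog in memory. The class names and the category and key names (`deployment-configurations`, `target` and so on) are assumptions made for this example, not a real catalog schema.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One category of CI/CD metadata, e.g. deployment configurations."""
    name: str
    items: dict[str, str] = field(default_factory=dict)


class Catalog:
    """A tiny in-memory stand-in for a software catalog (illustrative only)."""

    def __init__(self) -> None:
        self._categories: dict[str, Category] = {}

    def set_item(self, category: str, key: str, value: str) -> None:
        # Create the category on first use, then store the metadata item.
        cat = self._categories.setdefault(category, Category(category))
        cat.items[key] = value

    def get_item(self, category: str, key: str) -> str:
        return self._categories[category].items[key]

    def categories(self) -> list[str]:
        return sorted(self._categories)


catalog = Catalog()
catalog.set_item("deployment-configurations", "target", "prod-eu-cluster")
catalog.set_item("deployment-configurations", "strategy", "rolling")
catalog.set_item("deployment-configurations", "version", "1.4.2")
catalog.set_item("environment-variables", "LOG_LEVEL", "info")
```

Because categories are created on demand, a team can add a new data type without changing the catalog code, which mirrors the "bring your own data model" idea.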
The CI/CD pipelines can interact with the metadata store by using a REST API, specifying the category and metadata item they want to access. For example, to retrieve the deployment target for a specific application, the CI/CD pipelines might send a GET request to the deployment configurations category, specifying the metadata item for the deployment target.
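A pipeline step that performs such a lookup might look like the sketch below. The base URL, route layout, `app` query parameter and bearer-token authentication are all hypothetical; an actual catalog defines its own routes and authentication scheme. The request is built separately from being sent so that it can be inspected or logged first.

```python
import json
import urllib.request
from urllib.parse import quote

# Hypothetical base URL; a real catalog defines its own routes and auth.
BASE_URL = "https://catalog.example.com/api/v1"


def build_get_request(category: str, item: str, app: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one metadata item."""
    url = f"{BASE_URL}/categories/{quote(category)}/items/{quote(item)}?app={quote(app)}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


def fetch_item(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (needs a reachable server)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Look up the deployment target for a hypothetical "checkout" application.
request = build_get_request("deployment-configurations", "target", "checkout", "TOKEN")
```

In a pipeline, the result of `fetch_item(request)` would then drive the deploy step, so the target lives in the catalog rather than in the pipeline definition.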
The Importance of Graph Databases for Software Catalogs
Graph databases come in handy for software catalogs. Since the different entities in the software catalog have complex relationships (for instance, a service is deployed on a namespace in a K8s cluster in a cloud account) and those relationships are important, you need the ability to natively query them. A graph database lets you do just that. This is particularly useful in the context of a CI/CD pipeline, where developers, DevOps and machines need to be able to quickly access information about how different parts of the system are related.
- For example, let’s say we want to identify all services running in a particular region (for instance, if you’re operating a large-scale cloud platform, serving customers across different regions). Without a graph database, we would need to perform multiple queries across different data sources and try to piece together the information. However, with a graph database, we can do it in one query.
- Or let’s say we want to identify all the services that use a particular image version. Without a graph database, we would need to manually search through various services’ configurations and documentation to find the ones that match. But with a graph database, we can create nodes for each service and link them to the image version they use. This allows us to quickly query the graph to find all the services that use the desired image version: we start at the image version node and then traverse its relationships to the service nodes. We can even add additional information to the nodes, such as the environment the service is running in, the date it was last updated and any associated alerts or issues. This provides a comprehensive view of the entire system and allows us to easily track and manage our services.
This ability to natively query complex relationships is critical in enabling developers and machines to perform impact analysis, manage configurations, run continuous tests and manage releases more effectively. This not only simplifies the CI/CD process, it also helps to ensure the overall stability and reliability of the system.
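The two queries above can be sketched with a small in-memory graph. This is only a stand-in to show the traversal idea: a real graph database would use its own storage and query language, and the node names and relation names here (`uses_image`, `deployed_in`, `part_of`, `located_in`) are invented for the example.

```python
from collections import defaultdict


class CatalogGraph:
    """A minimal directed graph with named relationships (illustrative only)."""

    def __init__(self) -> None:
        # _incoming[target][relation] -> set of source nodes
        self._incoming = defaultdict(lambda: defaultdict(set))

    def relate(self, source: str, relation: str, target: str) -> None:
        self._incoming[target][relation].add(source)

    def sources(self, target: str, relation: str) -> set[str]:
        """All nodes that point at `target` through `relation`."""
        return set(self._incoming[target][relation])

    def traverse_back(self, start: str, relations: list[str]) -> set[str]:
        """Walk relationships backwards from `start`, one relation per hop."""
        frontier = {start}
        for relation in relations:
            frontier = {s for node in frontier for s in self.sources(node, relation)}
        return frontier


graph = CatalogGraph()
graph.relate("checkout", "uses_image", "base-image:1.4.2")
graph.relate("payments", "uses_image", "base-image:1.4.2")
graph.relate("search", "uses_image", "base-image:1.3.0")
graph.relate("checkout", "deployed_in", "ns-shop")
graph.relate("payments", "deployed_in", "ns-shop")
graph.relate("search", "deployed_in", "ns-search")
graph.relate("ns-shop", "part_of", "cluster-a")
graph.relate("ns-search", "part_of", "cluster-b")
graph.relate("cluster-a", "located_in", "eu-west-1")
graph.relate("cluster-b", "located_in", "us-east-1")

# Services that use a particular image version: one hop from the image node.
services_on_image = graph.sources("base-image:1.4.2", "uses_image")

# Services running in a region: region <- cluster <- namespace <- service.
services_in_region = graph.traverse_back(
    "eu-west-1", ["located_in", "part_of", "deployed_in"]
)
```

Both lookups start from the node of interest and follow relationships, instead of joining results from several separate data sources.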
Software Catalogs Need to Be API-First
Now we need to think about how to get data into the software catalog. Doing this easily requires an API-first approach. This includes data from cloud providers, Kubernetes (for cluster data), git providers, Infrastructure-as-Code (IaC) tools such as Terraform or Crossplane and more.
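As one possible shape for such ingestion, the sketch below maps a simplified Kubernetes Deployment manifest to a catalog entity and prepares a request to push it. The endpoint URL, the entity fields (`identifier`, `type`, `properties`, `relations`) and the bearer-token authentication are assumptions for illustration, not a specific catalog's API; only the manifest paths (`metadata.name`, `spec.template.spec.containers`) follow the standard Kubernetes Deployment layout.

```python
import json
import urllib.request

# Hypothetical ingestion endpoint; a real catalog defines its own.
CATALOG_URL = "https://catalog.example.com/api/v1/entities"


def to_entity(deployment: dict) -> dict:
    """Map a (simplified) Kubernetes Deployment manifest to a catalog entity."""
    meta = deployment["metadata"]
    container = deployment["spec"]["template"]["spec"]["containers"][0]
    return {
        "identifier": f'{meta["namespace"]}/{meta["name"]}',
        "type": "service",
        "properties": {"image": container["image"]},
        "relations": {"namespace": meta["namespace"]},
    }


def build_upsert_request(entity: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that pushes one entity."""
    return urllib.request.Request(
        CATALOG_URL,
        data=json.dumps(entity).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


manifest = {
    "metadata": {"name": "checkout", "namespace": "shop"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "app", "image": "registry.example.com/checkout:1.4.2"},
    ]}}},
}
entity = to_entity(manifest)
upsert = build_upsert_request(entity, token="TOKEN")
```

The same pattern (map the source record to an entity, then push it through the API) would apply to cloud accounts, git repositories or Terraform state, each with its own mapping function.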
An API-first approach also makes it easy to build integrations with other tools and systems, such as creating a dashboard with information about your infrastructure and applications. This can help you build a more comprehensive and useful metadata store that provides a holistic view of your infrastructure and applications.
The rise of platform engineering and the internal developer portals that are used as a core interface for developers also presents an opportunity to create a software catalog that can be useful not just for developers. A software catalog with CI/CD metadata can create a single source of truth, solve version and security issues, and allow automation of deployment processes and more. To see what a general-purpose software catalog can contain, go to Port’s live demo here.