Security Metrics that Actually Matter in a DevOps World
Capsule8 sponsored this post.
“DevSecOps” has recently entered the security lexicon to describe a hybrid of DevOps and security practices — and as we will see below, for good reason.
Most commonly, DevSecOps focuses on the narrow interpretation of DevOps as automation of the build/delivery pipeline. Thus, the typical notion of DevSecOps centers on the software development lifecycle, attempting to introduce testing as early as possible in the design process to protect the integrity and performance of the system at large.
A broader interpretation would see DevSecOps’ role as a mindset and practice. In this way it extends beyond just security automation and manifests itself as a culture that produces valuable results for the business.
For DevSecOps, this raises the question: If DevOps prioritizes software delivery performance, what should security prioritize to support this?
In this post, we describe which specific security metrics help, rather than slow down, your organization’s business processes.
Four Foundational Principles for Measuring Security
- Measure at the global level, not team level. The ultimate goal of the security team — or any team for that matter — is to help the business achieve its goals. That’s why it’s important to have security metrics at the organizational level — to avoid siloed thinking and to always prioritize the needs of the business over the team;
- Measure outcomes, not outputs. Measuring sheer work product is insufficient because it’s not tied to tangible outcomes. For example, the number of hours worked or tests performed isn’t necessarily indicative of progress. What matters is whether those hours and tests reduce the attack surface or the incidence of malicious events;
- Don’t focus on static indicators to reach some mythical “maturity threshold.” Instead, prioritize building resilience over an abstract notion of “maturity.” This means that instead of adopting a “cover my bases” mentality of deploying numerous incremental solutions, security should aim to increase the entire organization’s ability to respond to and rebound from inevitable vulnerabilities and attacks;
- Don’t miss the forest for the trees. If you focus too much on individual security components, you’ll lose sight of the impact of those components on the overall system.
Which Security Metrics Actually Matter?
Before we dive into our list, it’s important to acknowledge that most security metrics are vendor-biased, particularly those espousing DevSecOps-specific metrics. An unfortunate byproduct of the security testing market is that we rarely see metrics outside of individual security products — for the simple reason that vendors won’t sell you on metrics that their own products can’t measure.
But make no mistake: you are what you measure. That’s why businesses should be careful about choosing the right metrics that fit with their security goals — regardless of whether their businesses adopt DevOps principles or not. Because if you optimize for a narrow goal, you’ll end up with a narrow program.
Three High-Impact Security Metrics
Broadly speaking, security metrics that support software delivery performance can be grouped into three main categories:
- Deployment metrics measure the health of the deployment process and provide leading indicators of application stability.
Examples of deployment metrics: time-to-deploy, deployment frequency, deployment success/failure, time spent fixing failed releases, and environment configuration drift.
Elite performers in this category can deploy on demand;
- Lead time metrics measure the capacity of the organization to respond to change and deliver business value (i.e. the time it takes to design and deliver requested security features).
Examples of lead time metrics: individual productivity/velocity, rework time, cycle time, time-to-value trends.
Elite performers in this category typically have average lead times of less than 1 hour;
- Mean time to repair (MTTR) metrics measure how quickly threats can be remediated and services restored. The ability to remediate security incidents quickly is highly correlated with engineering performance.
Examples of MTTR metrics: time to triage, time to investigate, time to remediate.
Elite performers in this category have an MTTR of less than 1 hour.
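To make the three categories concrete, here is a minimal Python sketch of how they might be computed from deployment and incident records. The record shapes (Deployment, Incident), their field names, and the sample data are hypothetical, chosen only for illustration; in practice the timestamps would come from whatever CI/CD and incident-tracking tools your organization already uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Deployment:
    # Hypothetical record: when the change was committed and when it shipped.
    committed_at: datetime
    deployed_at: datetime
    succeeded: bool


@dataclass
class Incident:
    # Hypothetical record: when the incident was detected and when it was resolved.
    detected_at: datetime
    resolved_at: datetime


def deployment_frequency(deployments, window_days):
    """Deployments per day over the observation window."""
    return len(deployments) / window_days


def deployment_failure_rate(deployments):
    """Fraction of deployments that failed."""
    return sum(not d.succeeded for d in deployments) / len(deployments)


def mean_lead_time(deployments):
    """Average time from commit to deployment."""
    seconds = mean((d.deployed_at - d.committed_at).total_seconds() for d in deployments)
    return timedelta(seconds=seconds)


def mean_time_to_repair(incidents):
    """Average time from detection to resolution."""
    seconds = mean((i.resolved_at - i.detected_at).total_seconds() for i in incidents)
    return timedelta(seconds=seconds)


# Made-up sample data covering a one-week window.
t0 = datetime(2024, 1, 1, 9, 0)
deployments = [
    Deployment(t0, t0 + timedelta(minutes=30), True),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, minutes=90), False),
    Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, minutes=60), True),
    Deployment(t0 + timedelta(days=3), t0 + timedelta(days=3, minutes=60), True),
]
incidents = [
    Incident(t0, t0 + timedelta(minutes=20)),
    Incident(t0 + timedelta(days=2), t0 + timedelta(days=2, minutes=40)),
]

print("Deployments per day:", round(deployment_frequency(deployments, 7), 2))
print("Failure rate:", deployment_failure_rate(deployments))
print("Mean lead time:", mean_lead_time(deployments))
print("MTTR:", mean_time_to_repair(incidents))
```

Tracking these numbers as trends across the whole organization, rather than as per-team targets, keeps them in line with the measurement principles described earlier.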
Don’t Try to Measure Failure
Mean time to failure (MTTF) is one metric security teams should not measure. Failure is inevitable, and to have a metric that incentivizes failure avoidance is unrealistic at best, counterproductive at worst. It takes attention away from the metrics that actually help remediate threats and build resiliency within the organization.
Instead, Measure Tradeoffs…
Improved security almost always comes with tradeoffs — whether that’s more friction or higher costs in time or money.
Here’s a basic example: the reduction in the number of security fixes per new release may be counterbalanced by an increase in employee time spent using security tools. And when you look beyond the security team, the newly added security feature may be outsourcing pain to other parts of the organization.
Having metrics in place to measure the impact of deploying a new security solution forces teams to seriously assess whether the benefits of a new release truly outweigh its costs.
Obviously, the types of metrics used to measure trade-offs would be highly dependent on individual circumstances. But teams can start by asking themselves a few basic questions:
- How much of your team’s time is spent on product maintenance versus problem-solving?
- Is there an increase in support tickets indicating confusion over new security policies?
- Are teams proactively coming to you, or is there evidence of avoidance?
…And Account for Systems Complexity
The complexity of your overall system brings a new level of risk that goes beyond any individual component. These systemic risks need to be taken into account within the organization’s security strategy.
For example, organizations can start by quantifying and balancing the short-term and long-term stresses on the system:
- Short-term stress: active incidents, last-minute reviews, audits, new vulnerabilities;
- Long-term stress: employee turnover, number of product types/systems, number of security tools being used, budget cuts.
Security Metrics Are Only the Beginning
Finally, in our collective exuberance to do testing earlier in the SDLC, we must also assume failure during runtime in order to implement a resilient security strategy. Without understanding incidents in production, it’s difficult to create a continuous feedback loop to harden your container images, and certainly harder to continuously improve the security of non-microservices systems. Once appropriate security metrics are implemented for production, they can provide valuable feedback for improving pre-release testing. More importantly, they can even ensure that security is enabling the business rather than choking it.
Feature Image by Dominic Alberts from Pixabay.