Shaping DevOps with the Best of ‘By Audit’ and ‘By Design’
DevOps, an integral part of the software development life cycle (SDLC), has traditionally been hampered by continuous reviews and reactive practices. Even modern practices that implement “continuous verification” are vague, if not silent, about how the fix cycle should work, resulting in endless audit loops.
The industry’s response to this challenge is a shift toward strategies that are proactive rather than reactive. One emerging approach is platform engineering, a field that is becoming increasingly relevant in modern DevOps and that offers an optimized, forward-thinking approach to software delivery. While platform engineering as a practice is centered on developer experience, the “golden paths” within it hold an interesting promise: closing the open-ended fix cycles described above, by design.
In 2023, a company’s average expenditure on cloud services hovers around a staggering 80% of its total IT hosting expenditure. In the market for essential cloud optimization tools, prices start at $2,499 per month for managing a cloud spend of $25,000 per month.
In our experience working with numerous companies, we have found that, on average, a company spending $100,000 on cloud services believes it is overspending by at least 25%. Despite this, these companies are willing to allocate approximately 10% of their total cloud expenditure to tools, personnel and processes to mitigate this overspending.
This means that 10% of a company’s total cloud budget is being spent solely on cost-optimization efforts! When you factor in the additional overheads associated with security, compliance and monitoring, these expenses increase significantly. Over the past decade, as more and more companies transitioned to using cloud services, these costs became a widespread concern.
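The arithmetic behind these figures can be sketched in a few lines. This is only an illustration: the prices and percentages are the ones quoted above, while the function names and the "best-case net saving" figure (which assumes every dollar of perceived overspend is actually recovered) are assumptions made for the example.

```python
# Figures quoted in the article; the names and helpers are illustrative.
TOOL_PRICE_PER_MONTH = 2_499      # entry price of a cost-optimization tool (USD)
MANAGED_SPEND_PER_MONTH = 25_000  # cloud spend that price covers (USD)


def share_of_spend(cost: float, spend: float) -> float:
    """Return cost as a fraction of spend."""
    if spend <= 0:
        raise ValueError("spend must be positive")
    return cost / spend


tool_share = share_of_spend(TOOL_PRICE_PER_MONTH, MANAGED_SPEND_PER_MONTH)
print(f"Tool price as a share of managed spend: {tool_share:.1%}")

# A company spending $100,000 that believes it overspends by 25%
# and allocates roughly 10% of its spend to mitigating that.
spend = 100_000
perceived_overspend = spend * 0.25
optimization_budget = spend * 0.10

# Assumption: the whole perceived overspend is recovered (best case).
best_case_net_saving = perceived_overspend - optimization_budget
print(f"Perceived overspend:  ${perceived_overspend:,.0f}")
print(f"Optimization budget:  ${optimization_budget:,.0f}")
print(f"Best-case net saving: ${best_case_net_saving:,.0f}")
```

Even in the best case, a large part of the perceived overspend is consumed by the effort of finding it, which is the inefficiency the rest of this article tries to address.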
I propose a different perspective: We should aim to address a substantial portion of these issues proactively, rather than relying on an iterative, reactive approach. By doing so, we can significantly reduce the inefficiencies associated with current practices, thereby decreasing the need for regular audits and optimization efforts.
I call the two approaches:
- Reactive approach: By audit
- Proactive approach: By design
As we dive deeper into these two approaches, a close resemblance to “golden paths” and platform engineering in general will emerge.
The By-Audit Approach
When examining essential tech practices like cloud cost optimization, compliance, security, disaster recovery and monitoring, it’s evident that a reactive approach results in repeated efforts and inefficiencies. Let’s break these down:
- Cloud cost optimization
  - Current approach: Centralized teams break down cost reports daily or weekly, then find, assign and track tasks across engineering teams.
  - Challenge: Effort intensive. Reduces some fat but doesn’t get you in shape.
- Compliance
  - Current approach: Quarterly audits are conducted to identify the teams responsible for noncompliance, and those teams are assigned the task of rectifying it.
  - Challenge: There’s no guarantee that the issue won’t resurface.
- Security
  - Current approach: Regular reports are pulled to identify potential misconfigurations.
  - Challenge: The number of false alerts is not just overwhelming; it often leads to the root cause being overlooked. At times, even the source of a misconfiguration is unknown, because many different paths can lead to the same misconfiguration.
- Disaster recovery
  - Current approach: Disaster recovery drills are repeated quarterly due to low confidence in backup systems or recovery playbooks.
  - Challenge: Today’s systems change too often to justify static recovery playbooks. The repeated drills are a sign of underlying uncertainty, not a fix for its root cause.
- Monitoring
  - Current approach: Alerts and dashboards are usually “configured” from scratch in the monitoring tools.
  - Challenge: This creates an overwhelming situation. For example, an absence of alerts doesn’t guarantee that everything is fine; the alerts themselves could be misconfigured or incomplete.
These are all integral aspects of successful software delivery; however, many of these are often afterthoughts and not baked into the design. Wouldn’t it be efficient and proactive if each of these processes followed a well-defined, preemptive path?
The By-Design Approach
The “by design” perspective relies on established, tested routes for operations that ensure compliance, security and other organizational needs are met. These are the “golden paths,” or well-tested paths for operations. We should adopt the established, standardized approaches before we delve into implementation.
Let’s say we have to standardize credential creation. We need to ensure that there is a formal way to request a credential for each of the services beforehand, separating intent from the implementations. This provides the ability to separately and programmatically verify both credential intents and implementations, eliminating the need for repeated audits.
There should be a clear and separate policy on credential creation, isolation, storage and rotation, which should be set as a standard for requesting credentials. These policies can be applied programmatically, which will reduce the need for verification tools. This can eventually be moved earlier in the SDLC pipeline, reducing the burden of regular reviewing.
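A minimal sketch of what separating credential intent from implementation could look like follows. Everything in it is hypothetical: the `CredentialIntent` and `CredentialPolicy` types, their fields, and the policy values (90-day rotation, a single allowed storage backend) are assumptions chosen for illustration, not a reference to any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CredentialIntent:
    """What a team asks for, independent of how it gets provisioned."""
    service: str
    owner_team: str
    scope: str
    storage: str
    rotation_days: int


@dataclass(frozen=True)
class CredentialPolicy:
    """Organization-wide rules; the default values are illustrative only."""
    max_rotation_days: int = 90
    allowed_scopes: frozenset = frozenset({"read-only", "read-write"})
    allowed_storage: frozenset = frozenset({"secrets-manager"})


def violations(intent: CredentialIntent, policy: CredentialPolicy) -> list[str]:
    """Return every policy violation in a request; an empty list means compliant."""
    found = []
    if intent.scope not in policy.allowed_scopes:
        found.append(f"scope '{intent.scope}' is not allowed")
    if intent.storage not in policy.allowed_storage:
        found.append(f"storage '{intent.storage}' is not allowed")
    if intent.rotation_days > policy.max_rotation_days:
        found.append(
            f"rotation every {intent.rotation_days} days exceeds "
            f"the {policy.max_rotation_days}-day maximum"
        )
    return found


policy = CredentialPolicy()
good = CredentialIntent("billing-db", "payments", "read-only", "secrets-manager", 30)
bad = CredentialIntent("billing-db", "payments", "admin", "env-file", 365)

print(violations(good, policy))
for problem in violations(bad, policy):
    print("rejected:", problem)
```

Because the check runs on the request itself, it can sit in a pull-request or pipeline step, so a noncompliant credential is rejected before it exists instead of being discovered in a later audit.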
Organizations that adopt the “by design” approach will find themselves naturally compliant, secure and efficient without requiring intensive audits. This approach extends beyond just aiding developers; it brings long-term organizational benefits, ensuring that aspects like cost, security and compliance are inherently optimized.
The table below illustrates the principle further by revisiting the practices and challenges discussed under the by-audit approach:
| Practice | Traditional Approach | “By Design” Approach | Benefit of “By Design” Approach |
|---|---|---|---|
| Compliance | Quarterly audits to identify noncompliance. | Policies and procedures designed to ensure compliance from the outset. | Minimized chances of noncompliance issues, reducing the need for repeated audits. |
| Security | Regular reports to identify potential misconfigurations. | Security measures are built into every process by default. | Greater confidence in the system’s security, reducing the need for frequent checks. |
| Disaster Recovery | Frequent disaster recovery drills due to low confidence in backups. | Any environment is replicable, and DR is baked into the SDLC plan. | Enhanced confidence in recovery systems, eliminating repetitive drills. |
| Observability | Check all dashboards and alerts when an issue arises due to lack of clarity. | Observability is treated as artifacts, which are “shipped” and not “configured,” giving the ability to statically verify completeness. | Streamlined monitoring, reduced false alarms, faster issue identification and resolution. |
| Cloud Cost | Repeated analysis of cost reports and designing action items against each. | Standardizations to guide cost optimization by design. | Enhanced cost baselines, zero leakage and reduced burnout around cost optimization cycles. |
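The observability row is perhaps the easiest to make concrete. The sketch below assumes each service ships a small manifest listing its alerts alongside its code; the `ServiceManifest` type, the rule identifiers and the required-signal baseline are all hypothetical and chosen only to show what “statically verify completeness” could mean in practice.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed organizational baseline: every service must alert on these signals.
REQUIRED_SIGNALS = frozenset({"latency", "error_rate", "saturation"})


@dataclass
class ServiceManifest:
    """Observability shipped with the service, not configured by hand later."""
    name: str
    alerts: dict[str, str] = field(default_factory=dict)  # signal -> alert rule id


def missing_signals(manifest: ServiceManifest) -> list[str]:
    """Return the required signals the manifest has no alert for, sorted."""
    return sorted(REQUIRED_SIGNALS - set(manifest.alerts))


def verify(manifests: list[ServiceManifest]) -> dict[str, list[str]]:
    """Map each incomplete service to the signals it lacks."""
    report = {}
    for manifest in manifests:
        gaps = missing_signals(manifest)
        if gaps:
            report[manifest.name] = gaps
    return report


checkout = ServiceManifest(
    "checkout",
    {"latency": "rule-1", "error_rate": "rule-2", "saturation": "rule-3"},
)
search = ServiceManifest("search", {"latency": "rule-4"})

report = verify([checkout, search])
print(report)
```

A check like this answers the question that a quiet dashboard cannot: whether the silence means the system is healthy or the alerts were never defined.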
The Silver Bullet?
As with most nuanced questions, the answer here is: both.
The “by audit” approach is like a quick fix. Think of it as a Band-Aid. If you spot a sudden problem, like spending too much money, the “by audit” method quickly finds out why and tries to stop it. But it might not fix the main reason the problem happened in the first place.
The “by design” approach goes deeper. It’s like a long-term plan. After the “by audit” approach has spotted a problem, the “by design” approach looks at why the problem happened and tries to fix it from the ground up. This means that, in the future, the same problem is less likely to recur. Plus, this method lets teams take charge of their own tasks. This means less stress for the main office, and it helps teams feel more involved and responsible.
In short, the “by audit” approach is a quick solution for right now, and the “by design” approach is a plan to avoid future problems. Using both methods together can help solve problems fast and make sure they don’t come back.
To Sum It Up
As we look to the next 10 years of building on the cloud, the concept of “by design” becomes even more critical. By integrating best practices and strong standards into the very fabric of our cloud architecture, we can create systems that are inherently more efficient, secure and resilient. The “by design” approach ensures that optimization and compliance are not afterthoughts but integral parts of the development process.
While our old ways with the “by audit” method helped eliminate many problems, in a world where cloud footprint is on the rise and the need for audit tools is growing, a proactive, “by design” methodology offers a sustainable path forward. It’s not just about treating symptoms, but about building a healthier, more robust cloud ecosystem from the ground up. The future of cloud computing demands nothing less.