
3 Cloud Governance Steps to Avoid the Next Hack

Even some of the most obvious preventative measures aren’t given sufficient attention and priority when it comes to already bogged-down engineering organizations.
Dec 5th, 2023 10:30am by

Cybercrime and hacks are always painful, but they’re even more sobering when there is nothing novel about them and they are simply the result of IT governance and security hygiene constantly being pushed into the backlog.

While writing the book 10 Steps Every CISO Should Take to Secure Next-Gen Software, I learned that even some of the most obvious preventative measures aren’t given sufficient attention and priority when it comes to already bogged-down engineering organizations.

Prioritization is often hindered by a lack of understanding across functions and roles: security teams do not understand the impact of changes to the way cloud applications are developed, deployed and maintained, and DevOps teams do not understand how their actions inject or create added risk. While this book was published several years ago, these dysfunctions still hold true.

The recent MGM hack taught us all that well-known tactics (like social engineering as a means of achieving privilege escalation, the crown jewels for an attacker) are still effective. When successful, they tragically continue to fuel malicious entities on the road to major payoffs.

While hindsight is 20/20 and it’s easy to say “you should have…”, it’s always good to reiterate and emphasize some of the best practices when it comes to IT hygiene and governance. Refreshing the basics and reminding yourself about some of the more obscure to-dos will hopefully help prevent the next hack.

A Cloud Governance Checklist to Keep Handy

We’ve come a very long way with regard to cloud governance, GitOps and cloud security. Today, through a combination of automation, policy as code, and improved visibility, it’s possible to have accurate real-time information that can help you detect and remediate potential risk. With the attack surface constantly growing and evolving, minimizing risk through simple guardrails can be the difference between rapid recovery and costly and extensive downtime.

Below we’ll share some of the best practices that every IT, DevOps, SRE and Security engineer should be adopting immediately to help bring cloud “up to code” with better security and reliability.

1. Immutability and Policy Management

Let’s start with immutability — a concept that is not new and has become the standard best practice thanks to tools like Terraform and infrastructure-as-code, which have “codified” immutability into our systems.

The safety blanket of immutability provides the assurance that configurations can’t be changed without intervention or by a single entity, whether external and malicious or internal and ill-informed.

This was baked into DevOps as a standard, mostly to prevent production incidents and downtime, but as a byproduct it provides the added benefit of security, ensuring no one can hack into your cloud-based systems and make changes that aren’t detected, and that no junior can accidentally delete production without a recovery path.

What powers this further is codifying policies that provide guardrails for cost, reliability and security, then automating this governance.

In another hard lesson learned, we all know that if it isn’t automated, it won’t happen. Patching is a prime example.

For years, the annual threat report from Fortify (my previous employer) cited failure to patch as the single greatest threat. Now, in its M-Trends 2023 Report, Mandiant explains why patching gaps continue to result in global events that exploit common vulnerabilities: “Where Systems Administrators need time to test and validate patches, threat actors need only the barest coverage in proof-of-concept (PoC) code to start targeting those organizations.”

At a security conference in 2018, I declared that with cloud adoption and DevOps tools, I expected misconfigurations would be on par with threats from failure to patch at some point.

I think we’re there! In their M-Trends 2023 Report, Mandiant notes, “Multiple layers of identity management and application deployment create a new verticality to client environments that must be secured.”

“It is not uncommon for misconfigurations to arise as the implementation and design phases of cloud service migrations meet the hard reality of business operations. Organizations should consider testing their cloud architecture deployments to promote resilience against motivated, agile adversaries.”

Policy as code enables you to automate the canonical, repetitive good practices and controls that apply to your cloud systems. Even if something does change, you get continuous, real-time detection of cloud drift and policy violations, which can then be handled immediately and decisively.
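
To make the idea concrete, here is a minimal sketch of policy as code in Python. It is an illustration only, not any particular tool’s API: the `Policy` class, the three rules and the resource fields (`public`, `encrypted`, `tags`) are all hypothetical stand-ins for whatever guardrails your organization defines.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Policy:
    name: str
    applies_to: str                 # resource type the rule covers, or "*" for all
    check: Callable[[dict], bool]   # returns True when the resource complies


# Hypothetical guardrails; real ones would come from your own standards.
POLICIES = [
    Policy("bucket-not-public", "storage_bucket", lambda r: not r.get("public", False)),
    Policy("volume-encrypted", "disk_volume", lambda r: r.get("encrypted") is True),
    Policy("owner-tag-present", "*", lambda r: "owner" in r.get("tags", {})),
]


def evaluate(resources: list[dict], policies: list[Policy]) -> list[tuple[str, str]]:
    """Return (resource id, policy name) pairs for every violation found."""
    violations = []
    for resource in resources:
        for policy in policies:
            in_scope = policy.applies_to in ("*", resource["type"])
            if in_scope and not policy.check(resource):
                violations.append((resource["id"], policy.name))
    return violations


# Made-up inventory used only to exercise the rules above.
inventory = [
    {"id": "logs", "type": "storage_bucket", "public": True, "tags": {"owner": "sre"}},
    {"id": "db-disk", "type": "disk_volume", "encrypted": True, "tags": {}},
]

for resource_id, policy_name in evaluate(inventory, POLICIES):
    print(f"VIOLATION {resource_id}: {policy_name}")
```

Because the rules are plain data, they can be reviewed, versioned and run on every change, or on a schedule, which is what turns a checklist into automated governance.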

2. Visibility and Auditability

While SBOMs (software bills of materials) and securing the software supply chain have been the latest buzzwords, few are actually paying attention to the equally important IBOM (infrastructure bill of materials).

Many are aware that application software includes dependencies: modules providing specific subfunctions, often written by third parties and/or open source.

Similarly, the cloud infrastructure upon which cloud native applications depend is also made up of dependencies: subfunctions that define and configure specific resources used by a given environment.

As an example, consider an EC2 instance whose dependencies may include a Network Interface and an EBS Volume. Dependencies can go several layers deep.

Now consider that I may be managing this with a Terraform module. This image depicts an actual relationship between cloud resources.
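
The layered dependencies described above can be sketched as a small graph walk. This is a rough illustration in Python, not the output of any real tool: only the EC2 instance, Network Interface and EBS Volume come from the example above, while the deeper layers (`subnet`, `security_group`, `vpc`) and the function name `infrastructure_bom` are assumptions added to show how the layers stack up.

```python
# Illustrative inventory: each resource maps to the resources it directly depends on.
DEPENDS_ON = {
    "ec2_instance": ["network_interface", "ebs_volume"],
    "network_interface": ["subnet", "security_group"],   # assumed deeper layer
    "subnet": ["vpc"],                                    # assumed deeper layer
    "security_group": ["vpc"],                            # assumed deeper layer
    "ebs_volume": [],
    "vpc": [],
}


def infrastructure_bom(root: str, graph: dict[str, list[str]]) -> list[str]:
    """List every resource reachable from root, depth-first, each one once."""
    seen: list[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        # Reverse so dependencies are visited in their declared order.
        stack.extend(reversed(graph.get(node, [])))
    return seen


print(infrastructure_bom("ec2_instance", DEPENDS_ON))
```

Even in this toy example, a single instance pulls in five other resources, any of which can change independently of the instance itself.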

If a developer changes HashiCorp’s Terraform state or a cloud engineer changes one element within the cloud resource structure, we now have a disconnect between what we think is configured (Terraform) and what is actually configured (the cloud resources).

This is known as configuration drift, and managing it is critical. In the 2023 State of IaC Report, we found that most organizations identify this drift manually and that it can take weeks to resolve.

Going back to misconfigurations being on par with failure to patch, this is akin to leaving a system unpatched and vulnerable for weeks.
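
Automating drift detection comes down to comparing two views of the same resources. The sketch below, in Python, assumes both views are already available as simple dictionaries keyed by resource ID; the function name `detect_drift` and all sample data are hypothetical.

```python
def detect_drift(declared: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Compare resources declared in code with what the cloud reports."""
    report: dict[str, list[str]] = {"missing": [], "unmanaged": [], "modified": []}
    for resource_id, config in declared.items():
        if resource_id not in actual:
            report["missing"].append(resource_id)      # in code, gone from the cloud
        elif actual[resource_id] != config:
            report["modified"].append(resource_id)     # exists, but settings differ
    for resource_id in actual:
        if resource_id not in declared:
            report["unmanaged"].append(resource_id)    # created outside of code
    return report


# What the code says should exist (made-up values).
declared = {
    "web-sg": {"ingress_ports": [443]},
    "app-volume": {"size_gb": 100, "encrypted": True},
}
# What the cloud actually reports (made-up values).
actual = {
    "web-sg": {"ingress_ports": [443, 22]},    # someone opened SSH by hand
    "app-volume": {"size_gb": 100, "encrypted": True},
    "debug-instance": {"type": "t3.micro"},    # never declared in code
}

print(detect_drift(declared, actual))
```

In a Terraform workflow, running `terraform plan -detailed-exitcode` on a schedule serves a similar purpose: an exit code of 2 signals that the real infrastructure no longer matches the configuration.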

A comprehensive inventory and health status of your cloud infrastructure stack are the backbone of good auditability.

Newer organizations that have been “born in the cloud” could learn from practices that organizations with a legacy of on-premises IT have long known: You need to start by understanding what assets you have, to then be able to have the visibility into change history, version control and management.

For on-premises IT, this practice uses CMDB (Configuration Management Database) tools like ServiceNow to catalog your assets and IT Asset Management to manage changes to them.

These tools, developed when you had to badge into a data center to change a hardware configuration, often fall short when it comes to keeping an accurate accounting of ever-changing cloud assets.

When you codify all your cloud resources and automate drift detection, you can then apply the same version control and history management to your infrastructure as you apply to your code. You gain the ability to monitor when assets changed, where they were changed and by whom, then roll them back to prior versions when necessary.
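
As a rough sketch of what that history buys you, the Python below keeps each configuration change as a revision recording who made it, where it came from and when, and treats a rollback as just another recorded change. In practice git provides this for infrastructure code; the `Revision` and `ConfigHistory` classes here are hypothetical and exist only to show the idea.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    config: dict
    author: str          # by whom the change was made
    source: str          # where it was made, e.g. "pipeline" or "console"
    timestamp: datetime  # when it was made


@dataclass
class ConfigHistory:
    revisions: list[Revision] = field(default_factory=list)

    def record(self, config: dict, author: str, source: str) -> None:
        # Deep-copy so later edits to the caller's dict cannot rewrite history.
        snapshot = copy.deepcopy(config)
        self.revisions.append(Revision(snapshot, author, source, datetime.now(timezone.utc)))

    def current(self) -> dict:
        return copy.deepcopy(self.revisions[-1].config)

    def rollback(self, author: str) -> dict:
        """Restore the previous revision by recording it as a new change."""
        if len(self.revisions) < 2:
            raise ValueError("no earlier revision to roll back to")
        self.record(self.revisions[-2].config, author, source="rollback")
        return self.current()


history = ConfigHistory()
history.record({"ingress_ports": [443]}, author="alice", source="pipeline")
history.record({"ingress_ports": [443, 22]}, author="bob", source="console")
restored = history.rollback(author="alice")

for rev in history.revisions:
    print(rev.timestamp.isoformat(), rev.author, rev.source, rev.config)
```

Recording the rollback as a new revision, rather than deleting the bad one, keeps the audit trail intact.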

3. Disaster Recovery Enabled by Code & IT Governance

Managing cloud assets as code lets you view which assets have changed and roll them back as you would a bad commit. It also provides an added, and likely more important, benefit: disaster recovery.

One of the hardest decisions in recovering from the MGM attack, in which the resort chain was hit with a $100 million ransomware attack, was to delete critical assets that did not have a backup.

One of the applications exploited in the hack was Okta, which enabled the attackers to ultimately gain access to company servers and initiate a very painful denial-of-service attack on many mission-critical applications.

We have long spoken about the need to codify not just your managed cloud assets but all of your SaaS application configurations as well, and this includes managing Okta as code.

Had Okta configurations been managed as code, and the same version control practices applied to this critical SSO service, the mean time to recover would have been significantly shorter for this breach, with minimal disruption and loss of business.

The same holds whether an IT administrator accidentally deletes an important system configuration, data is corrupted, or software is held for ransom (as seen in the Caesars Resort attack a week before the MGM attack).

This is the inherent benefit of managing all of your cloud assets as code (aka everything as code), as well as the added benefits of automation, consistent deployments, and auditability.

Okta is not the only SaaS app you should manage as code. All SaaS applications, from monitoring tools to APMs, IAM tools and CDNs, should be included in a strategy for managing their configurations with code for the disaster recovery benefits.

There are plenty of tools out there (Firefly among them) that can scan your cloud, find these resources and automatically import them into infrastructure as code like Terraform, Pulumi, or CDK, which can serve as a quick backup service for important applications like Cloudflare, Datadog, and your git repository. If your software repository were corrupted or ransomed, how long would it take you to recover without these backups?

A typical cloud infrastructure includes many other security settings that need to be codified and governed. Consider, for instance, security groups.

A security group acts as a firewall that controls the traffic allowed to and from the resources in your virtual private cloud (VPC). You can choose the ports and protocols to allow for inbound traffic and outbound traffic. Several cloud services rely on Security Groups including:

  • Amazon EC2 instances
  • AWS Lambda
  • Elastic Load Balancing
  • Container and Kubernetes services (ECS and EKS)

If the security group settings were changed, you can imagine the potential consequences. Capturing this important resource as infrastructure as code and then managing its changes and alignment to policies is an important aspect of securing your organization.
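
Once security groups are captured as data, a policy check can flag dangerous changes automatically. The following Python sketch looks for sensitive ports opened to the whole internet. The rule layout (`from_port`, `to_port`, `cidr`), the choice of sensitive ports and the function names are assumptions for illustration, not the AWS API’s actual response format.

```python
import ipaddress

SENSITIVE_PORTS = {22, 3389}   # assumed list: SSH and RDP


def open_to_world(rule: dict) -> bool:
    """True when the rule's CIDR covers every address (0.0.0.0/0 or ::/0)."""
    return ipaddress.ip_network(rule["cidr"]).prefixlen == 0


def risky_ingress(security_group: dict) -> list[str]:
    """Describe each ingress rule that exposes a sensitive port to the internet."""
    findings = []
    for rule in security_group["ingress"]:
        ports = range(rule["from_port"], rule["to_port"] + 1)
        exposed = sorted(p for p in SENSITIVE_PORTS if p in ports)
        if exposed and open_to_world(rule):
            findings.append(
                f"{security_group['name']}: ports {exposed} open to {rule['cidr']}"
            )
    return findings


# Made-up security group used only to exercise the check.
web_sg = {
    "name": "web-sg",
    "ingress": [
        {"from_port": 443, "to_port": 443, "cidr": "0.0.0.0/0"},    # public HTTPS: expected
        {"from_port": 22, "to_port": 22, "cidr": "0.0.0.0/0"},      # SSH from anywhere: flagged
        {"from_port": 3389, "to_port": 3389, "cidr": "10.0.0.0/8"}, # RDP, internal only
    ],
}

print(risky_ingress(web_sg))
```

Run against every proposed change, a check like this catches an accidentally opened port before it reaches production rather than weeks later.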

Check Your Cloud

Cloud governance is not uncharted territory. While it does require some domain expertise, many of these practices are well-established and even widely known by now.

By not applying simple and recommended good practices to your IT and cloud environments, you are putting both your sensitive customer information and business-critical systems at risk.

As cloud utilization grows, it is imperative that we do a better job of covering the basics of IT governance and security of our cloud infrastructure. Let’s not be left closing old gaps. It’s time to free up our engineers to focus on the more novel and emerging threats.
