
Terraform on AWS: Multi-Account Setup and Other Advanced Tips

10 Nov 2020 1:10pm, by Serkan Özal

Thundra sponsored this post.

This article explores some advanced areas of HashiCorp’s Terraform usage, focusing on how to use Terraform to manage multiple Amazon Web Services (AWS) accounts. Multi-account setups are increasingly popular, either because of the sheer size of an organization or as a deliberate choice by its DevOps teams. AWS itself is gently pushing its customers to at least consider using multiple accounts, and it recently released new services to help them do so.

The Need for Multiple AWS Accounts

Serkan Özal
Serkan is co-founder and CTO of Thundra. He has 10+ years of expertise in software development, is an AWS Certified PRO and has a patent on distributed environments. He mainly works on serverless architectures, distributed systems and monitoring tools.

There are many reasons why you may want or need multiple AWS accounts for your organization; improving security is one of the most common. Separate accounts let you segregate resources along whatever lines matter to you, for example one AWS account per developer, environment, or organizational department.

Managing security, in this case, is certainly easier than having one giant AWS account for everything, where managing IAM permissions becomes very difficult. By default, separate AWS accounts have no access to each other, so one account cannot reach the resources of another unless that access is explicitly granted.

In practice, large organizations were already using multiple AWS accounts due to their sheer size, but the accounts were not linked. This made the job of purchasing departments much harder, because each AWS account was billed separately and it was up to the organization to consolidate the bills according to its internal rules or governmental regulations. AWS created AWS Organizations to address this problem; it allows you to manage a hierarchy of accounts and consolidate billing higher up the hierarchy, simplifying the job of your purchasing department. More recently, AWS released AWS Control Tower, which helps you provision new accounts with a preset collection of resources and manage all those accounts under one roof.

The New Trend

Overall, there is a push from AWS for organizations to use multiple accounts — probably because people tend to give all users in their account administrator privileges, and thus all users can see and do anything in that account.

Crafting IAM permissions instead of using the default administrator access for those users is time-consuming; plus, there is no guarantee that the principle of least privilege would be respected. On the other hand, by default, an administrator user on one AWS account can’t see or do anything on another AWS account, even in the same organization. So, this is secure by default and requires additional work to craft IAM permissions to allow a user in one account to access the resources of another account, if required. DevOps engineers will also be more likely to apply the principle of least privilege in such a case.

This trend has implications for Infrastructure-as-Code (IaC) — for example, when using Terraform. IaC tools are usually wired to work on a single AWS account by default. Very early on, Terraform featured the ability to have multiple so-called “providers” in a single script, which would allow you to access multiple AWS accounts, or even accounts from different cloud vendors.

Infrastructure-as-Code with Multiple Accounts

Managing Resources in Different Accounts

The first use case is the following: A single Terraform state could manage resources in different accounts. This usually requires the declarations of multiple “provider” blocks, typically one per AWS account.

By default, an AWS provider block works in the account that owns the credentials used to connect to the AWS API. For example, if you use the access key ID and secret access key of a given AWS user, Terraform applies changes to the account that user belongs to.
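The original listing is not reproduced here. As a minimal sketch of a default provider block, assuming the credentials are supplied through the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables and that `eu-west-1` is merely an illustrative region:

```hcl
# Default AWS provider: it operates on whichever account owns the
# credentials found in the environment (or in the shared credentials file).
provider "aws" {
  region = "eu-west-1" # illustrative region, not from the article
}
```

Keeping the keys out of the configuration avoids committing secrets to version control.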

To create resources in a different account, you need to use the “assume role” option of the “provider” block, which lets Terraform assume a role in another account and act with that role’s permissions there.
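The original listing is not reproduced here. The following sketch shows how a second, aliased provider with an `assume_role` block might look. The role name `TerraformDeployer`, the alias `other`, the region and the bucket name are all hypothetical; only the account ID is borrowed from the article’s later discussion.

```hcl
# Default provider: the account that owns the caller's credentials.
provider "aws" {
  region = "eu-west-1" # illustrative region
}

# Second provider: the same credentials are used to assume a role
# in another account (hypothetical role name).
provider "aws" {
  alias  = "other"
  region = "eu-west-1"

  assume_role {
    role_arn     = "arn:aws:iam::222222222222:role/TerraformDeployer"
    session_name = "terraform"
  }
}

# A resource created in the other account selects the aliased provider.
resource "aws_s3_bucket" "logs" {
  provider = aws.other
  bucket   = "example-logs-222222222222" # hypothetical bucket name
}
```

Resources that do not set `provider` continue to use the default provider, so a single state can span both accounts.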

Obviously, this requires that you set up the IAM permissions of both that role and the user executing the Terraform commands, so that the user is allowed to assume the role. AWS explains how to achieve this in its IAM documentation on delegating access across accounts with roles.
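As a rough illustration of the role side of that setup, the sketch below defines a role whose trust policy allows a single user from another account to assume it. The account ID `111111111111`, the user name `terraform` and the role name `TerraformDeployer` are assumptions made for this example; the calling user additionally needs an IAM policy in its own account that allows `sts:AssumeRole` on the role’s ARN.

```hcl
# Applied by an administrator of the target account.
resource "aws_iam_role" "terraform_deployer" {
  name = "TerraformDeployer" # hypothetical role name

  # Trust policy: who is allowed to assume this role.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = "sts:AssumeRole"
      Principal = {
        # Hypothetical user in the calling account.
        AWS = "arn:aws:iam::111111111111:user/terraform"
      }
    }]
  })
}
```

The role still needs permission policies attached that describe what it may do in the target account; those are omitted here.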

Using an Intermediate Role to Access a Third Account

The second use case is an extension of the first one: The assumed role may have permissions to create, update and delete resources in a third AWS account. This would require crafting IAM permissions in the final AWS account to allow an intermediate AWS account to create, update and delete resources in the final AWS account. The flow of control would look like this:

Figure 1: Flow control for an intermediate role to access a final AWS account

This use case becomes quite complicated, and the IAM permissions can be cumbersome to manage and difficult to debug. Such a setup might be worthwhile in a specific situation, but the security benefits are limited. Indeed, instead of targeting account 333333333333 in the above diagram, attackers would try to gain access to account 222222222222 in order to gain control of the resources located in account 333333333333. If account 222222222222 is also used to control resources in accounts other than 333333333333, you could argue that security is weaker, because compromising account 222222222222 would open up even wider access.

Other Advanced Terraform Strategies

Multistates

If the number of resources becomes even moderately large, it would probably be a good idea to split the Terraform scripts into multiple states — especially when using continuous deployment (CD). Managing all of your resources in a single state has some drawbacks:

  • Every time you apply even a tiny change, you risk Terraform touching foundational resources you did not intend to change.
  • Erroneous changes in the foundational resources applied blindly through continuous deployment could be devastating.
  • The IAM permissions required to apply the Terraform script would be wide-ranging and certainly more than necessary for a CD setup.
  • Applying changes takes a long time because Terraform needs to refresh every resource managed by the state, even if the vast majority won’t change.
  • The impact of a failed deployment could be wide-ranging.

There are usually a number of foundational resources that change very little — such as VPCs, subnets, Transit Gateways, VPNs, RDS databases, and load balancers. Those resources would belong to a “foundational” state that would seldom change over time. Updates to such a state would typically be run by a human.

And then there are resources that usually change a lot — such as EC2 instances, autoscaling groups, ECS task definitions and services, and EKS deployments. Such resources would typically be the ones deployed (or updated) by CD pipelines. They could be placed in a different state from the foundational resources. This distinct set of Terraform scripts would manage just the small, fast-changing subset of your workload’s resources, and you would require far fewer permissions. Segregating Terraform states in such a way would be ideal for automated deployments.

An interesting example of this type of setup is a Kubernetes cluster managed by the foundational stack, where the Kubernetes deployments are managed by the CD stack.
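One way the fast-changing stack could consume values from the foundational stack is the `terraform_remote_state` data source. The article does not prescribe this mechanism; the sketch assumes both states live in a hypothetical S3 bucket named `example-terraform-states` and that the foundational stack declares an output named `private_subnet_id`.

```hcl
# Configuration of the fast-changing (CD) stack.
terraform {
  backend "s3" {
    bucket = "example-terraform-states" # hypothetical bucket
    key    = "cd/terraform.tfstate"
    region = "eu-west-1"
  }
}

provider "aws" {
  region = "eu-west-1"
}

# Read-only view of the foundational state's outputs.
data "terraform_remote_state" "foundation" {
  backend = "s3"

  config = {
    bucket = "example-terraform-states"
    key    = "foundation/terraform.tfstate"
    region = "eu-west-1"
  }
}

variable "ami_id" {
  type = string
}

# A fast-changing resource placed in a subnet owned by the foundational stack.
resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.micro"
  subnet_id     = data.terraform_remote_state.foundation.outputs.private_subnet_id
}
```

With this split, the CD pipeline only needs read access to the foundational state file and write access to its own.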

Modules

Terraform makes it easy to modularize your IaC code. You just need to put the code in a separate directory and reference it with a “module” block.
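The original listing is not reproduced here. A minimal sketch of a module call, assuming a hypothetical local module in `./modules/network` that accepts a `cidr_block` variable and declares a `vpc_id` output:

```hcl
# Call a module stored in a local directory (hypothetical path and inputs).
module "network" {
  source     = "./modules/network"
  cidr_block = "10.0.0.0/16"
}

# Values the module exposes as outputs are read as module.<name>.<output>.
output "vpc_id" {
  value = module.network.vpc_id
}
```

After adding or changing a `source`, `terraform init` has to be run again so that Terraform can install the module.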

Terraform hosts a “registry” that contains a lot of modules written by the Terraform community for public use. Before using this registry, make sure the modules you intend to use are compliant with your organization’s security policies.

Terraform modules help you to keep your code DRY (Don’t Repeat Yourself). However, there is still some boilerplate code that you can’t modularize — such as backend definitions and calls to modules themselves.

Environments

It is a very common scenario to want multiple environments for your workload — staging, production, testing, etc. Typically, it’s a good idea to keep those environments as similar as possible to maximize the chance that a deployment working on, say, the “staging” environment will also work on the “production” environment — thus avoiding the “but it works for me” excuse.

Terraform offers “workspaces,” but they require you to switch the workspace (e.g., from “production” to “staging”) before applying your changes. When this is done manually, there is a high risk that one day someone will forget to switch the workspace and deploy to an unintended environment, with potentially catastrophic consequences.
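For illustration, a configuration can read the currently selected workspace through `terraform.workspace` and vary its settings accordingly. The workspace names and instance types below are assumptions chosen for this sketch, not recommendations.

```hcl
locals {
  # Hypothetical per-environment sizing, keyed by workspace name.
  instance_types = {
    production = "m5.large"
    staging    = "t3.small"
  }

  # Fall back to a small instance type for any other workspace.
  instance_type = lookup(local.instance_types, terraform.workspace, "t3.micro")

  # Handy for naming or tagging resources per environment.
  environment = terraform.workspace
}
```

Nothing in this sketch prevents an apply against the wrong workspace; it only makes the selected workspace visible in the resulting resources.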

The other solution is to have different sets of scripts for different environments, but this is cumbersome, and a lot of code would be duplicated between those environments even when using modules. This is one of the reasons why Terragrunt was developed, as one of its founders has explained.

Conclusion

In conclusion, multi-account AWS setups are very common in large (and not so large) organizations, and with a bit of crafting it is entirely possible to run Terraform in such a setup. In fact, Terraform was developed from the ground up to be multicloud, and it supports a vast array of “providers” that can be used within a given set of scripts.

Some tips covered in the advanced strategies of this article can help make your Terraform code DRY and tidy.

Amazon Web Services and HashiCorp are sponsors of The New Stack.

