🔥Let’s Do DevOps: Bring Entire AWS Accounts Under Terraform Management
This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!
Hey all,
Terraform is a wonderfully flexible, powerful Infrastructure as Code (IaC) tool that can help you manage your AWS accounts at scale. However, it’s not (yet) magic, and it still needs to be told about individual resources and how to link each real resource to an individual configuration block.
I work at a shop that had a huge amount of time and effort invested in SparkleFormation, a Ruby-based tool that constructs CloudFormation, the native IaC from and for AWS. We have built literally tens of thousands of resources with it, and we wanted to bring all of them under Terraform management.
That’s no small feat, and I spent a few years working on a way to make this as easy and quick as possible. There are a few extra tools to help you along, and I wrote some of my own configurations. Let’s walk through the available tools, and then what I chose and how I customized it to make life as easy as possible.
What Makes up Terraform?
There are two major pieces of terraform:
First, core Terraform. This is the terraform binary: it knows how to read HCL, construct change-sets, interact with your console, etc. It’s written in Go, and the source is hosted on GitHub.
Second, “providers”. These belong to individual clouds, like AWS, Azure, VMware, etc. They define the individual resources in a specific cloud, understand every attribute of each resource and the implications of updating it (does updating attribute XX require rebuilding the resource?), and implement the API conversion logic: for instance, if attribute XX needs to be updated, issue this API call to the right API endpoint, then read the response and act on it.
This breakdown is important so you can understand how terraform operates. Let’s cover that now.
How Does Terraform Work?
Terraform is a binary that lives on your computer. It reads configuration written in HCL (HashiCorp Configuration Language) to understand the state you’d like your infrastructure to end up in (your desired end state). HCL looks like this: it defines a resource of type aws_vpc and locally calls it “main”. Then it defines attributes, including nested attributes like the tags block.
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
instance_tenancy = "default"
tags = {
Name = "main"
}
}
Terraform then reads the “state file” on your computer. This state file is important and often misunderstood. It does a few things:
Instructs Terraform which blocks of code link to which real resources via GUID
Contains the last known state of those resources. Terraform reads the diff between the desired state and the last known state to construct the changeset, or list of differences that it will attempt to fix
Once Terraform understands the difference between the last known state of all resources (in the state file) and the desired state (the new configuration), it knows what has changed for each resource. It authenticates to the remote provider (AWS, in this case), then converts each change into the correlated API calls and issues them to the cloud provider.
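If you ever want to see that linkage for yourself, Terraform exposes it through its state subcommands. A quick sketch, using the aws_vpc.main example from above:
# List every resource address Terraform is currently tracking in the state file
terraform state list
# Show the last known attributes (including the real AWS ID) for a single resource
terraform state show aws_vpc.main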
We’ll have to do all of this to make our bearhugging a success:
Write the configuration to represent a resource
Link the configuration to a real resource via GUID
Check the Terraform plan for that resource and true up any potentially catastrophic changes before applying
And we have to do that for every resource! You can imagine how time-consuming that gets, really quickly, if you need to bearhug several thousand resources. Let’s talk about some tools that can make your life easier.
Reverse Terraform: Infrastructure to Config
As with almost every problem, we’re not the first to run into it, and probably not the first to attempt to solve it. There are two major tools that have attempted to solve this problem.
Terraformer
The new hotness in this space is Terraformer (GitHub), released May 2, 2019, by an engineer on the GoogleCloudPlatform team. Terraformer calls out Terraforming, the next tool we’ll cover, in its README.md, and says that it supports many more clouds (like Azure and GCP) and is written in a more extensible and flexible manner, making it easier to add resources in the future.
Terraformer isn’t able to read exported console variables to authenticate, the way Terraform does. Terraformer requires an AWS configuration file populated with the same secret information you’d normally paste into a terminal. Then the terraformer tool needs to be told which profile to use. This extra step is confusing and poorly documented.
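For reference, the file it wants is the standard AWS shared credentials file with a named profile that you then point the tool at; roughly like this (the profile name and values below are placeholders):
# ~/.aws/credentials
[my-terraformer-profile]
aws_access_key_id     = AKIA...
aws_secret_access_key = ...
aws_session_token     = ...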
In testing, I wasn’t able to get this tool to completely run against “*” resources. Though many resources are supported, some of them appear to cause a process panic and the thread exits, incomplete.
I have no doubt this tool will be amazing in the future, and it already supports many more resources than Terraforming does, but in my opinion it isn’t yet stable enough to be a part of your business processes.
Terraforming
Terraforming (GitHub) is an older tool from Daisuke Fujita (dtan4), and covers many resources in AWS, particularly those that existed a few years ago. It is minimally updated these days, so newer resources (ECS, ECR, EKS, Workspaces, likely others too) aren’t well supported. But still, if we can import 80% of our resources, that’s way better than the 0% we’re at now.
Terraforming is able to read exported bash environment variables to authenticate, in the same way the AWS CLI and Terraform do, which makes authentication easy.
By default, Terraforming must be pointed at a single account, in a single region, at a single resource type that it supports. That’s a big difference from Terraformer, which can be pointed at resource type “*”, meaning every type it supports. However, we’re able to wrap Terraforming in a bash loop and export all of the resource types it supports.
Here’s a link to an existing write-up of my bash script:
Because I use a bash loop, it’s error tolerant, and will still sync out all the resource types it can, even if one of the types hits an error.
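The core idea is simple even if the full script does more. A minimal sketch, assuming terraforming’s help output lists its resource subcommands:
# Ask terraforming which resource types it supports, then export each one,
# appending everything to a single config file and tolerating per-type failures
for type in $(terraforming help | grep 'terraforming ' | awk '{print $2}' | grep -v help); do
  terraforming "$type" >> terraform_config.tf || echo "WARN: $type failed, continuing"
done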
Terraforming Initial Snapshot
Let’s take our initial snapshot of all resources terraforming can get to.
Let’s install our dependencies and run this script.
# For *nix (terraforming is a Ruby gem; grep, awk, sed, and find are almost certainly already installed):
gem install terraforming
# Install terraform itself from HashiCorp's releases page or your package manager
# For mac:
brew install terraform terraforming
Clone down the script (or copy paste, it’s a single bash file), and set it as executable:
chmod +x terraforming_import_aws_vpc.sh
Next we need to auth to AWS. Log into our SSO and find the account we want to point at. For instance, Iron. Click “Command line or programmatic access” for the “admin” role and copy out the auth strings. Paste those into your terminal session. Only this 1 terminal session is authenticated as admin.
NOTE: Be very careful with this shell. It is an administrator and can cause damage. CAUTION
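The pasted auth strings are just environment variable exports, roughly like this (the values below are placeholders):
export AWS_ACCESS_KEY_ID="AKIAXXXXXXXXXXXXXXXX"
export AWS_SECRET_ACCESS_KEY="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
export AWS_SESSION_TOKEN="xxxxxxxxxxxxxxxx"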
Next, let’s run this script. Fill out the single region you want to run against, e.g. us-east-1. Multiple regions aren’t supported by this script or tool.
$ ./terraforming_import_aws_vpc.sh
Creating temp directory to stash files
Which region do you want to run against, e.g. us-east-2, us-east-1
Utilizing terraforming to sync config files for all resources into temp folder
The script will start running. It’ll take 5–10 minutes to run. It’ll build a few files, which you should retain and use. The primary useful one is the terraform configuration file, terraform_config.tf. It doesn't contain every single resource in the account, but it's close. Notable resources for which config isn't built and which you'll need to investigate manually:
ECS/EKS/ECR
Workspaces
Lambdas
It’s likely there are some issues with the created config. Just read the outputs, make changes to terraform_config.tf, save it, and then run terraform init again. Repeat until the code shows as valid.
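If you want faster feedback between edits, terraform validate checks the config’s syntax and internal consistency without touching state or credentials, so the edit-and-check loop can look roughly like this:
# Edit terraform_config.tf, then:
terraform init       # re-resolves providers and modules if anything new was added
terraform validate   # checks that the configuration parses and is internally consistent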
Options for Importing
You now have a decision point in front of you. You can either accept this terraform config as-is (it can be a VERY LARGE file with all resources in it), or you could use this information and config as a copy-and-pasteable source file, where you can structure your code how you like.
At worst, you should put all EC2 instances in a single file, and all load balancers in their own file. At best, you could create sub-folders to store resources, call your own modules, and import these resources into that syntax. That’s a lot more work, but the finished product will be quite a bit nicer.
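Purely as an illustration (none of these file names are required, they’re just an example layout), that nicer end state might look something like this:
account-prod/
├── main.tf             # terraform + provider blocks
├── vpc.tf              # VPCs, subnets, route tables
├── ec2.tf              # instances, or module calls per app
└── load_balancers.tf   # load balancers and their listeners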
I’ll proceed assuming you want to rewrite the config and start importing resources in your own path.
Create the New Account Provider, Terraform Stuff
We now need to create the new (empty) terraform workspace. This is where you’ll copy your config over to.
First, create the terraform block. The backend will be commented out for now — that’s not very CI/CD friendly, but we’ll be doing all this bearhugging locally first, then we’ll do that part.
terraform {
  required_version = "~> 1.0.6"

  required_providers {
    aws = {
      version = "~> 3.58.0"
      source  = "hashicorp/aws"
    }
  }

  #backend "s3" {}
}
Then we’ll create the provider block to set the region we’ll operate in.
provider "aws" {
region = "us-east-1"
}
In the main.tf, there will likely be a bunch more subsystems and module calls. Comment all of them out, top to bottom. We need to make sure this is valid without any resources first. In our authenticated session (where you pasted your auth creds), run terraform init. If you see the following, everything is great.
terraform init
Initializing modules...
Initializing the backend...
Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 3.52.0"...
- Installing hashicorp/aws v3.52.0...
- Installed hashicorp/aws v3.52.0 (signed by HashiCorp)
Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.
Terraform has been successfully initialized!
Auth to AWS
Next we’ll create a resource configuration, and we’ll need to talk to AWS to perform the import. To do that, Terraform will require authentication to your AWS account. You can follow this link to help you authenticate your terminal to AWS.
Create a Resource Configuration
Now that we have a viable and working terraform workspace, we need to create a resource configuration. If you want to have any sub-folders or modules, now’s the time to do that. Let’s assume simplicity here, and just create a new file in our root directory called main.tf and create a resource block there.
The best way to do this is to copy the same resource from your large Terraforming snapshot file and paste it into this workspace.
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
}
You can’t easily change the local name of this resource later, so make sure to name it what you’d like now.
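If you do end up needing a rename later, terraform state mv can move the state entry to a new address once you’ve renamed the block in your config (aws_vpc.primary below is just a made-up new name), but it’s an extra step you can skip by naming things well now:
# Rename the resource block in the .tf file first, then move the state entry to match
terraform state mv aws_vpc.main aws_vpc.primary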
Let’s run terraform init to make sure any dependencies or module calls are included, then run a plan. If you see lines like this, you’re likely happy and good! Let it run the whole way through and note whether there are any changes already from state. Hopefully not, and instead it’s only creating a LOT of things. That’s okay — we need to link these new resources with the existing ones in the account we’re going to bearhug!
$ terraform plan -input=false
(removed)
Plan: 29 to add, 0 to change, 0 to destroy.
You’ll hopefully see your resource change plan:
# aws_vpc.main will be created
This string (aws_vpc.main) is the path to the resource; you’ll need that for the import, so note it down.
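Resource addresses follow the same pattern everywhere; they just get longer when modules or counts are involved. These are the same paths you’ll later hand to terraform import and -target (the module name below is illustrative):
aws_vpc.main                     # a resource in the root module
module.networking.aws_vpc.main   # the same resource type inside a module call
aws_instance.web[0]              # the first instance of a counted resource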
Link Our Config to a Real Resource
You’ll spend most of your time importing resources. Now that our workspace is ready and valid, we want to start linking real resources with our existing plan resources.
$ terraform plan -input=false
(removed)
  # aws_vpc.main will be created
  + resource "aws_vpc" "main" {
      + cidr_block = "10.0.0.0/16"
Terraform thinks it’s creating this resource fresh, since it’s not linked to a real resource yet. Let’s find out how to link it. First, start with the type of the resource, which here is aws_vpc. Let’s go find the Terraform resource page for this resource type (link), and scroll all the way to the bottom to find the Import section.
It looks like we can import an aws_vpc by using its “VPC ID”.
$ terraform import aws_vpc.test_vpc vpc-xxxxxxx
Go into your AWS console and find the resource’s real ID. Add that to your import command, and run the import:
$ terraform import aws_vpc.main vpc-12345667778
aws_vpc.main: Importing from ID "vpc-12345667778"...
aws_vpc.main: Import prepared!
Prepared aws_vpc for import
aws_vpc.main: Refreshing state... [id=vpc-12345667778]
Import successful!
Yay, Import successful! indicates we are good to go on this resource. Let’s run a targeted plan to check whether Terraform plans to update anything for this resource. We use that same path to the resource as a target for plan:
$ terraform plan -target aws_vpc.main
aws_vpc.main: Refreshing state... [id=vpc-1234567890]
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
~ update in-place
Terraform will perform the following actions:
  # aws_vpc.main will be updated in-place
  ~ resource "aws_vpc" "main" {
        id   = "vpc-1234567890"
      ~ tags = {
          ~ "Name" = "Hello World!"
        }
    }
It is planning to change some stuff, but the resource isn’t being recreated, and the only thing changing is tags. I added a tag in the background so we could see what a change looks like! This change plan looks perfect.
NOTE: Be cautious here. You need to read the change plan for each resource and validate that nothing scary is going to happen, like destroying or rebuilding a server, an RDS instance, or anything else that stores data that can’t be easily recovered. Some metadata resources (that store only configuration) can be rebuilt without much impact, but think hard about each resource that’ll be destroyed or recreated. This is very important for a successful bearhug.
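As an illustration of what “scary” looks like (the aws_instance below is hypothetical, not from this import), plan output flags replacements explicitly; anything marked “forces replacement” deserves a hard look before you continue:
-/+ destroy and then create replacement
  # aws_instance.example must be replaced
-/+ resource "aws_instance" "example" {
      ~ ami = "ami-0aaaaaaaaaaaaaaaa" -> "ami-0bbbbbbbbbbbbbbbb" # forces replacement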
And that’s it! Well, that’s it for one resource. We now need to loop through ALL the other resources, and there are a lot. I have some helper information for you though!
I’ve kept all the import commands I’ve used on all our VPCs, and you can use them as an excellent template for all the resources you need to import. In fact, I’d copy this out into a note and start updating it for the values you need.
Run each command, but save all the previous commands. That’ll help you audit later, and if you have ANOTHER server to do, you’ll know just what you did for the last one.
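One low-tech way to keep that audit trail is to append every import command to a running log file in the repo before you execute it (imports.log is just a name made up for this example):
# Record the exact command, then run it
echo 'terraform import aws_vpc.main vpc-12345667778' >> imports.log
terraform import aws_vpc.main vpc-12345667778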
Good luck!
Special Resource Notes
Some resources act a little funky. This is an incomplete list of the resource types and what you need to do for each.
Security Groups
Security groups, when imported with our script, are combined and all descriptions are stripped off. That isn’t ideal, so you have two tasks when importing a security group: split the combined rules back apart when necessary, and add descriptions to each rule to retain all our historical information.
VPNs
VPNs are rebuilt during our first TF Apply for that resource. We’ve never been able to get them to match exactly what AWS is looking for. Due to this, you’ll need to work with the NetOps team to have them ready to rebuild the resources with the new PSK and probably internal IPs (and maybe public IPs) when that rebuild happens. You could probably do this ahead of the big-bang style cutover window.
EC2 Instances
EC2 instances use our ec2 module, which does a lot — it creates IAM roles and profiles, and manages the disks. Make sure to copy the import commands from the Ue1Ti import examples above to see what disk imports look like; they can be tricky.
Make sure to run a targeted plan, for example tf plan -target path.to.instance[0], especially on instances; they often contain data and processes that are difficult to recover and would cause a business impact if they go down.
Summary
Great job! This isn’t a simple or easy process, but you’ll get a lot of practice importing resources in your own env. Take it slow, do a great job, and your finished product will be great.
Good luck out there!
kyler