Azure DevOps & Terraform: Breaking Up The Monolith - Strategy
Hey all!
Azure DevOps is a CI/CD automation platform from Microsoft ($MSFT). It hosts repositories and runs all sorts of automation against the code in them (among many, many other functions). This includes Terraform, a tool that converts scripted, declarative configurations to real resources in cloud (and other) providers via API calls.
Terraform has been an excellent tool for us so far, and it is starting to be adopted by other teams, for other purposes, to manage more accounts and resources. That means the model we selected, a single Terraform file (with a single .tfstate file) that defines all resources and configurations in an environment, is quickly getting strained.
Here's an example: say you have the environment above, with a single file. You have a dozen developers working in parallel, building projects and adding them to the single monolithic file. Changes might get through the PR process without being properly vetted. Devs might push changes to the Terraform repo and not deploy them yet. Maybe the changes aren't ready, or maybe they shouldn't be deployed yet for some dependency reason. And now it's time to push a tiny little change of your own, maybe to change the size of an instance. You push your PR, run a terraform plan, and it wants to change 22 resources in 3 different time-zones. Would you push the approval through? If you're an experienced engineer, heck no you wouldn't. You could break any number of things.
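One way to make that kind of surprise visible before anyone clicks approve is to export the plan as JSON (`terraform show -json tfplan`) and count what it intends to touch. The snippet below is only a minimal sketch of that idea: the function names and the `expected_max` threshold are made up for illustration, and it assumes the plan JSON carries a `resource_changes` list whose entries have `change.actions`, as Terraform's JSON plan output does.

```python
import json


def summarize_plan(plan_json: str) -> dict:
    """Count planned actions by type, skipping resources that would not change."""
    plan = json.loads(plan_json)
    counts = {}
    for resource_change in plan.get("resource_changes", []):
        actions = resource_change["change"]["actions"]
        # "no-op" and "read" entries do not modify real infrastructure.
        if actions in (["no-op"], ["read"]):
            continue
        # A replacement shows up as two actions, e.g. ["delete", "create"].
        key = "+".join(actions)
        counts[key] = counts.get(key, 0) + 1
    return counts


def is_surprising(counts: dict, expected_max: int) -> bool:
    """True when the plan changes more resources than the author expected."""
    return sum(counts.values()) > expected_max
```

If you expected to resize one instance and `is_surprising(counts, expected_max=1)` comes back true, that's your cue to stop and find out whose queued-up changes are riding along.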
So that's a scary situation, and probably an eventuality for most companies that start using Terraform and don't plan an extensible way to manage these files. But that's okay: for better or worse, the best driver of innovation is impending failure.
What Options Do We Have?
So how can we fix this problem? I have a few different strategies I want to discuss.
Option A: A few more TF files
We could of course break the single monolithic TF and .tfstate file into a few TF files. For instance, put all servers into a single file, and all databases into another. This has the benefit of minimizing changes to process, and putting off the eventual time where many changes are queued up for TF apply-ing.
This also has the benefit of being easily supported by Azure DevOps: you can point the native Terraform plan/apply steps at the several different files, and even run them in different concurrent stages of the Terraform release. They can all run automatically, and boom, you're in business.
The big con here is that the problem is only delayed. You have expanded the ability for your processes to scale, but you're still queueing up changes within a single file. And you're going to need to do this again and again in the future.
What would be more ideal is a solution to the problem, rather than a bandaid. So what else can we do?
Option B: Many project TF files, Terragrunt recursion
A problem with Azure DevOps and Terraform in general is that each Terraform step must be pointed at a single directory, and Terraform doesn't support recursion. Which means if you have half a dozen TF files that need to be run, your TF release pipeline is going to be relatively complex. But if you have hundreds? It'd be untenable. Not to mention that every time a project is added, your release pipeline would need to be updated.
Which is exactly the gap that Terragrunt looks to fill. It natively supports recursion, complex deployments, and lots of tools to keep your configuration DRY (humorously, Don't Repeat Yourself).
A pro here is that now you can expand ad infinitum with Terraform stacks. You can tell your devs that if they drop their Terraform code into a folder tree you specify, their code will be executed on the next run.
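To make "recursion" concrete, here is a rough sketch of the discovery step a recursive runner has to perform: walk a folder tree and treat every directory that directly holds a .tf file as a stack. This is not how Terragrunt is implemented, just an illustration of the idea. The function name `find_stacks` and the folder names in the usage note are hypothetical.

```python
from pathlib import Path


def find_stacks(root: str) -> list:
    """Return every directory under root that directly contains a .tf file.

    Paths are returned relative to root, sorted, with forward slashes.
    """
    root_path = Path(root)
    stacks = set()
    for tf_file in root_path.rglob("*.tf"):
        relative = tf_file.relative_to(root_path)
        # Skip Terraform's local cache, which holds downloaded module code.
        if ".terraform" in relative.parts:
            continue
        stacks.add(relative.parent.as_posix())
    return sorted(stacks)
```

With a tree like `projects/a/main.tf` and `projects/b/main.tf`, calling `find_stacks(".")` would return `["projects/a", "projects/b"]`, and a new project shows up in the list as soon as its folder lands in the repo.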
There are still some downsides. Terragrunt, because of its additional deployment logic, requires new files to be added, and some changes to your TF stack config. If you already have lots of files, that's not great. And learning a new tool just for this problem isn't ideal either. One complication that seems trivial (but probably isn't) is that the Azure DevOps tasks that consume a Service Principal are for Terraform in particular, not any other command, even if it's very similar (Terragrunt). Which means you're looking for a Terragrunt deployment module, which... doesn't exist (yet). So you're deploying code with straight-up terminal commands and handling the service principal authentication yourself, which isn't a security best practice.
And one of our big initial drawbacks remains: when an "apply" is run against the top level of the folder structure, all changes that have been queued up by PR approvals in the Terraform repo will be executed. Again, we might end up pushing out dozens of changes if devs haven't been applying their changes right after getting PRs approved. Still not ideal.
Ideally, we'd be able to get all the benefits of Option B (Recursive Terragrunt) without learning and implementing a new tool, and without applying changes en masse during a single run. And what a monster I'd be if I didn't present something that satisfied those criteria: customized release pipelines.
Option C: Targeted, custom Azure DevOps release pipelines
What many companies do is implement Jenkins, an extensible CI/CD tool that permits more customization of releases, including setting variables that can target particular files for jobs. This is used to help target and run specific Terraform file updates.
Thankfully, Azure DevOps supports similar functionality. The functionality is relatively recent and still in development, so documentation isnât great. However, we can piece together enough disparate features to make this work well.
When initiating a TF release pipeline, we can surface a variable that can be consumed by our TF steps within the pipeline to target specific files for execution. Combine individual TF files, each with its own state file, with a release pipeline that executes a single TF file at a time, and we can scale out indefinitely (thousands of TF files) and programmatically define where the state file is stored for each one.
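Here's a rough sketch of what "programmatically define where the state file is stored" could look like: take the targeted stack's path, derive a state key from it, and build the Terraform commands for just that stack. The helper names and the `<path>/terraform.tfstate` key convention are my own assumptions for illustration, and the sketch assumes each stack declares a partial backend block so the key can be supplied at init time with `-backend-config`.

```python
def normalize_stack(stack_dir: str) -> str:
    """Normalize a repo-relative stack path and reject anything escaping the repo."""
    cleaned = stack_dir.replace("\\", "/").strip("/")
    if not cleaned or ".." in cleaned.split("/"):
        raise ValueError(f"invalid stack path: {stack_dir!r}")
    return cleaned


def state_key(stack_dir: str) -> str:
    """Derive a unique state key from the stack path (an assumed convention)."""
    return f"{normalize_stack(stack_dir)}/terraform.tfstate"


def terraform_commands(stack_dir: str) -> list:
    """Build init/plan/apply commands that touch only the targeted stack."""
    stack = normalize_stack(stack_dir)
    chdir = f"-chdir={stack}"  # global option, goes before the subcommand
    return [
        ["terraform", chdir, "init", "-input=false",
         f"-backend-config=key={state_key(stack)}"],
        ["terraform", chdir, "plan", "-input=false", "-out=tfplan"],
        ["terraform", chdir, "apply", "-input=false", "tfplan"],
    ]
```

In a pipeline, the stack path would come from the variable surfaced at release time. Azure DevOps exposes non-secret variables to scripts as upper-cased environment variables, so a hypothetical `targetStack` variable could be read as `os.environ["TARGETSTACK"]` and passed straight into `terraform_commands`.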
Conclusions
The output of all this:
We can scale out TF files indefinitely: TF files now stand alone, and aren't all tied back to a single file that can become cluttered and queue up many changes
Changes can be applied carefully and methodically: TF updates aren't applied all at once for an entire folder structure. They are targeted, and only a single stack is updated at a time
No new tooling has to be implemented: we can rely on native Azure DevOps and Terraform functionality. There's no need to teach your team an entirely new tool and methodology
In future blog posts I'll be looking at Terragrunt to implement TF recursively in a folder structure, and separately at customizing Azure DevOps release pipelines with custom variables to permit releases targeting only a single arbitrary TF file.
Good luck out there!
kyler


