🔥Let’s Do DevOps: Make Tofu/Terraform More Failure Tolerant with AzApi Provider!🚀
This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!
A note as we start: I’ve always been an open source kid, and I’ll continue to be so. To reflect that, I’ll be using OpenTofu/Tofu primarily, rather than Terraform, which is no longer an open source tool due to a relicensing by HashiCorp. That said, at this point all the code I’ll share will work on both platforms exactly the same. Let me know if you want to hear me expand on why I’ve made this decision or other topics here.
Hey all!
This article follows one where I go over what the AzAPI Tofu (also Terraform!) Provider is, and how you can use it to find all sorts of info about Azure, including all the subnets across an entire subscription. That’s pretty awesome, go read it if you haven’t.
This article builds on that topic, and shares a technique I created for a work project — to find the primary private IP of a bunch of hosts that might change rapidly over time — for instance, a pool of application hosts that might grow over time. That list of private IPs can be fed to an Application Gateway or a FrontDoor resource, and traffic can be routed to them — but not if Tofu can’t find them!
This particular project had an additional complication — the Resource Groups that we were looking in for the VMs sometimes don’t exist. Think you’re deploying a new environment, and you deploy the AppGw before you deploy the other Tofu layer that deploys the servers.
Even with the cool AzApi tools we talked about last article, this would error out, with an error that the RG doesn’t exist. That’s annoying — I wish Tofu was set to return an empty set when a data call didn’t find any resources, or at least permitted the response to be configured. However, these are the tools we have to work with, so let’s talk about how we can hack around that limitation.
If you want to skip the explanations and go right to the code, scroll to the bottom for a direct link to a public GitHub repo with all this code.
Find a RG That Isn’t There (Gracefully)
First let’s define a data call to find our current subscription, and then define the name of our target RG, which is a string we know. You could hardcode this name in every data call later, but we use it a lot, so a locals string will make your life easier.
data "azurerm_subscription" "current" {}
locals {
  # Define RG where VMs live
  rg_name = "rg-name"
}
What we want to do is the code block below: find all the servers that live within a particular RG (Resource Group). If the RG exists, it’ll work 💯. However, the RG doesn’t always exist, and Tofu doesn’t yet support a data call that returns an empty/null response instead of exiting the whole Terraform workspace with an error like it does now (big, BIG sigh).
So, we have to work around it.
data "azapi_resource_list" "server" {
  type                   = "Microsoft.Compute/virtualMachines@2024-03-01"
  parent_id              = "/subscriptions/${data.azurerm_subscription.current.subscription_id}/resourceGroups/${local.rg_name}"
  response_export_values = ["*"]
}
So, since we’re not sure if the RG exists, and we can’t list all the resources in a RG if it doesn’t exist, we have to find out if it exists first! Thankfully, there is a function to list all RGs in a subscription.
On line 4, we define the type of resource, ResourceGroups, using the 2022-09-01 API, and on line 5, we say we’re looking for all the RGs within the subscription that we’re currently in (data.azurerm_subscription.current.subscription_id returns the subscription ID of the subscription we’re in; it’s a handy shortcut).
# Find all RGs in the subscription and filter
# This is used to prevent a failure when the RG doesn't exist yet
data "azapi_resource_list" "subscription_rgs" {
  type                   = "Microsoft.Resources/resourceGroups@2022-09-01"
  parent_id              = "/subscriptions/${data.azurerm_subscription.current.subscription_id}"
  response_export_values = ["*"]
}
Let’s check out the response for all the RGs:
tofu console
> data.azapi_resource_list.subscription_rgs
{
  "id" = "/subscriptions/xxxxxx-xxxx-xxxxx-xxxx-xxxxxxx/resourceGroups"
  "output" = "{\"value\":[{\"id\":\"/subscriptions/xxxxxx-xxxx-xxxxx-xxxx-xxxxxxx/resourceGroups/Ue2TerraformRG\",\"location\":\"eastus2\",\"name\":\"Ue2TerraformRG\",\"properties\":{\"provisioningState\":\"Succeeded\"},\"tags\":{\"Environment\":\"Dev\",\"Team\":\"Hosting Team\",\"Terraform\":\"true\"},\"type\":\"Microsoft.Resources/resourceGroups\"}
Oh my. That output is HUGE. It’s an escaped json object stored as a string. That’s okay, we can read it with jsondecode().
Okay, that’s easier for the humans to read! Nice.
> jsondecode(data.azapi_resource_list.subscription_rgs.output)
{
  "value" = [
    {
      "id" = "/subscriptions/xxxxxx-xxxx-xxxxx-xxxx-xxxxxxx/resourceGroups/Ue2TerraformRG"
      "location" = "eastus2"
      "name" = "Ue2TerraformRG"
      "properties" = {
        "provisioningState" = "Succeeded"
      }
      "tags" = {
        "Environment" = "Dev"
        "Team" = "Hosting Team"
        "Terraform" = "true"
I just want their names, though, not all their attributes, so let’s iterate over that list and grab their names. We use a for loop and the [*] splat operator to step through every item in the list. Woot, that’s (nearly) what we want.
It appears to have returned ALL the RGs, which makes a ton of sense, since we haven’t provided a filter yet.
> [for rg in jsondecode(data.azapi_resource_list.subscription_rgs.output).value[*] : rg.name]
[
  "Ue2TerraformRG1",
  "Ue2TerraformRG2",
  "OtherRg",
  "Ue2TerraformRG3",
  "Ue2TerraformRG4",
  "StuffWeDontWant",
]
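If you’d like to play with the decode-then-iterate step without touching Azure, here is a minimal sketch you could drop into an empty directory. The sample string and the names sample_output, decoded, and sample_rg_names are made up for illustration; the string only mimics the shape of the escaped output shown above.

```hcl
locals {
  # Hypothetical sample shaped like the escaped AzAPI "output" string
  sample_output = "{\"value\":[{\"name\":\"Ue2TerraformRG1\",\"location\":\"eastus2\"},{\"name\":\"OtherRg\",\"location\":\"eastus2\"}]}"

  # jsondecode() turns the escaped string into a regular object
  decoded = jsondecode(local.sample_output)

  # Pull just the name out of each item in the decoded "value" list
  sample_rg_names = [for rg in local.decoded.value : rg.name]
}

output "sample_rg_names" {
  value = local.sample_rg_names
}
```

One caveat, as far as I’m aware: newer 2.x releases of the AzAPI provider return output as a native object instead of a JSON string, in which case the jsondecode() wrapper isn’t needed. Check which provider version you’re pinned to before copying this pattern.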
Let’s use regex in that same command to grab just the RGs we want. We can use can(regex("string", rg.name)) to filter this list as we iterate. Boom! 💥 Now it’s just the RGs that we want.
The important part here is that if the RG that sometimes doesn’t exist isn’t here, we’ll return an empty list without exiting Terraform. That deterministic, failure-tolerant behavior is huge here.
> [for rg in jsondecode(data.azapi_resource_list.subscription_rgs.output).value[*] : rg.name if can(regex("Terraform", rg.name))]
[
  "Ue2TerraformRG1",
  "Ue2TerraformRG2",
  "Ue2TerraformRG3",
  "Ue2TerraformRG4",
]
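To see why this filter is safe when nothing matches, here is a small standalone sketch. The list of names and the pattern "NotDeployedYet" are hypothetical stand-ins for the decoded AzAPI response and for an RG that hasn’t been built yet.

```hcl
locals {
  # Hypothetical RG names, standing in for the decoded AzAPI response
  demo_rg_names = ["Ue2TerraformRG1", "OtherRg", "Ue2TerraformRG2", "StuffWeDontWant"]

  # regex() raises an error when nothing matches; can() converts that error into false
  matching = [for name in local.demo_rg_names : name if can(regex("Terraform", name))]

  # No name matches this pattern, so we get an empty list instead of an error
  missing = [for name in local.demo_rg_names : name if can(regex("NotDeployedYet", name))]
}

output "matching" {
  value = local.matching
}

output "missing" {
  value = local.missing
}
```

Keep in mind that regex() here is a case-sensitive substring match, so "Terraform" matches any name that contains it. Anchoring the pattern with ^ and $ is one way to tighten it.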
Back to Our Tofu Config
Okay, now that we’ve tested out that command in the console, let’s put it into our tofu config in a locals block to store the response, line 3.
Then on line 7, let’s go find all the servers. Notably, we’re using line 8’s count to check if the RG we want to query exists. If not, we don’t attempt to read it, and we move on without exiting this Terraform layer. If it does exist, we do read it. This helps avoid failure scenarios and makes our tofu more failure tolerant.
# Filter list for RasGw RG name
locals {
  rg = [for rg in jsondecode(data.azapi_resource_list.subscription_rgs.output).value[*] : rg.name if can(regex("Terraform", rg.name))]
}

# Find all VMs in GW Subnet using AZAPI
data "azapi_resource_list" "server" {
  count = length(local.rg) == 0 ? 0 : 1
  type  = "Microsoft.Compute/virtualMachines@2024-03-01"
  # local.rg is a list, so use the first matching RG name in the path
  parent_id              = "/subscriptions/${data.azurerm_subscription.current.subscription_id}/resourceGroups/${local.rg[0]}"
  response_export_values = ["*"]
}
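One thing to watch: the regex filter matches by substring, and in the console test it matched four RGs. If you only care about one exact RG, a sketch like this might be a tighter guard. It assumes the data.azapi_resource_list.subscription_rgs call and local.rg_name defined earlier in this article; the names all_rg_names, rg_exists, and server_exact are hypothetical and only there to avoid clashing with the real blocks.

```hcl
locals {
  # All RG names in the subscription, lowercased because Azure treats RG names as case-insensitive
  all_rg_names = [for rg in jsondecode(data.azapi_resource_list.subscription_rgs.output).value : lower(rg.name)]

  # true only when the exact RG we care about is present
  rg_exists = contains(local.all_rg_names, lower(local.rg_name))
}

# Hypothetical variant of the "server" data call that only runs when the RG exists
data "azapi_resource_list" "server_exact" {
  count                  = local.rg_exists ? 1 : 0
  type                   = "Microsoft.Compute/virtualMachines@2024-03-01"
  parent_id              = "/subscriptions/${data.azurerm_subscription.current.subscription_id}/resourceGroups/${local.rg_name}"
  response_export_values = ["*"]
}
```

The design choice is the same as in the article: decide up front whether the RG exists, and let count skip the read entirely when it doesn’t.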
Let’s filter just for our gw hosts. First we check if the local.rg list is empty, meaning we didn’t find the RG to query. If it is empty, there are no servers to identify, so we just return an empty list. If it’s not empty (length ≥ 1), we iterate over the results.
Notably, we’re using jsondecode() in the for loop to read the list and find the names, and then filtering based on that attribute. You can use the tofu console to help validate the path to this attribute.
After reading through the json and filtering, we store only the name of the server(s) in a flat list. That’s pretty amazing considering it started out as a list of maps of escaped json strings.
# Filter for the GW hosts
locals {
  server_names = length(local.rg) == 0 ? [] : [for vm in jsondecode(data.azapi_resource_list.server[0].output).value[*].name : vm if can(regex("gw", vm))]
}
Over to AzureRM, Read Server Data
Now we have a list of server names, if they exist, to iterate over. I think Tofu would gracefully handle an empty set in the for_each and not exit, but my trust in the fault-tolerance of Tofu/Terraform is pretty low. It likes to exit when it sees a failure condition, so let’s do an explicit check of the length of the server_names list, and set the for_each of this data call to toset([]) (empty set, don’t run) if there are no servers.
If there are, we iterate over the list of servers (after converting it to a set, as for_each wants). And all that logic is just on line 4.
On line 5, we know the name of the server; it’s the value we’re iterating over. And on line 6, we know the name of the RG, and we’re sure it exists.
# Look up the GW host info. If there are none, no hosts will be added to backend pool
data "azurerm_virtual_machine" "gw_host" {
  # Iterate over gateway server names. If none, this resource isn't built
  for_each            = length(local.server_names) == 0 ? toset([]) : toset(local.server_names)
  name                = each.value
  resource_group_name = local.rg[0] # local.rg is a list; use the first matching RG name
}
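If you want to test that hunch about empty sets yourself, here is a scratch config you could run in an empty directory. It uses the built-in terraform_data resource (available in OpenTofu and in Terraform 1.4+) as a harmless stand-in for the real data call; demo_server_names and per_server are hypothetical names.

```hcl
locals {
  # Hypothetical stand-in for local.server_names when no VMs were found
  demo_server_names = []
}

# With an empty set, for_each simply creates zero instances
resource "terraform_data" "per_server" {
  for_each = toset(local.demo_server_names)
  input    = each.value
}

output "instance_count" {
  # Counts how many instances for_each produced
  value = length(terraform_data.per_server)
}
```

In my understanding, this plans cleanly with nothing to create, which suggests the explicit length check in the article is a belt-and-suspenders guard rather than a hard requirement. It costs nothing to keep, though.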
The result of that data call is huge but well structured (thanks AzureRM provider!), so we can just pull the private_ip_address of each host into our list.
locals {
  gw_private_ips = length(local.server_names) == 0 ? [] : [for vm in data.azurerm_virtual_machine.gw_host : vm.private_ip_address]
}
Let’s validate. We can print out the private IPs using local.gw_private_ips, and we see that we have a list of IPs, with one IP representing one server.
We can check the length of this list and learn there are 3 IPs, which makes it easy to iterate over (to add to the back-end of an AppGw, or a FrontDoor resource).
> local.gw_private_ips
tolist([
  "172.16.0.5",
  "172.16.0.6",
  "172.16.0.7",
])
> length(local.gw_private_ips)
3
Summary
And there you have it — a fault-tolerant method of finding the RGs that exist in an account, and if the properly named ones exist, to find the servers in them, filter for name, and then find those VMs’ info, including their primary IPs.
Tofu is an incredibly flexible tool, even with Provider Authors that don’t make their code terribly fault-tolerant.
What other topics do you want me to cover? Let me know in the comments!
Good luck out there folks.
kyler