🔥Let’s Do DevOps: We provisioned the hosts, now what? Let’s talk Ansible AWX vs Tower
This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!
Hey all,
Coming off my long focus on Terraform and CI/CD pipelines (see my profile page for my many articles), I took some time to relax. I had architected a broad solution, and now it was time to teach all the development and operations teams I work with about it.
I love that part of the job — architecting a great solution is only half the battle — you have to sell it to the groups as better than what they do now. So much better, in fact, that they’ll learn a whole new tool and process, a non-trivial ask of operations and development teams that are busier than ever.
And now that process is coming to a close — the teams are sold on the investment, they’re using the tool, I can pat myself on the back and… well, get started on a new project, of course.
And it’s really an intuitive step forward — I worked for a long time on a provisioning pipeline to build resources. And the next step a dev takes is to… configure those resources.
Post-Provisioning Toolkit
The teams I’ve worked with, like most other teams working to upgrade their DevOps arsenals, use a mix of bash, PowerShell, brew, pip, and other third-party tools to configure servers. These tools work, but they have weaknesses — they’re not standardized, they’re often entirely home-baked (which makes them complex and hard to adapt between jobs), and they’re built from scratch, which means dev time is high.
Open-source tooling has been doing its best to keep up, and there are many, many entries in this field. The big players are:
Chef: Chef requires hosts to run an installable Chef client. Chef uses a Ruby DSL to define configurations, which makes it difficult to pick up for operations teams that aren’t as well versed in programming.
Puppet: The Puppet master (the manager server) must run on Linux, but it can manage Windows as well. Puppet requires each host to have the Puppet client installed. Puppet uses its own configuration language (a custom DSL), which again isn’t very intuitive for non-programmers to pick up.
Ansible: Ansible is a client-less management tool (no software required on the endpoints) for Linux and Windows. It reaches out to endpoints over tcp/22 (SSH, for Linux; SSH support for Windows is still in beta) or tcp/5985 (WinRM over HTTP) and tcp/5986 (WinRM over HTTPS). RedHat acquired Ansible in late 2015 and hasn’t done much with it since; however, the open-source community continues to develop the tool. Ansible must be run from Linux, but it can manage Windows as well. Ansible playbooks are written in YAML, which requires correct spacing (boo) but is otherwise generally human-readable (there’s a short playbook sketch just after this list).
AWS Systems Manager: A newer player in this space, AWS Systems Manager (SSM) is the Amazon cloud’s entry. It doesn’t require hosts to be in the AWS cloud, but it works best with them. It requires an agent, plus writing “documents” that define the steps to take, like running scripts (a sample document sketch also follows this list). The tool works by executing scripts against localhost, avoiding any network impact at all. It’s developing all the time; very cool stuff here.
Azure: It’s worth mentioning here that Azure doesn’t have a single tool that ties all of this “systems management” functionality together. It does have VM Extensions, the ability to run individual scripts, and much of the same functionality found in AWS SSM. However, to my knowledge, Azure hasn’t tied it all up with a pretty bow so it can be used together.
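To make the “human-readable YAML” claim concrete, here’s a minimal sketch of an Ansible inventory and playbook. The hostnames, group names, and package are made up for illustration; the point is that there’s nothing to install on the endpoints, because Ansible just connects over SSH or WinRM using whatever the inventory tells it.

```yaml
# inventory.yml (hypothetical hosts): Linux over SSH (tcp/22), Windows over WinRM (tcp/5986)
all:
  children:
    linux_app:
      hosts:
        app01.example.com:
    windows_app:
      hosts:
        win01.example.com:
      vars:
        ansible_connection: winrm   # credentials omitted for brevity
        ansible_port: 5986
```

```yaml
# site.yml: a readable play that installs and starts nginx on the Linux group
- name: Configure Linux app servers
  hosts: linux_app
  become: true
  tasks:
    - name: Install nginx
      package:
        name: nginx
        state: present

    - name: Ensure nginx is running and enabled
      service:
        name: nginx
        state: started
        enabled: true
```

Run it with ansible-playbook -i inventory.yml site.yml from any Linux box and it fans out to every host in the inventory.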
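And for comparison, here’s a rough sketch of what an SSM “document” looks like (using the Command document format as I understand it; the description, step name, and commands are placeholders). The agent on each instance pulls the document down and runs the steps locally.

```yaml
# Hypothetical SSM Command document: the SSM agent executes these steps on localhost
schemaVersion: "2.2"
description: Apply OS updates on managed instances
mainSteps:
  - action: aws:runShellScript
    name: applyUpdates
    inputs:
      runCommand:
        - yum update -y
        - echo "patching complete"
```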
Because our teams support many operational styles and backgrounds, we required a tool that is simple to operate. We also didn’t want to force teams to install a client on their machines if we could avoid it — good luck installing an agent on a Palo Alto firewall appliance, anyway. We also didn’t want to be tied to a single cloud. The dream of avoiding cloud lock-in remains just that — a dream — but our supported clients and networks span several clouds and data centers, and intentionally locking our tooling into a single cloud (even if it somewhat supports other clouds) seemed like a bad idea.
With those criteria in mind, we picked Ansible. Let’s start at the beginning — how are we gonna run it?
Great Tools with Poor Platforms
For the Terraform project I ran, we looked at several tools that would assist us in running Terraform, and none of them scaled as well as we’d have liked. Terraform, like most command-line tools, is incredibly flexible, and it relies on command-line switches to access that flexibility.
For whatever reason, it seems that providers in this space (I’m looking at you, HashiCorp, and you too, RedHat) give away these great command-line tools for free under open-source licenses, and then try to sell a platform to run those tools that is GUI-driven and can’t support Infrastructure as Code processes or scalability. It honestly baffles me, and if you understand why this poor choice keeps being made, please let me know in the comments!
Anyway, I’m (obviously) not very pro-platform here. For Terraform, we leveraged Azure DevOps YAML pipelines that installed and executed Terraform, bootstrapped to a cloud provider for the state file, used individual account-specific IAM roles, and ran Terraform commands in stages with specific flags. That degree of customization works very well for our business, and it scales horizontally out to hundreds of pipelines, all managed via code. Good luck configuring a platform tool to do all of that.
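To give a flavor of what that looks like, here’s a rough sketch (not our actual pipeline; the Terraform version, backend config file, and stage layout are placeholders) of an Azure DevOps YAML pipeline that installs Terraform itself and runs each command as an explicit step with whatever flags we want:

```yaml
# Sketch of an Azure DevOps pipeline that installs and runs Terraform.
# Version, backend config, and stage layout are placeholders for illustration.
trigger:
  branches:
    include:
      - main

pool:
  vmImage: ubuntu-latest

stages:
  - stage: plan
    jobs:
      - job: terraform_plan
        steps:
          - script: |
              curl -sLo terraform.zip https://releases.hashicorp.com/terraform/1.5.7/terraform_1.5.7_linux_amd64.zip
              unzip -o terraform.zip && sudo mv terraform /usr/local/bin/
            displayName: Install Terraform
          # cloud credentials come from account-specific pipeline variables or a
          # service connection; omitted here to keep the sketch short
          - script: |
              terraform init -backend-config=backend.hcl -input=false
              terraform plan -out=tfplan -input=false
            displayName: Terraform init and plan

  - stage: apply
    dependsOn: plan
    jobs:
      - job: terraform_apply
        steps:
          # a real pipeline would publish the plan file as an artifact in the plan
          # stage and download it here; that plumbing is omitted for brevity
          - script: terraform apply -input=false tfplan
            displayName: Terraform apply
```

Multiply that by one pipeline per account or environment, all defined in a repo, and that’s the horizontal scaling I’m talking about.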
Post-provisioning tools don’t appear to fit that model well — they require items that might need to be input at run time, depending on the situation. Check out the below image of a job within AWX/Tower — all of these items are settings for running this one job, and that excludes the inventory, credentials, and Ansible source script configuration, which are done elsewhere.
All of these items might need to be set differently depending on the situation on the ground. The number of forks is how many concurrent sessions the server should open to hosts to update them. The “Limit” restricts the run to a specific group of hosts or to individual hostnames. None of that is ALWAYS the same — it’ll often be decided by the systems engineer when running a script.
AWX/Tower offers the ability to prompt for this when the script is run, which is fantastic — hide the extra stuff and only show the person running the script the parts that matter.
Azure DevOps doesn’t yet support this well. It’s able to prompt for information, but those items need to be configured ahead of time, and encoding the logic here would be a challenge.
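To illustrate, here’s roughly what “prompting” looks like in Azure DevOps today, using runtime parameters (the parameter names, defaults, and playbook are made up): every field has to be declared in the YAML ahead of time, and only then does it surface as a form when someone queues the pipeline manually.

```yaml
# Sketch: runtime parameters declared up front, then fed to ansible-playbook.
# Parameter names, defaults, and the playbook name are placeholders.
parameters:
  - name: limit
    displayName: Limit the run to these hosts or groups
    type: string
    default: all
  - name: forks
    displayName: Concurrent host connections (forks)
    type: number
    default: 5

pool:
  vmImage: ubuntu-latest

steps:
  - script: >
      ansible-playbook site.yml
      --limit "${{ parameters.limit }}"
      --forks ${{ parameters.forks }}
    displayName: Run playbook with operator-supplied values
```

That works, but every knob has to be anticipated and wired up in code, whereas AWX/Tower puts a “prompt on launch” checkbox next to each field of the job template.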
It’d be nice to have a platform for this, then. Introducing RedHat’s own Ansible AWX/Tower.
AWX vs Tower
Ansible Tower is RedHat’s foray into the platform market for Ansible. It isn’t a generalized CI/CD tool; instead, it’s custom-built to run Ansible scripts. It can do some cool things that we’ll cover, but first, let’s go over what’s different between AWX and Tower.
tl;dr: Not much
If you download both and get them installed, you’ll see that they look and behave identically. The only way I can tell them apart is the logo in the top left (see the image at the right for their side control bars).
If you can see a difference you win a cookie, but I can’t tell.
Which makes sense, because AWX is a free, open-source product that is the upstream code for Tower. By upstream, I mean code is committed and tested here before it’s pushed into the Tower product by the RedHat team that manages Tower.
The other major difference is that AWX is deployed into containers, and if the database requires an update, the installer REPLACES the database rather than upgrading it. This makes AWX very friendly for developers (run the installer and it’s set up in a few minutes) and very unfriendly for enterprises. That’s likely the point — they want developers to be happy and enterprises to consider the paid product.
And I get it, but it also comes across as somewhat underhanded — intentionally making a product unfriendly to enterprises puts your company in a pretty poor light. Compared to how HashiCorp treats Terraform, I don’t think RedHat is taking great care of Ansible.
But I digress.
One more difference, which only some enterprises will care about, is the ability to run Isolated Nodes — basically an Ansible “runner” that can live in a DMZ or some other protected network, without requiring the main Ansible host to have connectivity to the runner.
This runner is actually built for AWX as well — it’s one of the four containers deployed onto the AWX host at install time. However, RedHat is clear that isolated nodes are not supported and won’t work on AWX (which I’m not sure about — couldn’t we register one through the Docker router? I’ll investigate that in the future).
So clearly, Tower is where it’s at, right? It supports upgrades, which is a huge win, and its updates should be a bit more stable than AWX’s. However, here’s where RedHat gets you. The pricing for Tower is based on the number of managed “nodes,” which means any host imported into Tower’s inventory. And that price is around $110–120/host/year. Some back-of-the-napkin math puts the total cost for 500 nodes at $55–60k/yr.
With the advent of microservices and tiny little cloud instances, your enterprise probably runs hundreds (if not thousands) of computers that require maintenance. Pricing this high on a host-count model makes these numbers HUGE, especially when you consider that a large amount of the feature set provided here is built into the Ansible binary itself rather than the platform.
Conclusions
Clearly, this isn’t concluded, but it’s where I am now. I’m investigating alternate runner paths — maybe a generalized CI/CD tool like Azure DevOps could be built up to better handle prompt-at-runtime values? Or maybe we could use RunDeck or Jenkins to feed values in. Those tools are also free, and they don’t handicap their free offerings by dropping upgrade support the way AWX does.
And we haven’t even gotten to custom inventories, secrets management, and SSO/LDAP integration.
There are lots of problems to solve in this space — hundreds of unanswered questions. But that’s what engineering is — solve problems one at a time until the entire series works like magic. I’ll report back with solutions and code.
Thanks all! Good luck out there!
kyler