🔥Let’s Do DevOps: Commit Regex Validation with GitHub Actions
This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!
Hey all!
I’m helping a team I work with migrate from Atlassian’s Stash/Bitbucket to GitHub. GitHub has a ton of cool features, and is obviously the best choice for hosting enterprise CI/CD (if you disagree, let me know in comments!). However, there are some features that our team loves in Stash that aren’t yet present in GitHub.
One of the most important for the company I’m working with, which operates in the US regulated healthcare market, is that every source code commit must correlate to a Jira ticket number. Stash, being an Atlassian product like Jira, has easy and extensive integration. GitHub does have some Jira integration, but commit validation isn’t part of it.
Not for lack of trying though. Atlassian would love to build this feature. The problem is on GitHub’s side. Let’s first talk about how Stash implements commit validation.
Commit Validation
Stash Version
The way Stash implements this commit validation is called a pre-commit hook (in git’s own terminology, a server-side check that runs on push is a pre-receive hook), which means that when you attempt to push to a repo, a process runs which checks each commit you’ve made. If any don’t align to a Jira ticket, your git push is rejected, and you must fix your issues first.
GitHub Version
GitHub doesn’t support pre-commit hooks. Their argument is that they’re a public service (vs private Stash server), so if they operated a compute-heavy pre-commit hook on a public endpoint, they might get DoS’d to oblivion, so they’re being very slow and careful to implement something like that.
They do support a post-commit hook, which means that you can write a GitHub Action that reads the git tree, and can be made a blocker for PRs. So in theory, it would work the same way, but it would be after a commit is pushed to the server, rather than before.
Our enterprise GitHub support offered up this help when we asked: “Write it yourself,” and linked us to a repo of similar code. This existing code doesn’t integrate with Jira, but rather uses regex to match good signatures for commits. That’s not perfect, but it’s a great start, so that’s what we built.
There is one other problem that I’ll address in the future (I’m still working on solving it!), which is that in Stash, you can set server-wide (or project-wide) policies for pre-commit validation. There’s no such feature on GitHub: each repo operates as an island, so if you want an Action to exist in each repo, you need to open a PR against every one of those repos, merge the PR, then set a branch protection policy referencing your new Action. When we asked our GitHub support for assistance building this, they again offered their standby, “Build it yourself.” I’ll address this when I solve it in a future article!
Git Principles
So we need to do some git magic using a post-commit GitHub Action. We’ll write that together, but first we need to cover some git principles so what we’re doing makes sense. Let’s cover some definitions:
branch — A collection of git history metadata. Branches can be created by anyone, and are commonly created off of the “main” or “master” branch to do some work
pull request — A pull request is a request to merge a branch into the main or master branch. Usually code is tested and peer approved before “merge” where the working branch is destroyed and the main branch is updated
git checkout — Git clients can checkout the code and metadata from a git server on any branch. Clients commonly check out the main branch, and then create a working branch on their local host
git add — Tells git to track a file
git commit — Packages the tracked files (ones tracked with git add) into the current branch
git push — Pushes the files and metadata from the client to the remote git host
git tree — Git permits viewing historical commit and merge data
Bash Scripting
GitHub Actions are able to execute all sorts of code. My personal favorite programming language is bash. It’s not perfect — it has a lot of trouble with quoting, and doesn’t have the depth of Python or Java, or the cross-compilation ability of Go, but it’s easy to read and use, great for utilities like our commit checker.
Let’s start writing our github action in bash.
Our first step is to check out the master branch with a fetch-depth of 0. This tells our Action that we want to download ALL the metadata, not just the most recent copy of the files in the branch we’re working with. This is slower, but necessary for computing git differences.
- uses: actions/checkout@v2
  with:
    fetch-depth: 0
    ref: '${{ github.event.pull_request.base.ref }}'
Next we checkout the “HEAD” of our working branch. This is an unusual required step in an Action’s context — we are required to do this so we can get all the working branch’s git metadata.
# Checkout branch
git checkout -q ${{ github.event.pull_request.head.ref }}
Next we set a few variables. We set the git commit reference of our base (master branch) to variable BASE_BRANCH. This is the ref we’re branching off of.
Next we set a regex string that looks for AAA or BBB or CCC, then a dash, then one or more digits. That’s basically how a Jira ticket looks. You can add any number of Jira ticket types here, or any other regex you want. Notably, ticket types should be all caps here. We don’t require users to do that; in fact, later we’ll capitalize the commit string before we check it, so any variation in caps is ignored without complicating our regex string.
# Set variables
BASE_BRANCH=${{ github.event.pull_request.base.ref }}
msg_regex='(AAA|BBB|CCC)-[0-9]+'
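To get a feel for what this pattern accepts, here’s a minimal sketch that runs it against a few made-up commit messages. The helper name matches_ticket and the sample messages are my own assumptions for illustration, and the pattern is written without a backslash before the dash, which matches the same strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when the text contains a ticket reference.
msg_regex='(AAA|BBB|CCC)-[0-9]+'

matches_ticket() {
  # printf avoids echo's option parsing; grep -q stays silent, -E enables extended regex
  printf '%s\n' "$1" | grep -qE "$msg_regex"
}

# Made-up sample messages, not taken from any real repository
for msg in "AAA-123 Fix login timeout" "Fix login timeout" "CCC-9: bump deps" "AAA- missing number"; do
  if matches_ticket "$msg"; then
    echo "valid:   $msg"
  else
    echo "invalid: $msg"
  fi
done
```

Note that the pattern isn’t anchored, so the ticket can appear anywhere in the message. A lowercase ticket such as aaa-123 would not match at this point; that’s handled by the capitalization step later on.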
Next we initialize a tracking variable, invalidCommit, to false. If we evaluate every commit and don’t set this value to true, we’re happy, and can give a green light to our PR to merge.
# Initialize invalidCommit as false, will be set to true by any invalid commits
invalidCommit=false
Next we initialize CURRENT_BRANCH and set it to the name of our current branch. We’ll need this to find the common ancestor in a second.
# Find current branch name
CURRENT_BRANCH=$(git branch | grep '^\*' | cut -d "*" -f 2 | cut -d " " -f 2)
We don’t want to check every commit in the repo’s history. There could literally be tens of thousands, and we certainly don’t want each engineer to need to fix the entire history of the repo to merge their branches.
Therefore, we only want to check the commits since the branch has been created, which is hopefully only a few dozen at most. If it’s more that’s not an issue — we just want a reasonable number that are in the developer’s ownership.
To do that we have to find our common merge base, the most recent commit shared between main and our working branch. Git works backwards from the current commit and checks each one to find the first which is also on the main branch.
# Find hash of most recent common ancestor, e.g. where branch began
BRANCH_MERGE_BASE=$(git merge-base "${BASE_BRANCH}" "${CURRENT_BRANCH}")
Now that we have a common ancestor commit, and the most recent commit (the HEAD of our working branch), we have a FROM and TO of commits. We tell git to give us the list of commits in between. This comes back as a whitespace-separated list of hashes (not a true bash array) that we can iterate over and check.
# Find all commits since common ancestor
BRANCH_COMMITS=$(git rev-list "${BRANCH_MERGE_BASE}"..HEAD)
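If you’d like to see the merge base and commit range in action before wiring them into an Action, here’s a minimal sketch that builds a throwaway repo and lists only the commits made on a feature branch. The function names, branch names, and commit messages are assumptions of mine for illustration, and HEAD stands in for the current branch name.

```shell
#!/usr/bin/env bash
# Sketch: build a tiny throwaway repo, then list only the commits made since branching.
# Branch names and commit messages below are made up for illustration.

make_demo_repo() {
  local dir
  dir=$(mktemp -d)
  (
    cd "$dir" || exit 1
    git init -q .
    git symbolic-ref HEAD refs/heads/main   # call the first branch "main" on any git version
    git config user.email "demo@example.com"
    git config user.name "Demo"
    git commit -q --allow-empty -m "AAA-1 initial commit"
    git checkout -q -b feature
    git commit -q --allow-empty -m "AAA-2 first change"
    git commit -q --allow-empty -m "second change, no ticket"
  ) >/dev/null 2>&1
  echo "$dir"
}

commits_since_fork() {
  # $1 = base branch; run from inside the repo; prints one commit hash per line
  local merge_base
  merge_base=$(git merge-base "$1" HEAD) || return 1
  git rev-list "${merge_base}..HEAD"
}

repo=$(make_demo_repo)
(
  cd "$repo" || exit 1
  # Show the short hash and subject of each commit in the range, newest first
  commits_since_fork main | while read -r c; do
    git log --max-count=1 --format='%h %s' "$c"
  done
)
```

Only the two commits made on the feature branch are listed. The initial commit on main is the merge base itself, so it’s excluded from the range.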
There’s a lot going on here, so let’s take each command individually. First, we start a for loop over the list of commits since our branch was created.
We use git log to find each commit message (--format=%B is the commit message only), then pipe the output to tr, a tool for transforming strings, to convert all characters to capital letters (this lets our devs not worry about exactly matching our regex capitalization), then we use grep to check our string against the regex variable we set earlier.
We either get a true on that if statement, in which case we do nothing, and iterate over the rest of the commits, or we get a false, which means our grep regex isn’t satisfied, and the commit doesn’t pass muster. We print out relevant debug information of the commit hash and message so the dev knows which commit isn’t permissible. We also set invalidCommit to true, which we’ll evaluate after we’ve looped over each commit.
A benefit of not immediately breaking from this script on a failed commit is we can evaluate every commit, and print out ALL the offending commits. The more information we can give our dev teams, the better.
# Check every commit message since ancestor for regex match
for commit in $BRANCH_COMMITS; do
  if git log --max-count=1 --format=%B "$commit" | tr '[:lower:]' '[:upper:]' | grep -iqE "$msg_regex"; then
    : # If commit matches regex, commit is valid, do nothing
  else
    # If commit doesn't match regex, commit isn't valid, print commit info
    echo "************"
    printf "Invalid commit message: \"%s\" and hash: %s\n" "$(git log --max-count=1 --format=%B "$commit")" "$commit"
    echo "************"
    # Set this variable to trigger rejection if any commit fails regex
    invalidCommit=true
  fi
done
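The “check everything, fail once at the end” pattern is easy to try without a repo at all. Here’s a minimal sketch that feeds made-up messages in on stdin in place of real git log output. The function name check_messages and the sample messages are assumptions for illustration, not part of the Action itself.

```shell
#!/usr/bin/env bash
# Sketch of the accumulate-then-fail pattern, using made-up messages
# in place of real `git log` output.
msg_regex='(AAA|BBB|CCC)-[0-9]+'

check_messages() {
  # Reads one commit message per line on stdin; returns 1 if any line fails.
  local invalid=false line
  while IFS= read -r line; do
    # Upper-case the message first so "aaa-101" is treated the same as "AAA-101"
    if printf '%s\n' "$line" | tr '[:lower:]' '[:upper:]' | grep -qE "$msg_regex"; then
      :   # valid, nothing to report
    else
      echo "Invalid commit message: \"$line\""
      invalid=true   # remember the failure, but keep checking the rest
    fi
  done
  [ "$invalid" = false ]
}

check_messages <<'EOF'
aaa-101 lowercase ticket is fine after upper-casing
Fix typo in README
BBB-7 add retry logic
wip
EOF
echo "exit status: $?"
```

Both offending messages get reported in a single run, which is the whole point of not bailing out on the first failure.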
After we’ve finished evaluating each commit message against regex, we check our canary variable. If it’s been set to true by any of the commit checking, we indicate to our devs that their push has been rejected, provide some info on rewriting (aka “squashing”) their history into a valid commit string, and exit with code 1, which Actions will interpret as a critical failure.
If, however, invalidCommit is still set to false, then no commits indicated a problem, and our devs have done a wonderful job meeting commit requirements, so we exit with code 0, which Actions interprets as a success.
# If any commits are invalid, print reject message
if [ "$invalidCommit" == true ]; then
  echo "Your push was rejected because at least one commit message on this branch is invalid"
  echo "Please fix the commit message(s) and push again."
  echo "https://help.github.com/en/articles/changing-a-commit-message"
  echo "************"
  exit 1
elif [ "$invalidCommit" == false ]; then
  echo "************"
  echo "All commits are valid"
  echo "************"
  exit 0
fi
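When only the most recent commit has a bad message, the fix is a one-liner. Here’s a minimal sketch in a throwaway repo; the helper name reword_last_commit, the file name, and the ticket number AAA-123 are all made up for illustration. Older commits need an interactive rebase instead, which the GitHub article linked in the reject message walks through.

```shell
#!/usr/bin/env bash
# Sketch: reword the most recent commit so it carries a ticket number.
# Everything here (paths, ticket AAA-123, messages) is illustrative.

reword_last_commit() {
  # $1 = new commit message; rewrites HEAD in the current repository
  git commit -q --amend -m "$1"
}

demo=$(mktemp -d)
(
  cd "$demo" || exit 1
  git init -q .
  git config user.email "demo@example.com"
  git config user.name "Demo"
  echo "hello" > notes.txt
  git add notes.txt
  git commit -q -m "add notes"             # no ticket: the checker would reject this
  reword_last_commit "AAA-123 add notes"   # same change, now with a valid message
  git log --max-count=1 --format=%B        # print the rewritten message
)
```

If the bad commit was already pushed, the amended commit has a new hash, so the branch has to be force-pushed (git push --force-with-lease is the gentler option) before the Action re-runs.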
Convert to an Action
This step is easy! We take the script we’ve built, plop it into a run statement, and save the workflow file at this path: .github/workflows/CommitChecker.yml.
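To make the shape of that file concrete, here’s a minimal sketch that scaffolds a workflow skeleton from the shell. The helper name write_workflow, the workflow and job name Commit_Checker, the trigger types, and the runner are assumptions on my part rather than an exact template, and the body of the run step is left as a placeholder for the script built above.

```shell
#!/usr/bin/env bash
# Sketch: write a workflow skeleton to .github/workflows/CommitChecker.yml.
# Workflow/job name, trigger types, and runner are assumptions, not an exact template.

write_workflow() {
  # $1 = repository root (defaults to the current directory)
  local root="${1:-.}"
  mkdir -p "$root/.github/workflows"
  # The quoted 'EOF' stops the shell from touching the ${{ ... }} expressions.
  cat > "$root/.github/workflows/CommitChecker.yml" <<'EOF'
name: Commit_Checker
on:
  pull_request:
    types: [opened, synchronize, reopened]
jobs:
  Commit_Checker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0
          ref: '${{ github.event.pull_request.base.ref }}'
      - name: Check commit messages
        run: |
          # Checkout branch
          git checkout -q ${{ github.event.pull_request.head.ref }}
          # ...paste the rest of the script from this article here...
EOF
}

# Write into a scratch directory so the demo doesn't touch a real repo
demo_root=$(mktemp -d)
write_workflow "$demo_root"
cat "$demo_root/.github/workflows/CommitChecker.yml"
```

In a real repo you’d run write_workflow from the repo root, fill in the run step, and commit the file through a PR like any other change.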
Branch Protections
Boom, you’re done! Well, almost. Your Action will now run on each PR commit (perfect!) but won’t be mandatory for PRs to move forward (boo!). However, that’s easy to resolve.
In your repo, go to Settings → Branches → and then Add rule or Edit a Branch protection rule on any branch for which this Action should be mandatory.
Find the Protect matching branches section and check the box next to Require status checks to pass before merging. Then search for our Action. If you used my template exactly, it’ll be called Commit_Checker.
Hit save, and this Action is now required to pass for a PR to qualify for merging.
Summary
In this article we learned a lot about git and how it tracks code. We built a bash script that can find a common ancestor in a git branch, as well as a list of commits since, and we implemented regex matching on each commit to make sure those commits matched our standard.
We even set this commit matching to be mandatory for a PR to be merged in our repo. Congratulations!
The next step, of course, will be to implement this en masse. For instance, say you have 200 repos, how do you implement this on all of them? Well, I’m working on it! Look for that article soon.
Thanks all! Good luck out there.
kyler