🔥Let’s Do DevOps: Commit Regex Validation with GitHub Actions

Oct 07, 2021

This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!

Hey all!

I’m helping a team I work with migrate from Atlassian’s Stash/Bitbucket to GitHub. GitHub has a ton of cool features, and is obviously the best choice for hosting enterprise CI/CD (if you disagree, let me know in comments!). However, there are some features that our team loves in Stash that aren’t yet present in GitHub.

One of the most important for the company I’m working with, which operates in the US regulated healthcare market, is that each source code commit correlates to a Jira ticket number. Stash, being an Atlassian product like Jira, has easy and extensive integration. GitHub does have some integration, but commit validation isn’t part of it.

Not for lack of trying though. Atlassian would love to build this feature. The problem is on GitHub’s side. Let’s first talk about how Stash implements commit validation.

Commit Validation

Stash Version

Stash implements this commit validation with a server-side hook that runs before a push is accepted (git calls this a pre-receive hook; I’ll loosely call it a pre-commit hook here). When you attempt to push to a repo, a process checks each commit you’ve made. If any don’t reference a Jira ticket, your git push is rejected, and you must fix the offending commits first.

GitHub Version

GitHub’s hosted service doesn’t support pre-commit hooks of this kind. Their argument is that they’re a public service (vs. a private Stash server), so if they ran a compute-heavy hook on a public endpoint for every push, they might get DoS’d to oblivion. They’re being very slow and careful about implementing something like that.

They do support a post-commit approach: you can write a GitHub Action that reads the git history after commits are pushed, and that Action can be made a blocker for PRs. In theory it works the same way, except the check happens after a commit is pushed to the server rather than before.

Our enterprise GitHub support offered up this help when we asked: “Write it yourself,” and linked us to a repo of similar code. This existing code doesn’t integrate with Jira, but rather uses regex to match good signatures for commits. That’s not perfect, but it’s a great start, so that’s what we built.

There is one other problem that I’ll address in the future (I’m still working on solving it!): in Stash, you can set server-wide (or project-wide) policies for pre-commit validation. There’s no such feature on GitHub. Each repo operates as an island, so if you want an Action to exist in each repo, you need to open a PR against every one of those repos, merge the PR, then set a branch protection policy referencing your new Action. When we asked our GitHub support for assistance building this, they again offered their standby, “Build it yourself.” I’ll address this in a future article once I solve it!

Git Principles

So we need to do some git magic using a post-commit GitHub Action. We’ll write that together, but first we need to cover some git principles so what we’re doing makes sense. Let’s cover some definitions:

branch: A movable pointer to a line of commits. Branches can be created by anyone, and are commonly created off of the “main” or “master” branch to do some work
pull request: A request to merge a branch into another branch, usually main or master. Usually code is tested and peer approved before the merge, after which the main branch is updated and the working branch is typically deleted
git checkout: Switches your working copy to a branch. Clients commonly clone a repo from a git server, check out the main branch, and then create a working branch on their local host
git add: Stages a file’s current changes so they’re included in the next commit
git commit: Records the staged changes (the ones added with git add) as a new commit on the current branch
git push: Pushes commits and metadata from the client to the remote git host
git tree: The history of commits and merges, which git lets you view and walk through
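
To make these terms concrete, here is a minimal sketch that walks through them in a throwaway repository. The temp directory, the branch name feature/demo, the file name, and the AAA ticket numbers are all made up for illustration. There is no remote in this sketch, so git push is only mentioned in a comment.

```shell
# Create a throwaway repo so nothing here touches real work
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" regardless of git defaults

# Identity for the demo commits (hypothetical values)
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# An initial commit on main
echo "hello" > README.md
git add README.md
git commit -q -m "AAA-1 initial commit"

# Create a working branch and switch to it
git checkout -q -b feature/demo

# Stage a change, then record it as a commit on the working branch
echo "more" >> README.md
git add README.md
git commit -q -m "AAA-123 update readme"

# View the history; with a remote configured, "git push" would upload it
git log --oneline
```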

Bash Scripting

GitHub Actions are able to execute all sorts of code. My personal favorite programming language is bash. It’s not perfect — it has a lot of trouble with quoting, and doesn’t have the depth of Python or Java, or the cross-compilation ability of Go, but it’s easy to read and use, great for utilities like our commit checker.

Let’s start writing our GitHub Action in bash.

Our first step is to check out the PR’s base branch (master, in our case) with a fetch-depth of 0. This tells our Action that we want to download ALL the metadata (the full commit history), not just the most recent copy of the files in the branch we’re working with. This is slower, but necessary for computing git differences.

	- uses: actions/checkout@v2
	  with:
	    fetch-depth: 0
	    ref: '${{ github.event.pull_request.base.ref }}'

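If you want to confirm that a checkout really contains full history, a quick check like the one below can help. This is only a sketch, not part of the original script. The function name is_shallow is hypothetical, and the --is-shallow-repository option assumes git 2.15 or newer.

```shell
# Prints "true" when history is truncated (a shallow clone), "false" otherwise
is_shallow() {
  git rev-parse --is-shallow-repository
}

# Warn early, because finding a common ancestor needs the full history
if [ "$(is_shallow 2>/dev/null)" = "true" ]; then
  echo "Shallow clone detected; set fetch-depth: 0 on the checkout step."
fi
```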

Next we check out the HEAD of our working branch. This is an unusual but required step in an Action’s context: we need it to get all of the working branch’s git metadata.

	# Checkout branch
	git checkout -q ${{ github.event.pull_request.head.ref }}

Next we set a few variables — we set the git commit reference of our base (master branch) to variable BASE_BRANCH. This is the ref we’re branching off of.

Next we set a regex string that looks for AAA or BBB or CCC, then a dash, then one or more digits. That’s basically how a Jira ticket key looks. You can add any number of Jira project keys here, or use any other regex you want. Notably, the project keys should be all caps in the regex, but we don’t require users to type them that way. Later, we’ll capitalize the commit message before we check it, so any variation in caps is ignored without complicating our regex string.

	# Set variables
	BASE_BRANCH=${{ github.event.pull_request.base.ref }}
	msg_regex='(AAA|BBB|CCC)-[0-9]+'

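Before wiring the regex into git, it can be worth trying it against a few sample messages. The sketch below does that with a small helper; the function name is_valid_message and the sample messages are made up for illustration. Note that the pattern is not anchored, so it matches a ticket key anywhere in the message.

```shell
msg_regex='(AAA|BBB|CCC)-[0-9]+'

# Returns 0 (success) when the message contains a ticket reference.
# The message is upper-cased first, so "bbb-7" counts as well.
is_valid_message() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | grep -qE "$msg_regex"
}

for msg in "AAA-123 fix login" "bbb-7: tidy docs" "fix typo" "DDD-1 wrong project"; do
  if is_valid_message "$msg"; then
    echo "valid:   $msg"
  else
    echo "invalid: $msg"
  fi
done
```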

Next we initialize a tracking variable, invalidCommit to false. If we evaluate every commit and don’t set this value to true, we’re happy, and can give a green light to our PR to merge.

	# Initialize invalidCommit as false, will be set to true by any invalid commits
	invalidCommit=false

Next we initialize CURRENT_BRANCH and set it to the name of our current branch. We’ll need this to find the common ancestor in a second.

	# Find current branch name
	CURRENT_BRANCH=$(git branch | grep '^\*' | cut -d "*" -f 2 | cut -d " " -f 2)

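Git can also report the current branch directly, which avoids parsing the output of git branch. The sketch below shows that alternative in a throwaway repo; it is not what the original script uses. The helper name current_branch, the branch name, and the commit message are hypothetical. One caveat: on a detached HEAD this prints HEAD rather than a branch name.

```shell
# Ask git for the short name of whatever HEAD points at
current_branch() {
  git rev-parse --abbrev-ref HEAD
}

# Demo in a throwaway repo
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "AAA-1 init"
git checkout -q -b feature/demo

current_branch
```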

We don’t want to check every commit in the repo’s git history. There could literally be tens of thousands, and we certainly don’t want each engineer to need to fix the entire history to merge their branches.

Therefore, we only want to check the commits made since the branch was created, which is hopefully only a few dozen at most. If it’s more, that’s not an issue. We just want a reasonable number that are in the developer’s ownership.

To do that we have to find our merge base: the most recent commit shared between main and our working branch. Git works backwards from the current commit and checks each one until it finds the first that is also part of the main branch.

	# Find hash of the most recent common ancestor commit, i.e. where the branch began
	BRANCH_MERGE_BASE=$(git merge-base "${BASE_BRANCH}" "${CURRENT_BRANCH}")

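To see what git merge-base returns, here is a small sketch in a throwaway repo where main keeps moving after the branch is cut. The helper demo_commit, the branch names, and the commit messages are hypothetical. The merge base should be the commit the branch started from, not the newest commit on main.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

# Helper: make an empty commit with the given message
demo_commit() {
  git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"
}

demo_commit "AAA-1 first commit on main"
demo_commit "AAA-2 second commit on main"
fork_point=$(git rev-parse HEAD)   # the branch starts here

git checkout -q -b feature/demo
demo_commit "AAA-3 work on the branch"

git checkout -q main
demo_commit "AAA-4 main moves on"  # main advances after the branch was cut

BRANCH_MERGE_BASE=$(git merge-base main feature/demo)
echo "fork point: $fork_point"
echo "merge base: $BRANCH_MERGE_BASE"
```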

Now that we have a common ancestor commit and the most recent commit (the HEAD of our working branch), we have a FROM and TO of commits. We tell git to give us the list of commits in between. The result is a whitespace-separated list of hashes that we can iterate over and check.

	# Find all commits since common ancestor
	BRANCH_COMMITS=$(git rev-list ${BRANCH_MERGE_BASE}..HEAD)

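The range syntax can be tried out in isolation. In this sketch (throwaway repo, hypothetical helper demo_commit and commit messages), the branch has two commits beyond main, so the range should contain exactly those two, newest first.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

# Helper: make an empty commit with the given message
demo_commit() {
  git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"
}

demo_commit "AAA-1 base"
git checkout -q -b feature/demo
demo_commit "AAA-2 first change"
demo_commit "AAA-3 second change"

BRANCH_MERGE_BASE=$(git merge-base main HEAD)
BRANCH_COMMITS=$(git rev-list "${BRANCH_MERGE_BASE}..HEAD")

# rev-list prints one hash per line; the unquoted expansion splits on whitespace
count=0
for commit in $BRANCH_COMMITS; do
  count=$((count + 1))
  echo "$count: $(git log --max-count=1 --format=%s "$commit")"
done
```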

There’s a lot going on here, so let’s take each command individually. First, we start a for loop over the list of commits made since our branch was created.

We use git log to fetch each commit message (--format=%B prints the commit message only), then pipe the output to tr, a tool for translating characters, to convert all letters to capitals (this lets our devs not worry about exactly matching our regex capitalization). Finally, we use grep to check the result against the regex variable we set earlier.

We either get a true on that if statement, in which case we do nothing, and iterate over the rest of the commits, or we get a false, which means our grep regex isn’t satisfied, and the commit doesn’t pass muster. We print out relevant debug information of the commit hash and message so the dev knows which commit isn’t permissible. We also set invalidCommit to true, which we’ll evaluate after we’ve looped over each commit.

A benefit of not immediately breaking from this script on a failed commit is we can evaluate every commit, and print out ALL the offending commits. The more information we can give our dev teams, the better.

	# Check every commit message since ancestor for regex match
	for commit in $BRANCH_COMMITS; do
	  if git log --max-count=1 --format=%B "$commit" | tr '[:lower:]' '[:upper:]' | grep -iqE "$msg_regex"; then
	    : # If commit matches regex, commit is valid, do nothing
	  else
	    # If commit doesn't match regex, commit isn't valid, print commit info
	    echo "************"
	    printf "Invalid commit message: \"%s\" and hash: %s\n" "$(git log --max-count=1 --format=%B "$commit")" "$commit"
	    echo "************"

	    # Set this variable to trigger rejection if any commit fails regex
	    invalidCommit=true
	  fi
	done

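To try the check end to end without opening a PR, the sketch below runs the same idea against a throwaway repo with one good and one bad commit. It is a simplified variant, not the original script: it relies on grep -i alone for case-insensitivity, and the names list_invalid_commits and demo_commit plus all commit messages are hypothetical.

```shell
msg_regex='(AAA|BBB|CCC)-[0-9]+'

# Print the hash of every commit in <base>..HEAD whose message lacks a ticket reference
list_invalid_commits() {
  for commit in $(git rev-list "$1..HEAD"); do
    if ! git log --max-count=1 --format=%B "$commit" | grep -iqE "$msg_regex"; then
      echo "$commit"
    fi
  done
}

repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

# Helper: make an empty commit with the given message
demo_commit() {
  git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"
}

demo_commit "AAA-1 base"
git checkout -q -b feature/demo
demo_commit "bbb-2 lowercase ticket is fine"
demo_commit "forgot the ticket"
bad_hash=$(git rev-parse HEAD)

invalid=$(list_invalid_commits main)
echo "invalid commits: $invalid"
```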

After we’ve finished evaluating each commit message against the regex, we check our canary variable. If it’s been set to true by any of the commit checks, we tell our devs that their push has been rejected and provide some info on rewriting (aka “squashing”) their history into valid commit messages. Then we exit with code 1, which Actions interprets as a failure.

If however, invalidCommit is still set to false, then no commits indicated a problem, and our devs have done a wonderful job meeting commit requirements, so we exit with code 0, which Actions interprets as a success.

	# If any commits are invalid, print reject message
	if [ "$invalidCommit" = true ]; then
	  echo "Your push was rejected because at least one commit message on this branch is invalid"
	  echo "Please fix the commit message(s) and push again."
	  echo "https://help.github.com/en/articles/changing-a-commit-message"
	  echo "************"
	  exit 1
	else
	  echo "************"
	  echo "All commits are valid"
	  echo "************"
	  exit 0
	fi

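For the simplest case, where only the most recent commit has a bad message, the fix is a single amend. The sketch below shows that in a throwaway repo; the helper demo_commit, the file name, and the ticket numbers are hypothetical. Older commits need an interactive rebase instead, and either way the branch has to be force-pushed afterwards because its history changed.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

# Helper: append a line to a file and commit it with the given message
demo_commit() {
  echo "$1" >> notes.txt
  git add notes.txt
  git -c user.name=Demo -c user.email=demo@example.com commit -q -m "$1"
}

demo_commit "AAA-1 base"
git checkout -q -b feature/demo
demo_commit "forgot the ticket"

# Rewrite the message of the most recent commit only
git -c user.name=Demo -c user.email=demo@example.com commit -q --amend -m "AAA-2 add the ticket"

# On a real branch you would now run: git push --force-with-lease
git log --max-count=1 --format=%s
```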

Convert to an Action

This step is easy! We take the script we’ve built, plop it into a run statement, and save everything in this file: .github/workflows/CommitChecker.yml.

https://github.com/KyMidd/GitHubAction_CommitChecker

Branch Protections

Boom, you’re done! Well, almost. Your Action will now run on each PR commit (perfect!) but won’t be mandatory for PRs to move forward (boo!). However, that’s easy to resolve.

In your repo, go to Settings → Branches, then click Add rule (or Edit an existing branch protection rule) for any branch on which this Action should be mandatory.

Find the Protect matching branches section and check the box next to Require status checks to pass before merging. Then search for our Action — if you used my template exactly, it’ll be called Commit_Checker.

Hit save, and this Action is now required to pass for a PR to qualify for merging.

Summary

In this article we learned a lot about git and how it tracks code. We built a bash script that finds the common ancestor of two branches and lists the commits made since, and we implemented regex matching on each of those commits to make sure they meet our standard.

We even set this commit matching to be mandatory for a PR to be merged in our repo. Congratulations!

The next step, of course, will be to implement this en masse. For instance, say you have 200 repos: how do you implement this on all of them? Well, I’m working on it! Look for that article soon.

Thanks all! Good luck out there.
kyler