🔥Let’s Do DevOps: Creating 60k GitHub Auto-Link References to Jira
This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!
Hey all!
GitHub is a fantastic place for code, but probably isn’t where you’re managing your ticket queue, right? You could use GitHub Issues to keep track of specific issues with a code-base, but your project manager isn’t going to be happy with that — they’d much rather see an aggregate queue of work on all the different team boards in something like Atlassian’s Jira ticketing system.
So all the work comes from Jira, and is implemented in GitHub. Those ticket numbers are all over your code commits, your PR comments, and your git history.
And GitHub has a tremendously useful feature that helps you find your way back to Jira. It’s called an auto-link reference: once configured on a repo, any matching ticket string in any context (commit message, comment, PR message, etc.) becomes a hyperlink back to your ticketing system! That could save a heck of a lot of time.
Unfortunately it doesn’t live within an Org configuration — it lives at the Repo level. I have 42 ticket projects to watch for, and I have 1,426 repositories to configure.
That means I need to deploy 59,892 auto-link references. Holy moly.
We’re not going to click our way through that. Let’s write some automation.
Loop Through all the Repos In Your Org
Since all these settings are at the repo level, rather than the Org level, we need to iterate through each repo. I wrote about this in a recent blog, but let’s pull out just the things we need. First, we need to count how many repos we have.
On line 2, we do an authenticated REST call to get all the Organization-level info, including repo counts by type. We look up the private count (line 9) and public count (line 10), and then add them together on line 11.
# Grab Org info to get repo counts
curl -sL \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  "https://api.github.com/orgs/$GH_ORG" > org_info.json

# Filter org info to get repo counts
PRIVATE_REPO_COUNT=$(jq -r '.owned_private_repos' org_info.json)
PUBLIC_REPO_COUNT=$(jq -r '.public_repos' org_info.json)
TOTAL_REPO_COUNT=$((PRIVATE_REPO_COUNT + PUBLIC_REPO_COUNT))
Since listing all the repos is a paginated operation, we need to figure out how many pages to request. We can get 100 repos per “page” (per the GitHub docs), so we divide the total repo count by 100, and if there’s any remainder, we add an extra page to make sure we get them all.
# Calculate number of pages needed to get all repos
REPOS_PER_PAGE=100
PAGES_NEEDED=$((TOTAL_REPO_COUNT / REPOS_PER_PAGE))
if [ $((TOTAL_REPO_COUNT % REPOS_PER_PAGE)) -gt 0 ]; then
  PAGES_NEEDED=$((PAGES_NEEDED + 1))
fi
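The same page count can also be computed in one step with integer ceiling division. This is a minimal sketch rather than the author's code, and the function name pages_needed is hypothetical.

```shell
# Hypothetical helper (not from the original script): ceiling division
# $1 = total item count, $2 = items per page
pages_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

pages_needed 1426 100   # 1,426 repos at 100 per page -> prints 15
pages_needed 200 100    # an exact multiple needs no extra page -> prints 2
```

Adding "per page minus one" before dividing rounds any remainder up, which is what the if block above does in two steps.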
Then we iterate over all our pages and fetch each one. We append the results to a variable called ALL_REPOS. Once this loop is done, it’ll contain a newline-separated list, with one repo name per line.
# Get all repos
for PAGE_NUMBER in $(seq "$PAGES_NEEDED"); do
  echo "Getting repos page $PAGE_NUMBER of $PAGES_NEEDED"
  # Could replace this with graphql call (would likely be faster, more efficient), but this works for now
  PAGINATED_REPOS=$(curl -sL \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    "https://api.github.com/orgs/$GH_ORG/repos?per_page=$REPOS_PER_PAGE&sort=pushed&page=$PAGE_NUMBER" | jq -r '.[].name')
  # Combine all pages of repos into one variable
  # Extra return added since last item in list doesn't have newline (would otherwise combine two repos on one line)
  ALL_REPOS="${ALL_REPOS}"$'\n'"${PAGINATED_REPOS}"
done
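An alternative that avoids precomputing the page count is to keep requesting pages until one comes back empty. The sketch below is an assumption-laden illustration, not the author's method: fetch_page is a hypothetical stub returning canned data so the loop logic can run offline, and in a real script it would wrap the curl and jq call shown above.

```shell
# Hypothetical stand-in for the paginated API call; returns canned repo names
fetch_page() {
  case $1 in
    1) printf 'repo-a\nrepo-b\n' ;;
    2) printf 'repo-c\n' ;;
    *) : ;;  # any later page is empty
  esac
}

# Keep fetching pages until an empty one comes back
collect_all_pages() {
  nl='
'
  page=1
  all=""
  while :; do
    names=$(fetch_page "$page")
    if [ -z "$names" ]; then
      break
    fi
    # Only add a separator when the list already has content (no leading blank line)
    all="${all:+$all$nl}$names"
    page=$((page + 1))
  done
  printf '%s\n' "$all"
}

collect_all_pages   # prints repo-a, repo-b, repo-c on separate lines
```

The trade-off is one extra request (the empty final page) in exchange for not depending on the Org-level counts being accurate.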
Then we get a list of all archived repos. This call is limited to 1k results, so if you have more than 1k archived repos, you’ll need the same type of paginated operation we did above.
We remove the archived repos from our ALL_REPOS list by iterating over the list of archived repos and doing a reverse grep (keep every repo except the single archived repo in this loop) on line 7.
Then we remove any leftover empty lines (line 12) and get a final count of all our repos on line 15.
# Find archived repos
ARCHIVED_REPOS=$(gh repo list "$GH_ORG" -L 1000 --archived | cut -d "/" -f 2 | cut -f 1)
ARCHIVED_REPOS_COUNT=$(echo "$ARCHIVED_REPOS" | wc -l | xargs)

# Remove archived repos from ALL_REPOS
echo "Skipping $ARCHIVED_REPOS_COUNT archived repos, they are read only"
for repo in $ARCHIVED_REPOS; do
  # -F and -x match the whole repo name literally, so dots in names aren't treated as regex wildcards
  ALL_REPOS=$(echo "$ALL_REPOS" | grep -Fxv -- "$repo")
done

# Remove any empty lines
ALL_REPOS=$(echo "$ALL_REPOS" | awk 'NF')

# Get repo count
ALL_REPOS_COUNT=$(echo "$ALL_REPOS" | wc -l | xargs)
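As an alternative to fetching everything and then subtracting the archived repos, the gh CLI can do the filtering for you. This is a sketch under assumptions, not the author's approach: the function name and the 5000 limit are hypothetical, and it relies on the gh CLI being installed and authenticated.

```shell
# Hypothetical helper: list only non-archived repo names, one per line
# $1 = Org name. The limit of 5000 is a guess; size it to your Org.
list_active_repos() {
  gh repo list "$1" --no-archived --limit 5000 --json name --jq '.[].name'
}

# Usage (makes network calls, so it is left commented out here):
# ALL_REPOS=$(list_active_repos "$GH_ORG")
# ALL_REPOS_COUNT=$(echo "$ALL_REPOS" | awk 'NF' | wc -l | xargs)
```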
And that’s all our Repos!
Creating a Single Auto-Link Reference in 1 Repo
Okay, we know all our repos, so we can iterate through them. How do we create Auto-Link References within a Repo?
To achieve this, I wrote two functions. First, we need a function to create a single auto-link reference. Let’s focus on that first. To be clear, this will create a single auto-link reference within a single repo. However, we can call it many thousands of times when we need to. Let’s make sure it works right before we do that though.
First we make sure the first argument (in bash, $1) is populated. When you call this function, you’d run something like create_repo_autolink_reference ABC, where ABC is the ticket project key. If you called create_repo_autolink_reference without a key, there’s nothing to build and we should bail out, so on line 5 we check to make sure TICKET_REF is populated.
# Map first argument to ticket key
TICKET_REF=$1

# If no ticket key provided, skip
if [ -z "$TICKET_REF" ]; then
  echo "☠️ No ticket key provided, skipping"
  return 0
fi
We next attempt to create the auto-link reference by calling the REST endpoint. You’ll fill in your Atlassian instance URL, and set whether your ticket IDs are alphanumeric (TICKET-1b2d) or just numeric (TICKET-1234). The code below assumes numeric ticket IDs.
We capture the response in a variable called CREATE_AUTOLINK_REF so we can examine it for errors.
# Create an auto-link reference
CREATE_AUTOLINK_REF=$(curl -sL \
  -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  "https://api.github.com/repos/$GH_ORG/$GH_REPO/autolinks" \
  -d "{\"key_prefix\":\"${TICKET_REF}-\",\"url_template\":\"https://your-atlassian-instance-name.atlassian.net/browse/${TICKET_REF}-<num>\",\"is_alphanumeric\":false}")
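The hand-escaped JSON body is easy to get wrong, so one option is to build it in a small helper. This is a minimal sketch, not the author's code: the function name autolink_payload and the example.atlassian.net URL are hypothetical, and it assumes the project key and URL contain no characters that need JSON escaping. The optional third argument switches is_alphanumeric for instances with alphanumeric ticket IDs.

```shell
# Hypothetical helper: build the request body for the autolinks endpoint
# $1 = project key, $2 = Jira base URL, $3 = true/false for alphanumeric IDs (default false)
autolink_payload() {
  key=$1
  base_url=$2
  alphanumeric=${3:-false}
  printf '{"key_prefix":"%s-","url_template":"%s/browse/%s-<num>","is_alphanumeric":%s}' \
    "$key" "$base_url" "$key" "$alphanumeric"
}

autolink_payload ABC https://example.atlassian.net
# prints: {"key_prefix":"ABC-","url_template":"https://example.atlassian.net/browse/ABC-<num>","is_alphanumeric":false}
```

In the curl call, the body would then be passed as -d "$(autolink_payload "$TICKET_REF" "https://your-atlassian-instance-name.atlassian.net")".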
First we check for some known errors I’ve run into, like the specific reference already existing (line 2). If so, we add it to an array of ticket types that, well, already exist. We’ll examine these lists later.
Then on line 7, we check for a success message. If we find one, we add the ticket type to the array of successes.
And on line 12, we have our catch-all if something else happened, likely an error message of some kind.
# If the auto-link reference already exists, skip
if [[ $(echo "$CREATE_AUTOLINK_REF" | jq -r '.errors[]?.code' | grep -E 'already_exists') ]]; then
  #echo "☠️ Auto-link reference already exists for $TICKET_REF, skipping"
  CREATE_AUTOLINK_REFERENCE_ALREADY_EXIST+=("$TICKET_REF")

# If created successfully, return success
elif [[ $(echo "$CREATE_AUTOLINK_REF" | jq -r '.key_prefix') == "${TICKET_REF}-" ]]; then
  #echo "💥 Successfully created auto-link reference for $TICKET_REF"
  CREATE_AUTOLINK_REFERENCE_SUCCESSES+=("$TICKET_REF")

# If something else happened, return detailed failure message
else
  echo "☠️ Something bad happened creating auto-link reference for $TICKET_REF, please investigate response:"
  echo "$CREATE_AUTOLINK_REF"
  CREATE_AUTOLINK_REFERENCE_FAILURES+=("$TICKET_REF")
fi
That’s it for creating 1 auto-link reference. You can call this with create_repo_autolink_reference TICKET and it’ll create the Auto-Link reference in a single repo.
Creating A Lot of Auto-Link References in 1 Repo
But the goal isn’t to create 1 auto-link reference, it’s to create A LOT of auto-link references. Let’s look at the parent function that calls the one we just wrote.
First we create a list of all the ticket types we want to support. You could totally do an authenticated call to Atlassian to get all the ticket types like this:
curl -s --url "https://${YOUR_ATLASSIAN_NAME}.atlassian.net/rest/api/3/project" \
  --header 'Accept: application/json' \
  --user "${USER_EMAIL}:${JIRA_API_TOKEN}" | jq -r '.[].key'
But I don’t want to do that — our instance has hundreds of ticket types that aren’t related to this Org, so I’m going to statically define it like the following. You’d fill in your own ticket types, of course.
Then we format the list to be newline-separated, because I’m lazy and some of my while loops were already written that way.
# Define the projects to create autolink references for
AUTOLINK_JIRA_PROJECT_KEYS=(
  ABC
  DEF
  GHI
  JKL
  MNO
  PQR
  STU
  VWY
  ZYA
  BCD
  EFG
)
# Assigning a string to the array's name overwrites element 0, which is what a plain "$AUTOLINK_JIRA_PROJECT_KEYS" expands to
AUTOLINK_JIRA_PROJECT_KEYS=$(printf '%s\n' "${AUTOLINK_JIRA_PROJECT_KEYS[@]}")
We’ll need our parent function to iterate over this list and call our child function for each ticket type. Let’s start writing that function. First, remember this “cop” will run a lot, so idempotence and conserving API calls are key. If we can avoid making 20 calls for 20 tickets just to hear they all already exist, that’s great!
So let’s check which, if any, auto-link references exist. We check those on line 2 (note the []? on the jq filter, which means that even if the set is empty because none exist, we don’t print an error).
Then on line 9, we set a list of all the project keys we want to build, which right now is all of them.
Then on line 12, if there are any existing auto-link references, we iterate through them (line 13), and do the same reverse grep match to include everything except the single one that already exists. We iterate through all the existing project keys, and our resulting list is what’s used to build our project keys. It’s entirely possible that all the keys we want to exist, do exist. Let’s check.
# Get existing auto-link references, if any
EXISTING_AUTOLINK_REFERENCES=$(curl -sL \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  "https://api.github.com/repos/$GH_ORG/$GH_REPO/autolinks" | jq -r '.[]?.key_prefix' | cut -d '-' -f 1 | awk 'NF')

# Set array of project keys to build for this project
AUTOLINK_REFERENCES_TO_BUILD=$(echo "$AUTOLINK_JIRA_PROJECT_KEYS" | tr ' ' '\n')

# If there are existing auto-link references, remove them from the list of auto-link references to build
if [[ $(echo "$EXISTING_AUTOLINK_REFERENCES" | awk 'NF' | wc -l) -gt 0 ]]; then
  while IFS=$'\n' read -r EXISTING_AUTOLINK_REFERENCE; do
    # -F and -x match the whole key literally, so an existing ABC doesn't also remove ABCD
    AUTOLINK_REFERENCES_TO_BUILD=$(echo "$AUTOLINK_REFERENCES_TO_BUILD" | grep -Fxv -- "$EXISTING_AUTOLINK_REFERENCE")
  done <<< "$EXISTING_AUTOLINK_REFERENCES"
fi
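The "wanted minus existing" step can also be done in a single grep call, since grep accepts a newline-separated list of patterns. The sketch below is illustrative only (keys_to_build and the sample keys are hypothetical) and also shows why whole-line matching matters when one project key is a prefix of another.

```shell
# Hypothetical helper: print wanted keys that do not already exist
# $1 = wanted keys, $2 = existing keys (both newline-separated)
keys_to_build() {
  if [ -z "$2" ]; then
    # Nothing exists yet, so everything is still to be built
    printf '%s\n' "$1"
    return 0
  fi
  # -F literal strings, -x whole-line match, -v invert; each line of $2 is one pattern
  printf '%s\n' "$1" | grep -Fxv -e "$2" || true
}

wanted=$(printf '%s\n' ABC ABCD DEF)
existing=ABC

keys_to_build "$wanted" "$existing"             # prints ABCD and DEF
printf '%s\n' "$wanted" | grep -v "$existing"   # substring match: prints only DEF
```

With a plain substring match, an existing ABC reference would silently prevent ABCD from ever being created.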
First let’s create some buckets to store results: successes (line 2), project keys that already exist (line 3), and failed creations (line 4). Then let’s count our two relevant lists: first the project keys that we intend to build right now (line 7), and then the total project keys that should exist (line 8).
On line 11, we check if we’re building any project keys. Remember, after our first run the answer is no for almost all our repos. So if we’re building any number of project keys greater than 0 (-gt 0), we call our child function for each project key (line 13).
Our child function populates our arrays of succeeded, already-existing, and failed project key auto-link references, so let’s count all of them on lines 18–20.
# Create array to store success/failures
CREATE_AUTOLINK_REFERENCE_SUCCESSES=()
CREATE_AUTOLINK_REFERENCE_ALREADY_EXIST=()
CREATE_AUTOLINK_REFERENCE_FAILURES=()

# Count length of project keys to build and project key total to build
AUTOLINK_REFERENCES_TO_BUILD_LENGTH=$(echo "$AUTOLINK_REFERENCES_TO_BUILD" | awk 'NF' | wc -l | xargs)
AUTOLINK_JIRA_PROJECT_KEYS_LENGTH=$(echo "$AUTOLINK_JIRA_PROJECT_KEYS" | awk 'NF' | wc -l | xargs)

# If any auto-link references to build, loop through them, create as we go
if [[ $AUTOLINK_REFERENCES_TO_BUILD_LENGTH -gt 0 ]]; then
  while IFS=$'\n' read -r PROJECT_KEY; do
    create_repo_autolink_reference "$PROJECT_KEY"
  done <<< "$AUTOLINK_REFERENCES_TO_BUILD"
fi

# Create counts vars
CREATE_AUTOLINK_REFERENCE_SUCCESSES_LENGTH=${#CREATE_AUTOLINK_REFERENCE_SUCCESSES[@]}
CREATE_AUTOLINK_REFERENCE_ALREADY_EXISTS_LENGTH=${#CREATE_AUTOLINK_REFERENCE_ALREADY_EXIST[@]}
CREATE_AUTOLINK_REFERENCE_FAILURES_LENGTH=${#CREATE_AUTOLINK_REFERENCE_FAILURES[@]}
Then we print out our results. First, if AUTOLINK_REFERENCES_TO_BUILD_LENGTH is 0, we didn’t do any work; we just made sure all the keys that should exist, do. On line 3 we report all good.
If there were any failures (line 6), we report the number created out of the number attempted, along with the number of failures out of the total project key auto-links that should exist.
If neither condition is satisfied, we built some project key auto-link references and none failed. That sounds pretty positive! We report our success on line 11.
# If AUTOLINK_REFERENCES_TO_BUILD_LENGTH is 0, then all auto-link references already exist
if [[ $AUTOLINK_REFERENCES_TO_BUILD_LENGTH -eq 0 ]]; then
  echo "ℹ️ All $AUTOLINK_JIRA_PROJECT_KEYS_LENGTH Jira auto-link references already exist, skipping"

# If there are failures, print error message
elif [[ $CREATE_AUTOLINK_REFERENCE_FAILURES_LENGTH -gt 0 ]]; then
  echo "ℹ️ $CREATE_AUTOLINK_REFERENCE_SUCCESSES_LENGTH/$AUTOLINK_REFERENCES_TO_BUILD_LENGTH auto-link references created, but some failures ($CREATE_AUTOLINK_REFERENCE_FAILURES_LENGTH/$AUTOLINK_JIRA_PROJECT_KEYS_LENGTH), please investigate"

# If there are no failures, print success message
else
  echo "💥 Successfully created $CREATE_AUTOLINK_REFERENCE_SUCCESSES_LENGTH auto-link reference for all $AUTOLINK_JIRA_PROJECT_KEYS_LENGTH configured Jira project keys"
fi
Build 60k Auto-Links
Actually building all those auto-links is pretty boring now. We establish a loop over every repo (that’s not archived, the list we built in the beginning), and call our parent function.
This function, on line 7, iterates over every auto-link reference that should exist, and if it doesn’t, builds it. For every repo in your Org.
while IFS=$'\n' read -r GH_REPO; do

  ###
  ### Create Repo AutoLink References
  ###
  # Create repo autolink references to connect ticket strings to Jira tickets via hyperlinks
  create_repo_autolink_references
done <<< "$ALL_REPOS"
There are some limitations I encountered and solved, which will be coming in articles soon:
A normal PAT from GitHub gets 5k API requests per hour. The host I ran all this on was trying to make about 15k requests per hour, and was hitting frequent API limits. I wrote a circuit breaker to test if our API token bucket fell too low, and to wait out a token refill. I’ll publish more on that next week!
Running all this activity on 1 host in serial is kinda slow. It takes about 20 minutes to empty a token bucket each time it’s refilled. If we have more builders, we can empty that token bucket quicker. I updated the GitHub Action this all runs on to be sharded across 2 builders. I initially tried 4, but ran into different API limitations. Also coming soon!
If each builder had their own API token bucket, or several to choose from, this could operate 2–8x faster. I’m waiting on GitHub to confirm whether this breaks their ToS. If not, I’ll do that and write it up for ya : )
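The circuit breaker itself isn't shown in this post, so the following is only a guess at its general shape, not the author's implementation. The function name seconds_to_wait and the default floor of 100 remaining calls are hypothetical. The inputs are meant to come from GitHub's rate limit endpoint, which reports the remaining calls and the reset time as epoch seconds.

```shell
# Hypothetical helper: how long to pause before making more API calls
# $1 = remaining calls, $2 = reset time (epoch seconds), $3 = current time (epoch seconds)
# $4 = minimum remaining calls to keep working (assumed default: 100)
seconds_to_wait() {
  remaining=$1
  reset_epoch=$2
  now_epoch=$3
  floor=${4:-100}
  if [ "$remaining" -ge "$floor" ]; then
    echo 0
    return 0
  fi
  # Wait until just past the reset time; never return a negative delay
  delay=$((reset_epoch - now_epoch + 1))
  if [ "$delay" -lt 0 ]; then
    delay=0
  fi
  echo "$delay"
}

seconds_to_wait 4000 1700000600 1700000000   # plenty left -> prints 0
seconds_to_wait 50 1700000600 1700000000     # nearly empty -> prints 601

# In the real loop, the inputs could come from the rate limit endpoint (network call, so commented out):
# RATE=$(curl -sL -H "Authorization: Bearer $GITHUB_TOKEN" https://api.github.com/rate_limit)
# sleep "$(seconds_to_wait "$(echo "$RATE" | jq -r '.rate.remaining')" "$(echo "$RATE" | jq -r '.rate.reset')" "$(date +%s)")"
```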
Summary
In this blog, we talked through how to get a list of all repos, even when you have thousands, and then how to remove all the (read-only) archived ones.
We iterate over those repos and call a parent function named create_repo_autolink_references. It checks which auto-link references already exist, compares that against the list of which ones should exist, and then builds the missing ones by calling the create_repo_autolink_reference function once per project key.
And that’s how I built 60k repo auto-link references. Here’s a link to all the code aggregated on a single gist.
Good luck out there!
kyler