🔥Let’s Do DevOps: Add All GitHub Repos with Specific Topic to GitHub App🚀
AKA, I don't want to add 286 repos to a GitHub app via click-ops
This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!
Hey all!
I recently helped implement a GitHub App at my Org that needed to have access to several hundred repos - every Repo with a particular Topic assigned to it, in fact. Topics are a way to label repos, and you’ll find that your developers are probably already taking advantage of Topics to label their Repos.
Given that Topics are native GitHub functionality for grouping Repos, when you have a GitHub App that requires access to all the Repos that have a particular Topic, it’ll be easy to do in the GUI, right?
The GUI for GitHub Apps has two choices - either you give the App access to ALL Repos, or you give it only access to selected repos. Clearly, the second option is a better choice for security, right? That mode permits you to check the box next to the repos you want to add to the GitHub App.
And that’s great, but it doesn’t let me filter for Topics. Or even see them from this view. And I really don’t love the idea of clicking 286 times anyway.
So let’s write a bash script to do it. I <3 <3 bash because it’s very quick to mock up tools and test them.
Here’s what the finished product looks like:
Let’s walk through how the script works. If you don’t care, and just want to skip right to the source so you can do it yourself, scroll to the bottom of this article for a link to the GitHub Repo where it’s shared.
Set Required Inputs
So we don’t want to hard-code a bunch of values that’ll change at different GitHub Orgs, like in your own env. Let’s walk through which ones we need to set for it to run.
GITHUB_TOKEN - Set this to a GitHub Token that has the ability to read all Repo info, and to update the GitHub App installation we’ll add Repos to
GITHUB_APP_INSTALLATION_ID - The numerical installation ID of the App you want to add Repos to. You can easily find this in the URL when you navigate to the App’s install page in your GitHub Org settings
https://github.com/organizations/org_name/settings/installations/1234567890
GH_ORG - The name of your GitHub Org, all lower-cased
https://github.com/org_name
TOPIC - The single topic that you want to filter Repos for
If you’ll copy and paste the code straight into your terminal, you should set all vars before running, like this:
GITHUB_TOKEN=ghp_abcdef
GITHUB_APP_INSTALLATION_ID=1234567890
GH_ORG=gh_org
TOPIC=name-of-topic
And if you’ll copy down the script to your machine and execute the script, make sure to export the variables:
export GITHUB_TOKEN=ghp_abcdef
export GITHUB_APP_INSTALLATION_ID=1234567890
export GH_ORG=gh_org
export TOPIC=name-of-topic
Validate Required Inputs
Let’s walk through the script. First, we need to establish this as a bash file, line 1.
Then on line 4 we have a series of checks to see if any of these variables are unset or empty (-z). If any are, we print an error and exit 1, since a non-zero exit code is how a script signals that it failed. If you see this error, make sure you’re setting and/or exporting the correct variables for this script to run.
#!/bin/bash

# Check for required variables to be set, and if not present, exit 1
if [ -z "$GITHUB_TOKEN" ] || [ -z "$GITHUB_APP_INSTALLATION_ID" ] || [ -z "$GH_ORG" ] || [ -z "$TOPIC" ]; then
  echo "One or more required variables not set, exiting"
  exit 1
fi
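If you’d rather know exactly which variable is missing, the check could be done one variable at a time. This is a minimal sketch, not the script’s own code: the function name `check_required_vars` is made up for illustration, and it relies on bash indirect expansion (`${!name}`).

```shell
#!/bin/bash

# Hypothetical helper (not part of the original script): report each
# required variable that is unset or empty, and return non-zero if any is.
check_required_vars() {
  local missing=0
  local name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name
    if [ -z "${!name:-}" ]; then
      echo "Required variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: exit non-zero so whatever called the script can tell the run failed
# check_required_vars GITHUB_TOKEN GITHUB_APP_INSTALLATION_ID GH_ORG TOPIC || exit 1
```

The trade-off is a few more lines in exchange for an error message that names the variable you forgot.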
Declare Functions
First, we have a function that checks our API token wallet, and holds until that wallet has at least 100 tokens in it. The wallet refills to 5k tokens every hour, so unless you have a HUGE number of repos to process, you won’t exhaust your token budget.
hold_until_rate_limit_success() {
  # Loop forever
  while true; do
    # Any call to the GitHub API returns rate limits in the response headers
    API_RATE_LIMIT_UNITS_REMAINING=$(curl -sv \
      -H "Accept: application/vnd.github+json" \
      -H "Authorization: Bearer $GITHUB_TOKEN" \
      -H "X-GitHub-Api-Version: 2022-11-28" \
      "https://api.github.com/repos/$GH_ORG/$GH_REPO/autolinks" 2>&1 1>/dev/null \
      | grep -E '< x-ratelimit-remaining' \
      | cut -d ' ' -f 3 \
      | xargs \
      | tr -d '\r')
    # If fewer than 100 units remain, sleep for 1 minute
    # (-lt compares numbers; < inside [[ ]] would compare strings)
    if [[ "$API_RATE_LIMIT_UNITS_REMAINING" -lt 100 ]]; then
      echo "ℹ️ We have less than 100 GitHub API rate-limit tokens left, sleeping for 1 minute"
      sleep 60
    # If API rate-limiting shows remaining units, break out of loop and exit function
    else
      echo "ℹ️ Rate limit checked, we have $API_RATE_LIMIT_UNITS_REMAINING core tokens remaining so we are continuing"
      break
    fi
  done
}
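One detail worth testing offline is the threshold check itself: a check like this needs a numeric comparison (`-lt`), because `<` inside `[[ ]]` compares strings. Here’s a rough sketch that pulls the remaining count out of some header text and compares it as a number. The function names and the sample header values are made up for illustration; in real use the headers would come from a `curl` call to the GitHub API.

```shell
#!/bin/bash

# Hypothetical helper: read HTTP response headers on stdin and print the
# value of x-ratelimit-remaining (header names are case-insensitive).
parse_rate_limit_remaining() {
  grep -i '^x-ratelimit-remaining:' | head -n 1 | cut -d ':' -f 2 | tr -d ' \r'
}

# Hypothetical helper: succeed (return 0) when fewer than 100 calls remain.
# -lt compares integers. [[ "$a" < "$b" ]] compares strings instead, and as
# strings "99" sorts after "100", so a low value like 99 would slip through.
rate_limit_is_low() {
  local remaining="$1"
  [ "$remaining" -lt 100 ]
}

# Sample headers, shaped like what `curl -sI` prints (values are invented)
SAMPLE_HEADERS=$'HTTP/2 200\r\nx-ratelimit-limit: 5000\r\nx-ratelimit-remaining: 99\r\n'

REMAINING=$(printf '%s' "$SAMPLE_HEADERS" | parse_rate_limit_remaining)
if rate_limit_is_low "$REMAINING"; then
  echo "Only $REMAINING calls left, would sleep here"
fi
```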
Next we have a HUGE function that gets all the repos across our Org (with support for pagination, which can get any number of repos, even if that number is thousands or tens of thousands), removes any archived repos (lines 48-50), stores all the repos in a var (line 53), and counts them up for iterating purposes (line 56).
Line 35 is doing some amazing stuff, look at it closely - we’re getting a whole bunch of repos and all their info - probably ~100 repos, and then we’re doing a select to find the array of Topics the repos have, and selecting only the json nodes that contain the Topic we passed to it, then we’re filtering those nodes down to just their top-level key called “name”. This is doing magic - in groups of 100-ish repos, we’re finding just the repos we want, and getting their names.
I <3 jq
(curl for repo page json package) | jq -r ".[] | select(.topics[] | contains(\"$TOPIC\")) | .name")
This doesn’t strictly need to be a function since we only call it once, but I find it helps keep our scripts clean to store it as one.
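To see what that filter does without touching the API, here’s a small sketch you can run against invented data. One thing to know: on strings, jq’s `contains` is a substring match, so a topic like “payments” also matches “payments-legacy”. The second function is an assumption on my part about what you might want instead, an exact match built with jq’s `any`. The sample repos, the topic names, and both function names are made up.

```shell
#!/bin/bash

# Invented sample data, shaped like a (heavily trimmed) page of the
# "list organization repositories" response
SAMPLE_PAGE='[
  {"name": "repo-a", "topics": ["payments", "terraform"]},
  {"name": "repo-b", "topics": ["payments-legacy"]},
  {"name": "repo-c", "topics": []}
]'

# Substring match, same shape as the filter discussed above:
# "payments" also matches the topic "payments-legacy"
substring_match() {
  jq -r --arg topic "$1" '.[] | select(.topics[] | contains($topic)) | .name'
}

# Exact match: a repo is kept only if one of its topics equals $topic
exact_match() {
  jq -r --arg topic "$1" '.[] | select(any(.topics[]; . == $topic)) | .name'
}

echo "Substring match:"
echo "$SAMPLE_PAGE" | substring_match payments
echo "Exact match:"
echo "$SAMPLE_PAGE" | exact_match payments
```

With this sample, the substring version returns repo-a and repo-b, and the exact version returns only repo-a. Passing the topic with `--arg` also avoids escaping quotes inside the jq program.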
get_org_repos() {
  ###
  ### Now that we have more than 1k repos, need to use paginated REST call to get all of them (search API hard limit of 1k)
  ###

  # Grab Org info to get repo counts
  ORG_INFO=$(curl -sL \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    "https://api.github.com/orgs/$GH_ORG")

  # Filter org info to get repo counts
  PRIVATE_REPO_COUNT=$(echo "$ORG_INFO" | jq -r '.owned_private_repos')
  PUBLIC_REPO_COUNT=$(echo "$ORG_INFO" | jq -r '.public_repos')
  TOTAL_REPO_COUNT=$(($PRIVATE_REPO_COUNT + $PUBLIC_REPO_COUNT))

  # Calculate number of pages needed to get all repos
  REPOS_PER_PAGE=100
  PAGES_NEEDED=$(($TOTAL_REPO_COUNT / $REPOS_PER_PAGE))
  if [ $(($TOTAL_REPO_COUNT % $REPOS_PER_PAGE)) -gt 0 ]; then
    PAGES_NEEDED=$(($PAGES_NEEDED + 1))
  fi

  # Get all repos
  for PAGE_NUMBER in $(seq $PAGES_NEEDED); do
    echo "Getting repos page $PAGE_NUMBER of $PAGES_NEEDED"
    # Could replace this with graphql call (would likely be faster, more efficient), but this works for now
    PAGINATED_REPOS=$(curl -sL \
      -H "Accept: application/vnd.github+json" \
      -H "Authorization: Bearer $GITHUB_TOKEN" \
      -H "X-GitHub-Api-Version: 2022-11-28" \
      "https://api.github.com/orgs/$GH_ORG/repos?per_page=$REPOS_PER_PAGE&sort=pushed&page=$PAGE_NUMBER" | jq -r ".[] | select(.topics[] | contains(\"$TOPIC\")) | .name")
    # Combine all pages of repos into one variable
    # Extra return added since last item in list doesn't have newline (would otherwise combine two repos on one line)
    ALL_REPOS="${ALL_REPOS}"$'\n'"${PAGINATED_REPOS}"
  done

  # Find archived repos (requires the gh CLI)
  ARCHIVED_REPOS=$(gh repo list "$GH_ORG" -L 1000 --archived | cut -d "/" -f 2 | cut -f 1)
  ARCHIVED_REPOS_COUNT=$(echo "$ARCHIVED_REPOS" | wc -l | xargs)

  # Remove archived repos from ALL_REPOS
  echo "Skipping $ARCHIVED_REPOS_COUNT archived repos, they are read only"
  for repo in $ARCHIVED_REPOS; do
    # -x matches whole lines only, -F treats the name literally (dots in repo names are not wildcards)
    ALL_REPOS=$(echo "$ALL_REPOS" | grep -vxF -- "$repo")
  done

  # Remove any empty lines
  ALL_REPOS=$(echo "$ALL_REPOS" | awk 'NF')

  # Get repo count
  ALL_REPOS_COUNT=$(echo "$ALL_REPOS" | wc -l | xargs)
}
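The page-count math is a “divide, then add one if there’s a remainder” calculation. The same result can be had in one line of integer arithmetic, sketched here as a small function. The name `pages_needed` and the repo counts are made up for the example.

```shell
#!/bin/bash

# Hypothetical helper: number of pages needed to fetch `total` items at
# `per_page` items per page, rounding up (integer ceiling division)
pages_needed() {
  local total="$1"
  local per_page="$2"
  echo $(( (total + per_page - 1) / per_page ))
}

pages_needed 1432 100   # 15: fourteen full pages plus one partial page
pages_needed 1400 100   # 14: divides evenly, so no extra page
pages_needed 0 100      # 0: nothing to fetch
```

Either form works. The one-liner is just easier to test on its own.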
Stage What We Need
Next we unsurprisingly call our functions to first check our rate-limit before getting started (line 4). This will hold if our token budget is exhausted until it’s refilled.
Then we get all our org repos, line 15, and store in a var.
Then on lines 18-22, we print a header that says how many repos we’re going to iterate through. I’ve found long-running scripts seem very suspicious if they aren’t printing out a line that changes for each repo, so I like to print either the repo’s name or its number. We’ll talk about that shortly.
###
### Hold any actions until we confirm not rate-limited
###
hold_until_rate_limit_success

###
### Get Org-wide info
###
echo "########################################"
echo Getting All Org Repos
echo "########################################"
get_org_repos

# Add all repos in list to the GitHub App
echo ""
echo "########################################"
echo "Iterating through $ALL_REPOS_COUNT repos"
echo "########################################"
echo ""
When I run it, it looks like this:
ℹ️ Rate limit checked, we have 4022 core tokens remaining so we are continuing
########################################
Getting All Org Repos
########################################
Getting repos page 1 of 15
Getting repos page 2 of 15
Getting repos page 3 of 15
Getting repos page 4 of 15
Getting repos page 5 of 15
Getting repos page 6 of 15
Getting repos page 7 of 15
Getting repos page 8 of 15
Getting repos page 9 of 15
Getting repos page 10 of 15
Getting repos page 11 of 15
Getting repos page 12 of 15
Getting repos page 13 of 15
Getting repos page 14 of 15
Getting repos page 15 of 15
Skipping 20 archived repos, they are read only
########################################
Iterating through 286 repos
########################################
One last important piece - set a counter var. We’ll start at 0, and increment at the beginning of our loop.
# Initialize counter var to keep track of repo processing
CURRENT_REPO_COUNT=0
Let’s Add Some Repos aka The Big Loooooooop
We’re going to start a giant (well, a few dozen lines) loop, so let’s first look at the loop construct.
We’ll do a while loop that terminates when it reaches the end of what it’s iterating over. And what it’s iterating over is on line 4, the $ALL_REPOS var.
On line 2 you can see we set IFS=$'\n' for the read command. `read` already takes one line of input per call, and our list has one repo name per line, so each iteration gets exactly one repo. Setting IFS to just the newline character keeps `read` from splitting or trimming on spaces and tabs within a line.
Also on line 2, we `read -r GH_REPO`, which reads each line of input into the variable `GH_REPO`, so it’s set to the name of the repo on each iteration.
# Iterate over all repos
while IFS=$'\n' read -r GH_REPO; do
  # smart stuff goes in here
done <<< "$ALL_REPOS"
Okay, now we’re going to go into the loop!
On line 5, we immediately check our token budget. If we’re good, continue, if not, loop until we are good again.
On line 8, we increment our counter - we started at 0, so on the first iteration CURRENT_REPO_COUNT will be 1, and it goes up by one on each loop.
On line 11 we do a curl to GitHub’s REST endpoint to get a single repo’s information - the repo we’re working on. What we need here is the “id” attribute of the repo - it’s used on the next REST call, where we add the individual Repo to the GitHub app.
Note we get the entire repo package info, and then we use `jq` to filter the response down to just `.id`, which means the value of the top-level json key called “id”.
# Echo out some spaces
echo "###################"

# Hold until rate limit is not hit
hold_until_rate_limit_success

# Increment counter
CURRENT_REPO_COUNT=$((CURRENT_REPO_COUNT + 1))

# Find github repo ID with REST call
GH_REPO_ID=$(curl -sL \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  "https://api.github.com/repos/$GH_ORG/$GH_REPO" | jq -r '.id')
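One thing to watch: `jq -r '.id'` prints the literal text “null” when the key is missing, for example if the API sent back an error object instead of a repo. A quick sanity check before using the ID might look like the sketch below. The function name `is_numeric_id` is made up, and this check isn’t part of the original script.

```shell
#!/bin/bash

# Hypothetical helper: succeed only if the argument is a non-empty string
# made up entirely of digits
is_numeric_id() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Usage inside the loop might look like:
# if ! is_numeric_id "$GH_REPO_ID"; then
#   echo "Could not find an ID for $GH_REPO, skipping"
#   continue
# fi

# Try it against a plausible ID, jq's "null", and an empty string
for CANDIDATE in 123456789 null ""; do
  if is_numeric_id "$CANDIDATE"; then
    echo "'$CANDIDATE' looks like a repo ID"
  else
    echo "'$CANDIDATE' is not a repo ID"
  fi
done
```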
Next up, we actually attempt to add the repo to the GitHub App. On line 2, we do a very similar curl to the GitHub REST endpoint for the GitHub App installation, this time with a PUT of the repo ID, which the API interprets as “add a repo to this GitHub App”. Note the “2>&1” at the end, which means to store any error messages (stderr goes to the 2 output) in the variable.
That’s important because we check on line 10 if there is a response error, such as “Not Found”, which indicates the add didn’t work. If that happens, we catch the error and print the full error output for a human to investigate. Probably the GitHub App installation ID is wrong, or the token’s permissions aren’t enough to add the Repo to the GitHub App.
If it works, we print a happy message.
unset CURL
CURL=$(curl -L \
  -X PUT \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  "https://api.github.com/user/installations/$GITHUB_APP_INSTALLATION_ID/repositories/$GH_REPO_ID" 2>&1 \
)

# Check for errors
if echo "$CURL" | grep -q 'Not Found'; then
  echo "☠️ Something bad happened adding $GH_REPO to GitHub App, please investigate response:"
  echo "$CURL"
else
  echo "💥 Successfully added $GH_REPO ($CURRENT_REPO_COUNT/$ALL_REPOS_COUNT) to GitHub App w/ ID $GITHUB_APP_INSTALLATION_ID"
fi
echo ""
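Grepping the response text for “Not Found” works, but another option is to look at the HTTP status code. Here’s a rough sketch of that idea. It assumes this endpoint answers 204 on success, which you should confirm against the GitHub REST API docs before relying on it. The function name `describe_add_result` and the status-to-message mapping are made up for illustration.

```shell
#!/bin/bash

# Hypothetical helper: turn an HTTP status code into a short verdict.
# Assumption: the "add repository to installation" endpoint returns 204
# on success; confirm this against the GitHub REST API docs.
describe_add_result() {
  case "$1" in
    204) echo "added" ;;
    401|403) echo "permission problem" ;;
    404) echo "not found" ;;
    *) echo "unexpected status $1" ;;
  esac
}

# In the real loop, the status code could be captured like this
# (-s silences progress, -o discards the body, -w prints the code):
# STATUS=$(curl -sL -o /dev/null -w '%{http_code}' -X PUT \
#   -H "Accept: application/vnd.github+json" \
#   -H "Authorization: Bearer $GITHUB_TOKEN" \
#   -H "X-GitHub-Api-Version: 2022-11-28" \
#   "https://api.github.com/user/installations/$GITHUB_APP_INSTALLATION_ID/repositories/$GH_REPO_ID")

# Try the mapping against a few sample codes
for STATUS in 204 403 404 500; do
  echo "$STATUS -> $(describe_add_result "$STATUS")"
done
```

The upside is that permission errors and other failures get caught too, not only the “Not Found” case.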
When I run the loop it looks like this:
########################################
Iterating through 285 repos
########################################
###################
ℹ️ Rate limit checked, we have 4938 core tokens remaining so we are continuing
💥 Successfully added Repo1 (1/285) to GitHub App w/ ID 1234567890
###################
ℹ️ Rate limit checked, we have 4935 core tokens remaining so we are continuing
💥 Successfully added Repo2 (2/285) to GitHub App w/ ID 1234567890
###################
ℹ️ Rate limit checked, we have 4932 core tokens remaining so we are continuing
💥 Successfully added Repo3 (3/285) to GitHub App w/ ID 1234567890
And at the very end, we profit.
echo "###################"
echo "Run complete!"
echo "###################"
Summary
In this write-up we talked about the limitations of the GitHub App UI: adding repos by hand is possible, but it’d take several hundred clicks and potentially hours to do. Plus it’d be pretty prone to manual error.
Then we walked through a script that does exactly what we need to do - find all the repos, then filter for just those with the Topic we want, and then store their names in a variable. We loop over that list of repos, find each repo’s ID, and then PUT that repo into the list of repos the GitHub App is targeted at.
All the code we talked about can be found here, ready to run!
Good luck out there!
kyler