🔥Let’s Do DevOps: Add All GitHub Repos with Specific Topic to GitHub App🚀

AKA, I don't want to add 286 repos to a GitHub app via click-ops

Jul 30, 2024

This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!

Hey all!

I recently helped implement a GitHub App at my Org that needed to have access to several hundred repos - every Repo with a particular Topic assigned to it, in fact. Topics are a way to label repos, and you’ll find that your developers are probably already taking advantage of Topics to label their Repos.

Given that Topics are native GitHub functionality for grouping Repos, when you have a GitHub App that requires access to all the Repos that have a particular Topic, it’ll be easy to do in the GUI, right?

The GUI for GitHub Apps has two choices - either you give the App access to ALL Repos, or you give it only access to selected repos. Clearly, the second option is a better choice for security, right? That mode permits you to check the box next to the repos you want to add to the GitHub App.

And that’s great, but it doesn’t let me filter for Topics. Or even see them from this view. And I really don’t love the idea of clicking 286 times anyway.

So let’s write a bash script to do it. I <3 <3 bash because it’s very quick to mock up tools and test them.

Here’s what the finished product looks like:

Let’s walk through how the script works. If you don’t care, and just want to skip right to the source so you can do it yourself, scroll to the bottom of this article for a link to the GitHub Repo where it’s shared.

Set Required Inputs

So we don’t want to hard-code a bunch of values that’ll change at different GitHub Orgs, like in your own env. Let’s walk through which ones we need to set for it to run.

GITHUB_TOKEN - Set this to a GitHub Token that has the ability to read all Repo info, and to update the GitHub App installation we’ll add Repos to
GITHUB_APP_INSTALLATION_ID - The numerical ID of the App installation you want to add Repos to. You can easily find this in the URL when you navigate to the App’s install page in your GitHub Org settings
```
https://github.com/organizations/org_name/settings/installations/1234567890
```
GH_ORG - The name of your GitHub Org, all lower-cased
```
https://github.com/org_name
```
TOPIC - The single topic that you want to filter Repos for

If you plan to copy and paste the code straight into your terminal, set all the vars before running, like this:

GITHUB_TOKEN=ghp_abcdef
GITHUB_APP_INSTALLATION_ID=1234567890
GH_ORG=gh_org
TOPIC=name-of-topic

And if you plan to save the script to your machine and execute it as a file, make sure to export the variables so the script can see them:

export GITHUB_TOKEN=ghp_abcdef
export GITHUB_APP_INSTALLATION_ID=1234567890
export GH_ORG=gh_org
export TOPIC=name-of-topic
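
The difference between the two forms is whether child processes can see the variables. Here is a minimal sketch of that behavior; `DEMO_TOPIC` is a made-up variable name used only for illustration, not one the script reads.

```shell
#!/usr/bin/env bash
# DEMO_TOPIC is a hypothetical variable used only for this illustration.

# A plain assignment is visible to the current shell only.
DEMO_TOPIC=name-of-topic
PLAIN_RESULT=$(bash -c 'echo "${DEMO_TOPIC:-unset}"')

# After export, child processes (such as a script you execute) inherit it.
export DEMO_TOPIC
EXPORTED_RESULT=$(bash -c 'echo "${DEMO_TOPIC:-unset}"')

echo "plain assignment, child sees: $PLAIN_RESULT"
echo "after export, child sees:     $EXPORTED_RESULT"
```

Pasting code into your terminal runs it in the current shell, so plain assignments are enough there. Executing a script file starts a child process, which is why the exported form is needed.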

Validate Required Inputs

Let’s walk through the script. First, we need to establish this as a bash file, line 1.

Then on line 4 we have a series of checks to see if any of these variables are unset or empty (-z). If any are, we print an error and exit. If you see this error, make sure you’re setting and/or exporting the correct variables for this script to run.


	#!/bin/bash

	# Check for required variables to be set, and if not present, exit with a non-zero status
	if [ -z "$GITHUB_TOKEN" ] || [ -z "$GITHUB_APP_INSTALLATION_ID" ] || [ -z "$GH_ORG" ] || [ -z "$TOPIC" ]; then
	  echo "One or more required variables not set, exiting"
	  exit 1
	fi

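If you would rather be told exactly which variable is missing, one option is a small helper that checks each name in turn. This is a sketch, not part of the script; `require_vars` and the `DEMO_*` variables are names I made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: report exactly which required variables are missing.
# require_vars is a hypothetical helper name, not part of the original script.
require_vars() {
  local missing=0 name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name
    if [ -z "${!name:-}" ]; then
      echo "Required variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Demo with made-up variable names so it runs without real credentials
DEMO_SET_VAR="some-value"
unset DEMO_UNSET_VAR
if require_vars DEMO_SET_VAR DEMO_UNSET_VAR; then
  DEMO_RESULT="all set"
else
  DEMO_RESULT="missing"
fi
echo "Result: $DEMO_RESULT"
```

In the real script, the call would presumably look like `require_vars GITHUB_TOKEN GITHUB_APP_INSTALLATION_ID GH_ORG TOPIC || exit 1`.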

Declare Functions

First, we have a function that checks our API token wallet, and will hold until that wallet has at least 100 tokens in it. The wallet refills to 5k tokens every hour, so unless you have a HUGE number of repos to process, you won’t exhaust your token budget.


	hold_until_rate_limit_success() {

	  # Loop forever
	  while true; do

	    # Any call to the GitHub API returns rate limits in the response headers
	    API_RATE_LIMIT_UNITS_REMAINING=$(curl -sv \
	      -H "Accept: application/vnd.github+json" \
	      -H "Authorization: Bearer $GITHUB_TOKEN" \
	      -H "X-GitHub-Api-Version: 2022-11-28" \
	      "https://api.github.com/repos/$GH_ORG/$GH_REPO/autolinks" 2>&1 1>/dev/null \
	      | grep -E '< x-ratelimit-remaining' \
	      | cut -d ' ' -f 3 \
	      | xargs \
	      | tr -d '\r')

	    # If fewer than 100 units remain, sleep for 1 minute (-lt compares as numbers, < would compare as strings)
	    if [[ "$API_RATE_LIMIT_UNITS_REMAINING" -lt 100 ]]; then
	      echo "ℹ️ We have less than 100 GitHub API rate-limit tokens left, sleeping for 1 minute"
	      sleep 60

	    # If API rate-limiting shows remaining units, break out of loop and exit function
	    else
	      echo "ℹ️ Rate limit checked, we have $API_RATE_LIMIT_UNITS_REMAINING core tokens remaining so we are continuing"
	      break
	    fi

	  done
	}

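One detail worth a sketch is the comparison itself: rate-limit counts need to be compared as integers. Inside `[[ ]]`, the `<` operator compares strings, so numeric checks should use `-lt`. The example below runs offline against made-up header text in the shape `curl -sv` prints; `remaining_from_headers` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: pull x-ratelimit-remaining out of captured response headers and
# compare it numerically. SAMPLE_HEADERS is made-up data in the shape curl -sv prints.
SAMPLE_HEADERS=$'< HTTP/2 200\r\n< x-ratelimit-limit: 5000\r\n< x-ratelimit-remaining: 99\r\n< x-ratelimit-used: 4901\r'

# remaining_from_headers is a hypothetical helper name
remaining_from_headers() {
  grep -i '^< x-ratelimit-remaining:' | awk '{print $3}' | tr -d '\r'
}

REMAINING=$(printf '%s\n' "$SAMPLE_HEADERS" | remaining_from_headers)

# -lt compares integers, so 99 is correctly seen as less than 100
if [[ "$REMAINING" -lt 100 ]]; then
  DECISION="sleep"
else
  DECISION="continue"
fi

# < inside [[ ]] compares strings: "99" sorts after "100" because "9" > "1"
if [[ "$REMAINING" < 100 ]]; then
  STRING_COMPARE="less"
else
  STRING_COMPARE="not less"
fi

echo "remaining=$REMAINING decision=$DECISION string-compare says: $STRING_COMPARE"
```

With 99 units left, the numeric check decides to sleep, while the string comparison would have carried on.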

Next we have a HUGE function that gets all the repos across our Org (with support for pagination, so it can get any number of repos, even if that number is thousands or tens of thousands), removes any archived repos (lines 48-50), stores all the remaining repos in a var (line 53), and counts them up for iterating purposes (line 56).

Line 35 is doing some amazing stuff, look at it closely - we’re getting a whole bunch of repos and all their info - up to 100 repos per page - and then we’re doing a select to find the array of Topics each repo has, and selecting only the json nodes that contain the Topic we passed to it, then we’re filtering those nodes down to just their top-level key called “name”. This is doing magic - in groups of 100-ish repos, we’re finding just the repos we want, and getting their names.

I <3 jq

(curl for repo page json package) | jq -r ".[] | select(.topics[] | contains(\"$TOPIC\")) | .name")

This doesn’t strictly need to be a function since we only call it once, but I find it helps keep our scripts clean to store it as one.
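
Coming back to the jq filter for a moment: `contains` on strings is a substring test, so a topic like `api` would also match `api-gateway`. If you need exact matches, a variation along these lines might help. It is a sketch against made-up sample data (`SAMPLE_PAGE` and `DEMO_TOPIC` are my own names), and it also uses the `archived` flag that comes back in the same repo payload.

```shell
#!/usr/bin/env bash
# Sketch: exact topic match with jq. SAMPLE_PAGE is made-up data shaped like
# one page of the "list organization repositories" response.
SAMPLE_PAGE='[
  {"name": "repo-a", "archived": false, "topics": ["payments", "api"]},
  {"name": "repo-b", "archived": false, "topics": ["api-gateway"]},
  {"name": "repo-c", "archived": true,  "topics": ["api"]}
]'
DEMO_TOPIC="api"

# contains() on strings is a substring test, so "api" also matches "api-gateway"
SUBSTRING_MATCHES=$(echo "$SAMPLE_PAGE" \
  | jq -r --arg topic "$DEMO_TOPIC" \
    '.[] | select(any(.topics[]?; contains($topic))) | .name')

# Comparing with == keeps only repos that carry the topic exactly;
# the archived flag in the same payload lets us drop archived repos here too
EXACT_MATCHES=$(echo "$SAMPLE_PAGE" \
  | jq -r --arg topic "$DEMO_TOPIC" \
    '.[] | select(.archived | not) | select(any(.topics[]?; . == $topic)) | .name')

echo "substring: $(echo $SUBSTRING_MATCHES)"
echo "exact:     $EXACT_MATCHES"
```

Passing the topic with `--arg` also avoids escaping quotes inside the filter string.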


	get_org_repos() {

	  ###
	  ### Now that we have more than 1k repos, need to use paginated REST call to get all of them (search API hard limit of 1k)
	  ###

	  # Grab Org info to get repo counts
	  ORG_INFO=$(curl -sL \
	    -H "Accept: application/vnd.github+json" \
	    -H "Authorization: Bearer $GITHUB_TOKEN" \
	    -H "X-GitHub-Api-Version: 2022-11-28" \
	    "https://api.github.com/orgs/$GH_ORG")

	  # Filter org info to get repo counts
	  PRIVATE_REPO_COUNT=$(echo "$ORG_INFO" | jq -r '.owned_private_repos')
	  PUBLIC_REPO_COUNT=$(echo "$ORG_INFO" | jq -r '.public_repos')
	  TOTAL_REPO_COUNT=$(($PRIVATE_REPO_COUNT + $PUBLIC_REPO_COUNT))

	  # Calculate number of pages needed to get all repos
	  REPOS_PER_PAGE=100
	  PAGES_NEEDED=$(($TOTAL_REPO_COUNT / $REPOS_PER_PAGE))
	  if [ $(($TOTAL_REPO_COUNT % $REPOS_PER_PAGE)) -gt 0 ]; then
	    PAGES_NEEDED=$(($PAGES_NEEDED + 1))
	  fi

	  # Get all repos
	  for PAGE_NUMBER in $(seq $PAGES_NEEDED); do
	    echo "Getting repos page $PAGE_NUMBER of $PAGES_NEEDED"

	    # Could replace this with graphql call (would likely be faster, more efficient), but this works for now
	    PAGINATED_REPOS=$(curl -sL \
	      -H "Accept: application/vnd.github+json" \
	      -H "Authorization: Bearer $GITHUB_TOKEN" \
	      -H "X-GitHub-Api-Version: 2022-11-28" \
	      "https://api.github.com/orgs/$GH_ORG/repos?per_page=$REPOS_PER_PAGE&sort=pushed&page=$PAGE_NUMBER" | jq -r ".[] | select(.topics[] | contains(\"$TOPIC\")) | .name")

	    # Combine all pages of repos into one variable
	    # Extra return added since last item in list doesn't have newline (would otherwise combine two repos on one line)
	    ALL_REPOS="${ALL_REPOS}"$'\n'"${PAGINATED_REPOS}"
	  done

	  # Find archived repos
	  ARCHIVED_REPOS=$(gh repo list "$GH_ORG" -L 1000 --archived | cut -d "/" -f 2 | cut -f 1)
	  ARCHIVED_REPOS_COUNT=$(echo "$ARCHIVED_REPOS" | wc -l | xargs)

	  # Remove archived repos from ALL_REPOS
	  echo "Skipping $ARCHIVED_REPOS_COUNT archived repos, they are read only"
	  for repo in $ARCHIVED_REPOS; do
	    ALL_REPOS=$(echo "$ALL_REPOS" | grep -Ev "^$repo$")
	  done

	  # Remove any empty lines
	  ALL_REPOS=$(echo "$ALL_REPOS" | awk 'NF')

	  # Get repo count
	  ALL_REPOS_COUNT=$(echo "$ALL_REPOS" | wc -l | xargs)
	}

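Removing one newline-separated list from another can also be done in a single `grep` pass rather than a loop. The sketch below uses made-up repo names and `DEMO_*` variables of my own. It also shows a way to count lines that reports 0 for an empty list, where `echo "" | wc -l` would report 1.

```shell
#!/usr/bin/env bash
# Sketch: subtract one newline-separated list from another in a single pass.
# The repo names below are made up for illustration.
DEMO_ALL_REPOS=$'repo.one\nrepoXone\nrepo-two\nrepo-three'
DEMO_ARCHIVED_REPOS=$'repo.one\nrepo-three'

# -v invert match, -x match whole lines only, -F treat patterns as fixed strings
# (so the "." in repo.one is not a regex wildcard), -f read patterns from a file
DEMO_ACTIVE_REPOS=$(grep -vxF -f <(printf '%s\n' "$DEMO_ARCHIVED_REPOS") <<< "$DEMO_ALL_REPOS")

# grep -c . counts non-empty lines, so an empty list counts as 0 rather than 1
# (|| true because grep exits non-zero when nothing matches)
DEMO_ACTIVE_COUNT=$(grep -c . <<< "$DEMO_ACTIVE_REPOS" || true)
DEMO_EMPTY_COUNT=$(grep -c . <<< "" || true)

echo "$DEMO_ACTIVE_REPOS"
echo "active=$DEMO_ACTIVE_COUNT empty=$DEMO_EMPTY_COUNT"
```

The fixed-string match matters for repo names containing dots, which a regex would treat as "any character".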

Stage What We Need

Next we unsurprisingly call our functions to first check our rate-limit before getting started (line 4). This will hold if our token budget is exhausted until it’s refilled.

Then we get all our org repos, line 15, and store in a var.

Then on lines 18-22, we print a header that says how many repos we’re going to iterate through. I’ve found long-running scripts look very suspicious if they aren’t printing lines that change for each repo, such as the repo’s name or number. We’ll talk about that shortly.


	###
	### Hold any actions until we confirm not rate-limited
	###
	hold_until_rate_limit_success


	###
	### Get Org-wide info
	###

	echo "########################################"
	echo Getting All Org Repos
	echo "########################################"

	get_org_repos

	# Add all repos in list to the GitHub App
	echo ""
	echo "########################################"
	echo "Iterating through $ALL_REPOS_COUNT repos"
	echo "########################################"
	echo ""


When I run it, it looks like this:

ℹ️ Rate limit checked, we have 4022 core tokens remaining so we are continuing

########################################
Getting All Org Repos
########################################
Getting repos page 1 of 15
Getting repos page 2 of 15
Getting repos page 3 of 15
Getting repos page 4 of 15
Getting repos page 5 of 15
Getting repos page 6 of 15
Getting repos page 7 of 15
Getting repos page 8 of 15
Getting repos page 9 of 15
Getting repos page 10 of 15
Getting repos page 11 of 15
Getting repos page 12 of 15
Getting repos page 13 of 15
Getting repos page 14 of 15
Getting repos page 15 of 15
Skipping 20 archived repos, they are read only

########################################
Iterating through 286 repos
########################################

One last important piece - set a counter var. We’ll start at 0, and increment at the beginning of our loop.


	# Initialize counter var to keep track of repo processing
	CURRENT_REPO_COUNT=0


Let’s Add Some Repos aka The Big Loooooooop

We’re going to start a giant (well, a few dozen lines) loop, so let’s first look at the loop construct.

We’ll do a while loop that terminates when it reaches the end of what it’s iterating over. And what it’s iterating over is on line 4, the $ALL_REPOS var.

On line 2 you can see we set IFS=$'\n'. `read` already consumes one line per iteration; setting IFS to just the newline character means spaces and tabs on a line are left alone, so each line comes through whole. Our list of repos is stored one name per line, so that’s perfect for iterating.

Also on line 2, we `read -r GH_REPO`, which reads each line into the variable `GH_REPO`, so it is set to the name of the repo on each iteration.


	# Iterate over all repos
	while IFS=$'\n' read -r GH_REPO; do
	# smart stuff goes in here
	done <<< "$ALL_REPOS"

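To see the loop shape in action without touching GitHub, here is a small sketch over made-up repo names (all `DEMO_*` names are mine). It includes a line with a trailing space and an empty line to show how each is handled.

```shell
#!/usr/bin/env bash
# Sketch of the loop shape with made-up repo names, including one with a
# trailing space and an empty line to show how each is handled.
DEMO_REPOS=$'repo-one\nrepo-two \n\nrepo-three'
DEMO_COUNT=0
DEMO_SEEN=""

# read consumes one line per iteration; IFS=$'\n' only stops it from trimming
# spaces and tabs at the edges of the line, and -r keeps backslashes literal
while IFS=$'\n' read -r DEMO_REPO; do
  # Skip empty lines so they are not treated as repo names
  [ -z "$DEMO_REPO" ] && continue
  DEMO_COUNT=$((DEMO_COUNT + 1))
  DEMO_SEEN="${DEMO_SEEN}[${DEMO_REPO}]"
  echo "($DEMO_COUNT) [$DEMO_REPO]"
done <<< "$DEMO_REPOS"
```

Because the list is fed in with `<<<` rather than a pipe, the loop runs in the current shell, so the counter keeps its value after `done`.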

Okay, now we’re going to go into the loop!

On line 5, we immediately check our token budget. If we’re good, continue, if not, loop until we are good again.

On line 8, we increment our counter - we started at 0, so on the first iteration CURRENT_REPO_COUNT will be 1, and it goes up by one on each loop.

On line 11 we do a curl to GitHub’s REST endpoint to get a single repo’s information - the repo we’re working on. What we need here is the “id” attribute of the repo - it’s used on the next REST call, where we add the individual Repo to the GitHub app.

Note we get the entire repo package info, and then we use `jq` to filter the response down to just `.id`, which means the value of the top-level json key called “id”.


	# Print a separator
	echo "###################"

	# Hold until rate limit is not hit
	hold_until_rate_limit_success

	# Increment counter
	CURRENT_REPO_COUNT=$((CURRENT_REPO_COUNT + 1))

	# Find github repo ID with REST call
	GH_REPO_ID=$(curl -sL \
	  -H "Accept: application/vnd.github+json" \
	  -H "Authorization: Bearer $GITHUB_TOKEN" \
	  -H "X-GitHub-Api-Version: 2022-11-28" \
	  "https://api.github.com/repos/$GH_ORG/$GH_REPO" | jq -r '.id')

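If the repo lookup fails, `jq -r '.id'` prints the literal string `null`, which would then be pasted into the next URL. A small guard can catch that before the add call. This is a sketch; `is_numeric_id` is a hypothetical helper name, not part of the script.

```shell
#!/usr/bin/env bash
# Sketch: guard against a missing repo ID before using it in the next call.
# is_numeric_id is a hypothetical helper name.
is_numeric_id() {
  # jq -r prints the literal string "null" when the key is absent
  # (for example when the API answered with a "Not Found" message),
  # so accept only non-empty strings made entirely of digits
  [[ "$1" =~ ^[0-9]+$ ]]
}

for DEMO_ID in 123456789 null ""; do
  if is_numeric_id "$DEMO_ID"; then
    echo "'$DEMO_ID' looks like a repo ID"
  else
    echo "'$DEMO_ID' is not a usable repo ID, skipping"
  fi
done
```

Inside the loop this could be used as `is_numeric_id "$GH_REPO_ID" || continue` to skip a repo whose lookup failed.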

Next up, we actually attempt to add the repo to the GitHub App. On line 2, we do a very similar curl to the GitHub REST endpoint for the GitHub App installation to PUT a repo ID in it, which the API interprets as “add this repo to this GitHub App”. Note the “2>&1” at the end, which redirects stderr (file descriptor 2) into stdout, so any error messages are stored in the variable too.

That’s important because we check on line 10 if there is a response error, such as “Not Found”, which indicates the add didn’t work. If that happens, we catch the error and print the full error output for a human to investigate. Probably the GitHub App installation ID is wrong, or the token’s permissions aren’t enough to add the Repo to the GitHub App.

If it works, we print a happy message.


	unset CURL
	CURL=$(curl -L \
	  -X PUT \
	  -H "Accept: application/vnd.github+json" \
	  -H "Authorization: Bearer $GITHUB_TOKEN" \
	  -H "X-GitHub-Api-Version: 2022-11-28" \
	  "https://api.github.com/user/installations/$GITHUB_APP_INSTALLATION_ID/repositories/$GH_REPO_ID" 2>&1 \
	)
	# Check for errors
	if echo "$CURL" | grep -q 'Not Found'; then
	  echo "☠️ Something bad happened adding $GH_REPO to GitHub App, please investigate response:"
	  echo "$CURL"
	else
	  echo "💥 Successfully added $GH_REPO ($CURRENT_REPO_COUNT/$ALL_REPOS_COUNT) to GitHub App w/ ID $GITHUB_APP_INSTALLATION_ID"
	fi

	echo ""

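Matching on the text “Not Found” only catches one kind of failure. An alternative is to look at the HTTP status code instead. As far as I know, GitHub documents a 204 response for a successful add on this endpoint, but treat that as an assumption to verify against the current API docs. `add_repo_to_installation` and `describe_status` are hypothetical helper names, and the demo only feeds canned status codes, so nothing is sent to GitHub.

```shell
#!/usr/bin/env bash
# Sketch: decide success from the HTTP status code instead of the body text.
# add_repo_to_installation and describe_status are hypothetical helper names.

# Prints only the status code: -s silences the progress meter,
# -o /dev/null discards the body, -w '%{http_code}' writes the code to stdout
add_repo_to_installation() {
  local installation_id="$1" repo_id="$2"
  curl -sL -o /dev/null -w '%{http_code}' \
    -X PUT \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    "https://api.github.com/user/installations/$installation_id/repositories/$repo_id"
}

# Map a status code to a short verdict (204 as success is an assumption, see above)
describe_status() {
  case "$1" in
    204) echo "added" ;;
    403) echo "forbidden" ;;
    404) echo "not found" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Demo with canned status codes so nothing is sent to GitHub
for DEMO_STATUS in 204 404 500; do
  echo "$DEMO_STATUS -> $(describe_status "$DEMO_STATUS")"
done
```

In the loop, the call might look like `STATUS=$(add_repo_to_installation "$GITHUB_APP_INSTALLATION_ID" "$GH_REPO_ID")` followed by `describe_status "$STATUS"`.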

When I run the loop it looks like this:

########################################
Iterating through 285 repos
########################################

###################
ℹ️ Rate limit checked, we have 4938 core tokens remaining so we are continuing
💥 Successfully added Repo1 (1/285) to GitHub App w/ ID 1234567890

###################
ℹ️ Rate limit checked, we have 4935 core tokens remaining so we are continuing
💥 Successfully added Repo2 (2/285) to GitHub App w/ ID 1234567890

###################
ℹ️ Rate limit checked, we have 4932 core tokens remaining so we are continuing
💥 Successfully added Repo3 (3/285) to GitHub App w/ ID 1234567890

And at the very end, we profit.


	echo "###################"
	echo "Run complete!"
	echo "###################"


Summary

In this write-up we talked about the limitations of the GitHub App UI: managing repo access by hand is possible, but it’d take several hundred clicks and potentially hours to do. Plus it’d be pretty error-prone.

Then we walked through a script that does exactly what we need to do - find all the repos, then filter for just those with the topic we want, and then store their names in a variable. We loop over that list of repos, find each repo’s ID, and then PUT that repo into the list of repos the GitHub App is targeted at.

All the code we talked about can be found here, ready to run!

Good luck out there!
kyler
