🔥Let’s Do DevOps: Update Files in Hundreds of GitHub Repos
This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!
Hey all!
I’m migrating my current business from an internal BitBucket server to public GitHub. The migration has mostly been very exciting — GitHub is cutting edge, and the security and scalability features that GitHub makes available are a huge enabler for our velocity.
That’s not to say things are perfect. Some functionality that BitBucket manages as metadata (like default reviewers on PRs), and can therefore update easily at scale, is managed in GitHub with files that are committed directly to each repo.
The big ones are:
CODEOWNERS file assigning different paths to teams who are auto-assigned as reviewers
Actions files to run specific actions in repos
The short of it is: sometimes you need to make changes to A LOT of repos at a time. Say you need to make a change in every single repo you have in GitHub, and you have 300. Doing that by hand is going to take a super long time.
So I wrote a tool for that. This tool iterates over every repo in an Org (or User space), makes changes, adds those changes to a git branch, pushes the branch, opens a PR, and optionally force-merges the PR with admin permissions. ✨It’s magic! ✨
Let’s look at it. ❤ And if you only care about downloading the code to run it, scroll to the end for a link to the repo!
GitHub Authentication
We’ll need to authenticate to GitHub in two different ways. First, we need a control-channel authentication to download the names of all the GitHub repos in your Org, as well as some metadata like the default branch in each repo.
For GitHub, this is usually a PAT, or Personal Access Token. Instructions for creating one are in GitHub’s documentation; once you have it, export it into your terminal as the variable GITHUB_TOKEN. The script checks that the variable is populated and, if not, exits with an error message.
# Auth Requirements
# Make sure to export your github token. If SSO is enabled in your Org, you will need to authorize your token for SSO within the Org
# export GITHUB_TOKEN='ghp_xxxx'
# check to make sure GITHUB_TOKEN is set
if [ -z "$GITHUB_TOKEN" ]; then
  # Single quotes are intentional: we want to print the literal variable name
  echo '$GITHUB_TOKEN is not set, please set it and try again' >&2
  # Exit non-zero so the failure is visible to whatever called the script
  exit 1
fi
We don’t have a check for this, but this script also requires an SSH key in order to git push branches and commits to GitHub. If you’ve done any pushes to GitHub before, you likely already have this in place.
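Since the script leans on a few command-line tools (git, gh, jq, and curl) as well as the token, a small preflight check can catch a missing dependency before the loop starts. The following is a minimal sketch, not part of the original script; the function name missing_tools is made up for illustration.

```shell
# Hypothetical helper (not in the original script): print the names of any
# commands from the argument list that are not found on PATH.
missing_tools() {
  not_found=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      # Append to the space-separated list of missing tools
      not_found="${not_found:+$not_found }$tool"
    fi
  done
  printf '%s\n' "$not_found"
}

# The tools used throughout this walkthrough
missing=$(missing_tools git gh jq curl)
if [ -n "$missing" ]; then
  echo "Missing required tools: $missing" >&2
fi
```

In the real script you would probably exit 1 inside that if block. For the SSH key, running ssh -T git@github.com by hand is a quick way to confirm that GitHub accepts it.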
Set Some Variables
There are some variables that we need to set. First of all, set your GitHub Org name in gh_org.
Then set the commit and PR information (pr_body, pr_title, branch_name, and commit_message), which can be customized to whatever you’d like. Remember that branch names shouldn’t have spaces, while the other fields can.
# Set vars
gh_org=your-org-name-here # Your GitHub Organization (or your username, if that's where your repos are)
# PR information - please customize this information
pr_body="This PR makes some automated changes to the repo."
pr_title="🤖 Making some changes 🤖"
branch_name='Branch-Name-Here'
commit_message='Commit message here'
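Because a branch name with a space in it would make the later git checkout -b fail in every repo, it can be worth validating it once up front. Here’s a minimal sketch of such a check; branch_name_ok is a hypothetical helper, and it only looks for empty names and spaces rather than every rule git enforces.

```shell
# Hypothetical helper: succeed only if the name is non-empty and has no spaces.
# This is a simplified check; git has more naming rules than this.
branch_name_ok() {
  case "$1" in
    "" | *" "*) return 1 ;;
    *) return 0 ;;
  esac
}

branch_name='Branch-Name-Here'
if branch_name_ok "$branch_name"; then
  echo "Branch name looks fine: $branch_name"
else
  echo "Branch name is empty or contains a space: '$branch_name'" >&2
fi
```

For a stricter check, git check-ref-format --branch "$branch_name" applies git’s full set of naming rules.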
If you are an admin on the repos you’re targeting, you’re able to force-merge the PRs that we’re going to create. If you want the script to attempt to merge your PRs with admin rights, set this var to true. If it’s set to false, the PRs won’t be automatically merged, but you’re able to go to each of them and merge them from the web UI yourself :)
# Should we use admin privileges to merge PR.
# If true, admin privileges will be used to merge the PR. You must have admin privileges to use this option.
# If false, the PR will not be automatically merged. The URL will be written to the log, and you must merge them manually
auto_merge_pr=false
Target Some Repos
We need a list of repos in your Org, so we use the gh CLI tool to grab them. This requires that the gh tool is installed (get it here) and that you have no more than 1k repos. If you have more than 1k repos, you’ll need to use a paginated API and collate all the pages together (instructions here).
Note that we don’t grab any archived repos. Archived repos are read-only, so any update we attempted would fail.
# Get the names of all repos in the org
# This method is limited to 1k repos, if you have more than 1k repos, use this method: https://medium.com/@kymidd/lets-do-devops-github-api-paginated-calls-more-than-1k-repos-3ff0cc92cc50
org_repos=$(gh repo list --no-archived "$gh_org" -L 1000 --json name --jq '.[].name')
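One edge case worth guarding against: if the gh call fails or returns nothing, org_repos is empty, and a while read loop fed by a here-string still runs once with an empty repo name. A minimal sketch of a sanity check follows; repo_count is a hypothetical helper and the sample list is made up.

```shell
# Hypothetical helper: count the non-empty lines in a newline-separated list.
repo_count() {
  # grep -c prints the number of matching lines; "|| true" keeps the
  # function from failing when that number is 0
  printf '%s\n' "$1" | grep -c . || true
}

# Made-up sample standing in for the output of `gh repo list`
org_repos=$(printf 'repo-one\nrepo-two\nrepo-three')

count=$(repo_count "$org_repos")
if [ "$count" -eq 0 ]; then
  echo "No repos found, check the org name and your gh authentication" >&2
else
  echo "Found $count repos to process"
fi
```

In the real script, the empty case is where you would stop before the loop ever starts.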
Loop Over Every Repo
We start a while loop over the list of repos, which will process each repo one by one.
Since we need to make changes within the repo’s files, the first step is to download the repo, so we clone it.
Then we change our directory into the repo.
# Iterate over all repos, make changes
while IFS=$'\n' read -r gh_repo; do
  # Clone the repo, will fail if the repo folder already exists
  git clone "git@github.com:$gh_org/${gh_repo}.git"
  # Change directories into the repo, skipping this repo if that fails
  cd "$gh_repo" || continue
And then I have a code-block that does nothing — this is where you should put your commands that you want to run within each repo! For instance, do you want to add a file? Delete a file? Update a file? Put those commands here! Remember the commands are executing from the root of the repo’s directory structure.
  ###
  ### Make your changes here
  ### Add or delete any files you need to this location
  ### For example, modify any file, or copy over existing files
  ###
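As one concrete example of what could go in that block, here’s a minimal sketch that adds a CODEOWNERS file when a repo doesn’t already have one. The function name and the team handle are placeholders, not something from the original tool; .github/CODEOWNERS is one of the locations GitHub reads the file from.

```shell
# Hypothetical helper: create .github/CODEOWNERS with a single rule,
# leaving any existing file untouched. Run from the root of the repo.
ensure_codeowners() {
  owners_rule="$1"
  mkdir -p .github
  if [ ! -f .github/CODEOWNERS ]; then
    printf '%s\n' "$owners_rule" > .github/CODEOWNERS
  fi
}

# Demo in a scratch directory standing in for a cloned repo;
# the team handle below is a placeholder
scratch_dir=$(mktemp -d)
(
  cd "$scratch_dir" || exit 1
  ensure_codeowners '* @your-org-name-here/your-team'
  cat .github/CODEOWNERS
)
rm -rf "$scratch_dir"
```

Inside the loop you would only need the function and a single call to it, since the script has already changed into the repo’s root.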
For the PR we’re going to create, we need to target a specific branch, or the PR creation will fail. The most extensible solution I’ve found to this is to target the “default branch” of each repo.
So we use curl to execute a REST call against the GitHub repo’s information, filter for the default branch using jq, and save it as the var base_branch, which we’ll use shortly.
  # Read the REST info on the repo to get the repo's default branch
  # Set that default branch as the base branch for the PR
  base_branch=$(curl -s \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    "https://api.github.com/repos/$gh_org/$gh_repo" | jq -r '.default_branch')
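One thing to watch for: if the API call fails (say, a bad token or a renamed repo), the JSON has no default_branch field and jq -r prints the literal string null, which would then be used as the PR’s base branch. A minimal sketch of a guard, with branch_value_ok as a hypothetical helper name:

```shell
# Hypothetical helper: reject the values we'd get from a failed lookup,
# i.e. an empty string or the literal "null" that jq -r prints for a missing field.
branch_value_ok() {
  case "$1" in
    "" | null) return 1 ;;
    *) return 0 ;;
  esac
}

# Made-up value standing in for the result of the curl | jq call
base_branch="null"

if ! branch_value_ok "$base_branch"; then
  echo "Could not determine the default branch, skipping this repo" >&2
fi
```

Inside the loop, the body of that if could cd back out and continue to move on to the next repo.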
We attempt a git add against any changes you’ve made in the repo. The verbose flag makes git print a line of output for each file that was added or removed. That means if you haven’t made any changes in the repo, the output is empty, and we know not to create a PR (since there are no changes to add).
  # Git add with '.' target identifies all changes
  # Using the '-vvv' verbose flag to get the output of the git add command, which we will use to determine if there are changes
  git_add=$(git add -vvv .)
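To make that check concrete: with the verbose flag, git add prints lines like add 'README.md' or remove 'old.txt', and prints nothing when there’s nothing new to stage. Here’s a small sketch of testing captured output for those lines; has_changes is a hypothetical helper, the sample output is made up, and it assumes git’s English messages.

```shell
# Hypothetical helper: succeed if the captured `git add -v` output contains
# at least one line starting with "add '" or "remove '".
has_changes() {
  printf '%s\n' "$1" | grep -qE "^(add|remove) '"
}

# Made-up sample standing in for the captured output of `git add -vvv .`
git_add=$(printf "add 'README.md'\nremove 'old-config.yml'")

if has_changes "$git_add"; then
  echo "Changes staged, a PR is needed"
else
  echo "Nothing changed, skipping this repo"
fi
```

If parsing output feels fragile, git diff --cached --quiet is another option: it exits non-zero when there are staged changes.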
Okay, this next code block is a lot, but it’s all tightly coupled, so let’s talk about it as a whole. First, we look at the results of git add to see if any files were added or removed. If yes, let’s create a PR! If not, we skip the whole business and move on to the next repo.
Next, we create a new branch and check it out, create a new commit to package up our changes (staged by git add earlier), push those changes to the origin, and create the PR!
Then we check whether we’ve told our script to auto-merge the PR. If yes, we attempt to do so (again using the gh CLI tool), and if not, we print out the URL of the PR we created so you can easily click through to it.
Finally, we sleep for 2 seconds to avoid hitting GitHub’s rate limits.
  # If there are no changes, the PR will not be created
  # Note that even modified files will show up as 'add' in the git add output
  if [[ $(echo "$git_add" | grep -E 'add|remove') ]]; then
    # Changes were made, checkout a branch and make a PR
    git checkout -b "$branch_name"
    git commit -m "$commit_message"
    git push origin "$branch_name"
    created_pr_url=$(gh pr create -b "$pr_body" -t "$pr_title" -B "$base_branch" --fill)
    # If auto_merge_pr is true, merge the PR
    if [ "$auto_merge_pr" = true ]; then
      gh pr merge --admin -d -m "$created_pr_url"
    else
      echo "PR created, please merge: $created_pr_url"
    fi
    # Sleep 2 seconds to avoid rate limiting
    sleep 2
  fi
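Since this block pushes branches and (optionally) force-merges PRs across every repo, it can be reassuring to see what would run before running it for real. The sketch below is one way to do that and is not part of the original tool: dry_run and run are hypothetical names, and the PR URL is a placeholder. It also spells out -d and -m as their long forms, --delete-branch and --merge.

```shell
# Hypothetical toggle: when dry_run=true, commands are printed instead of executed.
dry_run=true

# Hypothetical wrapper: print the command in dry-run mode, otherwise run it as-is.
run() {
  if [ "$dry_run" = true ]; then
    printf 'DRY RUN:'
    printf ' %s' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# Placeholder URL for illustration; in the script this would come from gh pr create
created_pr_url="https://github.com/your-org-name-here/your-repo/pull/1"

# Long-form equivalents of the -d and -m short flags used above
run gh pr merge "$created_pr_url" --admin --delete-branch --merge
```

With dry_run set to true this only prints the gh command; flipping it to false would execute it, so the same wrapper could sit in front of git push and gh pr create too.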
Next we finish off the script. We reset our location to the parent directory and close out the while loop. Note that we’re passing the org_repos var to the while loop in quotes, so it’s read line by line.
And then we close it all out with a nice message and exit 0.
  # Reset location
  cd ..
done <<< "$org_repos"
# Finish
echo "################"
echo "Done!"
echo "################"
exit 0
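A variation on the cd .. reset: running each repo’s work inside a subshell means the directory change is undone automatically, even if a command partway through fails. A minimal sketch, with in_dir as a hypothetical helper:

```shell
# Hypothetical helper: run a command inside a directory without changing
# the caller's working directory (the cd happens in a subshell).
in_dir() {
  target_dir="$1"
  shift
  ( cd "$target_dir" && "$@" )
}

# Demo with a scratch directory standing in for a cloned repo
scratch_dir=$(mktemp -d)
start_dir=$(pwd)

in_dir "$scratch_dir" touch example-file.txt

# We never left the starting directory, so no `cd ..` is needed
if [ "$(pwd)" = "$start_dir" ] && [ -f "$scratch_dir/example-file.txt" ]; then
  echo "Still in the starting directory"
fi
rm -rf "$scratch_dir"
```

The trade-off is that variables set inside the subshell don’t survive it, so anything the rest of the loop needs has to be printed or written out rather than assigned.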
Summary
Through this blog we talked through why we’d need to make changes to lots of GitHub repos in bulk, and then walked through the implementation steps for a bash-based tool which clones all repos in a GitHub Org, makes changes to each of them, creates a PR, and optionally force-merges the PR into the default branch in each repo.
You can find all the code here:
GitHub - KyMidd/OrgWideGitFileChanger: A bash-based tool to read over all GitHub Org's repos, clone them all, and make changes to each one in sequence.
Good luck out there!
kyler