šŸ”„Writing Dozens of Tools to Migrate an Enterprise from BitBucket to GitHub

Kyler Middleton
Feb 20, 2023

This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!

Hey all!

This blog normally zooms in on particular technologies or use cases, but today we’re going to zoom out. Way, way out. I was recently (well, 6 months ago) asked to migrate an enterprise from an internal Stash/BitBucket server to a GitHub Organization. Full stop, good luck!


That project is nearly complete. As part of gathering information, preparing the new GitHub tenant, and executing the migration, I’ve had the opportunity to write dozens of bespoke tools. These tools are intended to gather information, build reference files that downstream tools will use, or to directly update settings or copy code and other repos. They create PRs in GitHub, they update Jenkins pipelines, they read and set settings in Jenkins, GitHub, and BitBucket.

As a collection, they are what enables this very large project to move forward. Let’s talk about some of the tools I remember writing (there are surely more I don’t!) and what they do! šŸš€

Tooling Note: I Build My Own

Note for folks reading: there are occasionally tools available on these platforms or in public that could gather this information for us. However, I’m reluctant to use external tooling. I don’t know exactly how it works, I don’t trust it fully, and it’s often not as customizable as writing my own tools.

Therefore, I write my own tools whenever possible.

Who Are The Active Users?

The first question of most migrations is licensing. It takes a long time to purchase things, and it can be expensive, so we want accurate counts for licenses. So, a simple question: which users (and how many) are active in our projects on our internal BitBucket server?

We don’t want to buy too many licenses, and we’re only moving one division to GitHub at first. So we need to see how many users are "active", and "active" is hard to define. Have they opened a PR? Commented on a PR? Reviewed a PR? Then they’re "active". BitBucket doesn’t have an easy report for that, but I did find that it exposes this information in its APIs for each PR.

So the solution: read a list of BitBucket Projects that are part of our migration, then read each repo, then in each repo read each PR’s attributes, which include those activity metrics. Then we write every user we find to a file.

The tool is here: https://gist.github.com/KyMidd/9a7481ef1be2f7d639b36b6d785e16b0

The meat of it is in the loop here. We read each project, then each repo, and then query the author and reviewers (which includes all votes on any PR) for the last 1k PRs. We output all users to a file. It will of course include (mostly) duplicates, but the sort and uniq tools give us a workable list of folks. At least close enough for a human to read and count for licensing purposes.

```bash
# main_loop.sh
for PROJECT in $(echo $EHR_RELATED_STASH_PROJECTS); do
  echo "šŸ’„ Working on project $PROJECT"

  # Find slug of all repos in a project
  unset PROJECT_REPOS
  PROJECT_REPOS=$(curl -s --user $STASH_USER:$STASH_PASS https://$STASH_URL/rest/api/1.0/projects/$PROJECT/repos\?limit\=$PR_LIMIT | jq -r '.values[].slug')

  # Iterate over each repo to find all PRs, read limit from var
  for REPO in $(echo $PROJECT_REPOS); do
    echo "Working on repo $REPO"
    unset AUTHOR_USER_NAMES
    unset REVIEWER_USER_NAMES
    AUTHOR_USER_NAMES=$(curl -s --user $STASH_USER:$STASH_PASS https://$STASH_URL/rest/api/1.0/projects/$PROJECT/repos/$REPO/pull-requests\?state\=ALL\&limit\=$USER_LIMIT | jq -r '.values[].author.user.name' | sort | uniq)
    REVIEWER_USER_NAMES=$(curl -s --user $STASH_USER:$STASH_PASS https://$STASH_URL/rest/api/1.0/projects/$PROJECT/repos/$REPO/pull-requests\?state\=ALL\&limit\=$USER_LIMIT | jq -r '.values[].reviewers[].user.name' | sort | uniq)
    echo $AUTHOR_USER_NAMES | tr " " "\n" >> users
    echo $REVIEWER_USER_NAMES | tr " " "\n" >> users
  done
done

# Sort, uniq
cat users | sort | uniq > users_sorted
```

Which Collections Should Go First?

We organize our repos by Collection, a bucket grouped around code function. Which Collections should go first?

Again, an easy option here is to talk to people 🤢🤢

Or we could crawl over all the Project Collections in Stash and count how many PRs they have open. This is an incredibly rough metric: these PRs could have been open for months or years. However, I don’t need guaranteed-accurate information here. Rather, I’m seeking a rough guide for which collections should be migrated first (those that aren’t active) and which we should avoid for now (those that are).

So we crawl over every project in Stash, iterate over each repo, and count the PRs in an open state in each repo. We aggregate the PR counts per Project and write them to a CSV so I can sort it in Excel.

```bash
# Find all project keys
STASH_PROJECT_KEYS=$(curl -s --user $STASH_USER:$STASH_PASS https://$STASH_URL/rest/api/1.0/projects\?limit\=1000 | jq -r ".values[].key")

# Loop over every project in the environment and count open PRs
while IFS=$'\n' read -r PROJECT_KEY; do
  # Set counter var
  OPEN_PR_COUNT=0
  echo "X Searching project: $PROJECT_KEY"

  # Find all repos in project
  ALL_REPOS_IN_PROJECT=$(curl -s --user $STASH_USER:$STASH_PASS https://$STASH_URL/rest/api/1.0/projects/$PROJECT_KEY/repos\?limit\=1000 | jq -r '.values[].slug')
  while IFS=$'\n' read -r REPO_NAME; do
    # Find count of open PRs in repo
    REPO_OPEN_PR_COUNT=$(curl -s --user $STASH_USER:$STASH_PASS https://$STASH_URL/rest/api/1.0/projects/$PROJECT_KEY/repos/$REPO_NAME/pull-requests | jq -r .size)
    echo "- Repo $REPO_NAME has $REPO_OPEN_PR_COUNT open PRs"

    # Increment counter var
    ((OPEN_PR_COUNT=OPEN_PR_COUNT+REPO_OPEN_PR_COUNT))
  done <<< "$ALL_REPOS_IN_PROJECT"

  echo "$PROJECT_KEY has $OPEN_PR_COUNT open PRs"
  echo "$PROJECT_KEY,$OPEN_PR_COUNT" >> project_open_prs.csv
done <<< "$STASH_PROJECT_KEYS"
```

Again, not a super accurate measure, but enough to build a skeleton schedule and present to the teams for them to object or agree.

Copy The Git Repo to GitHub

We first intended to copy all the code from BitBucket to GitHub with a tool we wrote. I did my best to write that tool, including some incredibly cool features, but we eventually gave up on the effort: the scope of it was greater than expected. Almost all the data for a git repo is built into the git tree itself: the commits, tags, branches, etc.

However, BitBucket and GitHub are entirely different when it comes to the metadata around the git repo: Pull Request standards, comments on commits, the rules around commits (GitHub doesn’t permit blank commit messages, as BitBucket does), and so on, so we decided to give up.

We purchased a tool from GitHub proper, and that tool works pretty well. However, it’s not as comprehensive as we’d hoped, and it does occasionally run into issues, just like our code used to.

In retrospect, I wish I’d continued to build out this code.

My code, which you can find here: https://gist.github.com/KyMidd/5babbba7821d675916a2ca1c30bb4414

Here’s a snippet of it that reads a Project from BitBucket, finds the clone information and clone URL, clones the repo, and attempts to push it to a newly created GitHub repo without modification. Right after this snippet, we read any error messages and attempt to fix them automatically.

```bash
# Grab all repo clone values
getRepoCloneValues=$(cat project_repos_raw | jq --raw-output --arg repo "$repo" '
  .values[]
  | select( .name | contains($repo)).links.clone[]
  | select( .name | contains("ssh")) | .href')

# Select first repo clone value if there are multiple
IFS=';' read -r stashRepoUrl string <<< "$getRepoCloneValues"

# Clone old repo
git clone --mirror $stashRepoUrl $repo

# Change context into folder
cd $repo

# Attempt push, trap results
echo "Attempting git push --mirror without any modifications"
gitPushResults=
gitPushResults=$(git push --mirror https://oauth2:$GITHUB_TOKEN@github.com/$github_project_name/$github_repo_name.git --porcelain 2>&1)
```

This tool reads a file in the same repo, which lists a series of BitBucket projects. The tool gathers a list of all repos in each project, then iteratively clones each repo to local, creates a corresponding repo in GitHub, writes a new .git configuration to point at the GitHub origin, and pushes to GitHub.
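
That "re-point the origin" step is conceptually tiny. Here’s a minimal sketch of just that piece, with my-org as a placeholder organization name:

```bash
# Minimal sketch of re-pointing a mirror clone at its new GitHub origin.
# "my-org" is a placeholder organization name, not our real Org.
cd $repo
git remote set-url origin git@github.com:my-org/$repo.git
git push --mirror origin
```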

I also added the ability for it to automatically detect when files are too large for GitHub (GitHub limits files to 100MB, BitBucket doesn’t), and to use the BFG tool to rewrite the git history to use LFS (Large File Storage), a GitHub technology that stores pointer files in the git tree which point at externally stored large files. That part is incredibly cool, and doesn’t exist in the GitHub tool we purchased: it simply skips those files.
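
A hedged sketch of the shape of that logic (not the exact code from my tool): use git plumbing to find any blob over the 100MB limit in the mirror clone, then have BFG rewrite matching files into LFS pointers. The glob and the bfg.jar location are illustrative:

```bash
# Sketch: find any blob over GitHub's 100MB limit in the $repo mirror clone
OVERSIZED=$(git -C $repo rev-list --objects --all \
  | git -C $repo cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | awk '$1 == "blob" && $3 > 100*1024*1024 {print $4}')

if [[ -n "$OVERSIZED" ]]; then
  echo "Files over 100MB found, rewriting history to use LFS:"
  echo "$OVERSIZED"
  # BFG rewrites matching blobs into LFS pointers. The '*.zip' glob is
  # illustrative; the real tool derives patterns from the files found above
  java -jar bfg.jar --convert-to-git-lfs '*.zip' --no-blob-protection $repo
fi
```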

It’s also capable of rewriting commit messages that are too long for GitHub (BitBucket again permits longer messages) into the git tree on the fly, in an entirely automated way.
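
The commit-message rewrite can be as small as a msg-filter that truncates. A sketch, assuming an illustrative 4000-byte cap (not a documented GitHub number):

```bash
# Sketch: truncate every commit message to a byte cap by rewriting history.
# The 4000-byte limit is illustrative, not GitHub's documented number.
git filter-branch -f --msg-filter 'head -c 4000' -- --all
```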

We used it to migrate more than 500 legacy repos that are kept for archive purposes.

The tool GitHub provided is here (but is permission locked).

Create the Exporter File

In order to tell our GitHub exporter tool which repos to package up for import into GitHub, it requires a file. The file looks like this:

```
PROJ1,reposlug1
PROJ1,reposlug2
PROJ2,reposlug3
PROJ3,reposlug4
```

Creating that file for a dozen or so BitBucket Projects at a time is a bit of a pain, so I wrote a little bash loop that does it for me:

```bash
for PROJECT_CODE in PROJ1 PROJ2 PROJ3; do
  REPOS=$(curl -s --user $STASH_USER:$STASH_PASS "https://$STASH_URL/rest/api/1.0/projects/$PROJECT_CODE/repos?limit=1000" | jq -r '.values[].name')
  while IFS=$'\n' read -r REPO_NAME; do
    echo $PROJECT_CODE,$REPO_NAME
  done <<< "$REPOS"
done
```

This is also a great skeleton when I need to run one of the other tools on this page against multiple collections in sequence. Rather than updating my tools to accept a list of projects, I just wrap them in this little for loop (see the sketch below) and it does the same thing. Tomato, tomato.
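
For example, wrapping a hypothetical single-project tool (some_tool.sh stands in for whichever script from this page I’m running):

```bash
# Hypothetical wrapper: run a single-project tool against several projects.
# ./some_tool.sh is a stand-in for any of the tools on this page.
for PROJECT_CODE in PROJ1 PROJ2 PROJ3; do
  ./some_tool.sh "$PROJECT_CODE"
done
```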

Create Hundreds of GitHub Teams

Our repos are organized into "collections", which roughly break down to a bunch of repos managed together. And within each collection, we have several teams. Those teams always follow the same structure: they manage that collection’s databases, UI, testing, services, things like that.

Those groups are assigned different permission levels for that collection’s repos. We didn’t want to change that structure: as much as possible, large projects like this should change as little as possible. So we need to create a BUNCH of teams in GitHub.

```bash
# Set child team name
unset TEAMNAME
TEAMNAME="$PROJECT"$2
echo Child team name will be $TEAMNAME

# Check if team already exists
unset CURL
CURL=$(curl -s \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  https://api.github.com/orgs/$ORG/teams/$TEAMNAME)
if [[ $(echo "$CURL" | grep "Not Found" | wc -l) -eq 1 ]]; then
  echo "Team does not exist, create"
  echo Creating team "$TEAMNAME"
  unset CURL
  CURL=$(curl -s \
    -X POST \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    https://api.github.com/orgs/$ORG/teams \
    -d "{\"name\":\"$TEAMNAME\",\"privacy\":\"closed\",\"parent_team_id\":$PARENT_TEAM_ID}" 2>&1)
fi
```

We again read the list of collections from the master-doc, and for each collection name we create a parent team. For instance, if the collection is named Foo, we create a team named Foo in GitHub. Then we look up that team’s ID and create several child teams that are in charge of particular aspects of that collection, like database and UI. Those teams are created as children of the parent for easy organizing.
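
The snippet above shows the child-team half; here’s a hedged sketch of the parent half, capturing the new team’s ID from GitHub’s response so the children can reference it:

```bash
# Sketch: create the parent team, then capture its ID for the child teams
PARENT_CREATE=$(curl -s \
  -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  https://api.github.com/orgs/$ORG/teams \
  -d "{\"name\":\"$PROJECT\",\"privacy\":\"closed\"}")
PARENT_TEAM_ID=$(echo "$PARENT_CREATE" | jq -r '.id')
echo "Parent team $PROJECT has ID $PARENT_TEAM_ID"
```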

The full code is here: https://gist.github.com/KyMidd/ef67eefb07a502e43c4cd506742eb858

We created about 250 teams using this tool.

"What’s Your GitHub User Name?" Automation

This one, like most of these questions, seems pretty simple at first. BitBucket uses a Windows Active Directory username to identify users, and that’s what their git information is tied to. GitHub uses a GitHub user ID, which doesn’t follow any standard if you permit your users to create their own.

We could ask each user to provide their GitHub user ID, but that sounded like a great deal of talking to humans, which I wasn’t thrilled about 🤢

Instead of that, I decided to read that list of active users from earlier. Remember when we read the last 1k PRs for every repo in every project to find all the users?

I needed to export all the users’ email addresses and save them, so I wrote this snippet. It reads a list of users (one username per line), looks up each user, and prints their email address.

```bash
# find_bitbucket_email_addresses.sh
# Remember to export $STASH_PASS before running
if [[ -z $STASH_PASS ]]; then
  echo "Remember to export STASH_PASS=(your password)"
  exit 1
fi

# Grab info from source
curl -s --user $STASH_USER:$STASH_PASS https://stash.hq.practicefusion.com/rest/api/1.0/admin/users\?limit\=1000 -o stashusers.json

# Look up each user's email address
while IFS="," read -r USERNAME
do
  cat stashusers.json | jq -r ".values[] | select(.slug==\"$USERNAME\") | .emailAddress"
done < users
```

Our users are required to use their work email address for their GitHub user, and that email almost always matches their internal username. For example, kyler.middleton@(work email.com) maps to a username of kyler.middleton or kmiddleton. With that assumption in mind, we can map the email addresses of GitHub users in our Org to our BitBucket usernames. Boom, that amount of shared information allows us to create a map.
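
The mapping step itself can be plain text processing. A hypothetical sketch, assuming we’ve exported two CSVs of email,username pairs (the input filenames are illustrative):

```bash
# Hypothetical sketch: join two email,username CSVs on the email column to
# produce github_username,hq_username pairs. Input filenames are illustrative.
join -t, \
  <(sort -t, -k1,1 github_email_to_username.csv) \
  <(sort -t, -k1,1 bitbucket_email_to_username.csv) \
  | awk -F, '{print $2","$3}' > github_user_and_hq_username.csv
```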

Populate All The Team Members!

We created about 250 teams in GitHub using a tool above, but those teams are empty. Well, they were created with me as a member, which I had to remedy: I don’t want to get all the emails for every collection of repos, that would drive me crazy.

But once I removed myself, those teams were all empty. I initially wanted to just elect a "maintainer" for each team, and tell them they’re in charge of populating the teams with users. But that group pushed back, and asked me a catnip question: is there a way to automatically populate these teams?

Well, not with any tool we currently have… but I bet I can remedy that! šŸš€

Our master doc of repo collections also contains the members’ BitBucket usernames that should belong to those groups. Even if it didn’t, I could probably scrape the BitBucket API for group membership: the names are predictably constructed, so that would also have worked.

But remember, the names in BitBucket and the names in GitHub are entirely different. We have to iterate over every collection, constructing the team names as we go, then check our master doc for team membership, then look up the name of the member in our BitBucket/GitHub map, then add that user to the GitHub team.

The entire script is here: https://gist.github.com/KyMidd/6e9d720598c3cfda7342b88bd7fddd59

```bash
while IFS="," read -r GITHUB_USERNAME HQ_USERNAME
do
  # Normalize casing
  GITHUB_USERNAME=$(echo $GITHUB_USERNAME | tr '[A-Z]' '[a-z]')
  HQ_USERNAME=$(echo $HQ_USERNAME | tr '[A-Z]' '[a-z]')
  if [ -z "$GITHUB_USERNAME" ]; then
    echo "ā˜ ļø GitHub username not found"
  else
    echo "- GitHub username for this user is: $GITHUB_USERNAME"
    # Add user to team
    unset CURL
    CURL=$(curl -s \
      -X PUT \
      -H "Accept: application/vnd.github+json" \
      -H "Authorization: Bearer $GITHUB_TOKEN" \
      https://api.github.com/orgs/$ORG/teams/$TEAM_SLUG/memberships/$GITHUB_USERNAME \
      -d '{"role":"maintainer"}' 2>&1)
  fi
done < github_user_and_hq_username.csv
```

Using this tool we were able to populate ~250 GitHub teams with ~100 members, in about 750 unique pairings.

GitHub Default Branch and Repo Name Casing Fixes

The GitHub-provided tool to migrate repos from our internal BitBucket to our GitHub tenant works remarkably well, but it isn’t perfect. It loses the default branch for each repo, which caused a surprising amount of chaos, and it loses all the casing for names. So, for instance, ThisAwesomeTool becomes thisawesometool, which is much harder for humans to read.

Our default branch is almost always develop, so solving that problem was easy-ish. GitHub lets us read a repo’s branches via the REST API, so I iterated over all our freshly migrated repos and checked whether the develop branch exists. If yes, we set it as the default. If not, we skip updating it. I figure that’s liable to generate far fewer error messages than attempting to update every repo to develop and having maybe 25% of them fail each time.

```bash
# Check which branches exist
BRANCHES=$(curl -s \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  https://api.github.com/repos/$GH_ORG/$GH_REPO/branches | jq -r '.[].name')

# If develop branch exists, set it as default
if [[ $(echo "$BRANCHES" | grep -E "develop") ]]; then
  echo "The develop branch exists"

  # Check current default branch
  REPO_DEFAULT_BRANCH=$(curl -s \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    https://api.github.com/repos/$GH_ORG/$GH_REPO | jq -r '.default_branch')
  if [[ "$REPO_DEFAULT_BRANCH" == 'develop' ]]; then
    echo "Develop is already the default branch, making no changes"
  else
    echo "Develop isn't the default branch, updating"
    UPDATE_DEFAULT_BRANCH=$(curl -s \
      -X PATCH \
      -H "Accept: application/vnd.github+json" \
      -H "Authorization: Bearer $GITHUB_TOKEN" \
      -H "X-GitHub-Api-Version: 2022-11-28" \
      https://api.github.com/repos/$GH_ORG/$GH_REPO \
      -d '{"default_branch":"develop"}')
  fi
else
  echo "Develop doesn't exist, skip updating default branch"
fi
```

And remember the capitalization problem? I can’t easily computer-solve where the casing should go in each repo name, but the characters are the same! So I can take each repo’s name in GitHub, look up the matching "slug" (lower-cased unique ID) in Stash, and find the BitBucket repo "name", which is the same string but with proper casing. Then I can update the GitHub repo to match that name.

```bash
# Fetch correct capitalization of repo name from Stash
REPO_CORRECT_CAPS=$(curl -s --user $STASH_USER:$STASH_PASS https://$STASH_URL/rest/api/1.0/projects/$STASH_PROJECT_KEY/repos/$GH_REPO | jq -r .name)

# If different from the GitHub name, correct the name on GitHub
echo "Correcting casing of repo names to match stash"
CORRECT_REPO_CASING=$(curl -s \
  -X PATCH \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  https://api.github.com/repos/$GH_ORG/$GH_REPO \
  -d "{\"name\":\"$REPO_CORRECT_CAPS\"}")
```

The REST call looks funny: we’re updating a repo using its name… to update its name.

We’ve corrected about 200 repos so far using this tool, and our humans are much happier reading repos that are capitalized correctly.

Stage Actions and CODEOWNERS files in Thousands of Repos

I’ve complained about this before, and I’ll complain about it again. In the BitBucket world, if you have an Action or Extension that should exist in every repo, you’re able to enable it at the Tenant or Project level. You do that once, the settings are all the same, and it’s absolutely amazing and works perfectly.

GitHub works in a much more idiosyncratic way: each repo is an island. Repos are technically organized into Organizations and Enterprises, and permissions and some settings flow down to them, but not much else. For instance, we have some Actions that are required in each repo to validate commits and test code. The solution?

Update the files in šŸ‘ every šŸ‘ single šŸ‘ repo šŸ‘.

Well that’s weird and alarming.

Surely GitHub has a great tool that can do that for us? Nope. "Go build it yourself". Fine, so I did.

Because this is an internal tool, we can customize the heck out of it, so we did. Each repo we create belongs to a collection. That collection means a specific set of teams should be assigned permissions. Also, each repo requires different Actions from our group of internal Actions.

Since that info isn’t always obvious (repo names don’t contain all of it), we need to set some metadata somehow. I could potentially run this tool many times, setting those attributes each time. But that’s a bummer. Easier is to create a metadata doc that contains flags for each of those customizable attributes.

I created a CSV-like file (I say -like because CSVs don’t support blank lines and commented-out lines, which I added to my CSV-ish file) that contains a repo name, then a series of flags. It looks like this:

```
GH_REPO_NAME,DEPLOY_COMMIT_CHECKER,DEPLOY_ANY_VALIDATE,DEPLOY_MERGE_COMMIT_NOTIFY,CODEOWNERS_TEAM_SLUG,COLLECTION_MIGRATION_TICKET
# CollectionName
REPO1,true,true,true,CollectionName,DO-12345
REPO2,true,false,false,CollectionName,DO-12345
REPO3,true,false,true,CollectionName,DO-12345
```

My tool iterates over this file (a sketch of the iteration follows), and in the context of each repo (GH_REPO_NAME), it checks whether we should deploy an Action called the Commit Checker (DEPLOY_COMMIT_CHECKER boolean), deploy the Action called the Any-Validate (DEPLOY_ANY_VALIDATE), etc. For each repo, we can set n values that customize how the Actions and CODEOWNERS are deployed.
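
A sketch of how that iteration might look, assuming the file is saved as repo_metadata.csv (the filename is illustrative). The grep strips the comments and blank lines that make the file only CSV-ish:

```bash
# Sketch: iterate the CSV-ish metadata file, skipping comments and blank lines.
# repo_metadata.csv is an illustrative filename.
grep -vE '^(#|$)' repo_metadata.csv | \
  while IFS="," read -r GH_REPO_NAME DEPLOY_COMMIT_CHECKER DEPLOY_ANY_VALIDATE DEPLOY_MERGE_COMMIT_NOTIFY CODEOWNERS_TEAM_SLUG COLLECTION_MIGRATION_TICKET; do
    echo "Processing $GH_REPO_NAME"
    if [[ "$DEPLOY_COMMIT_CHECKER" == "true" ]]; then
      echo "- Will stage the Commit Checker Action"
    fi
  done
```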

```bash
# If we've made any changes, create branch, add files, push
if [ $MADE_CHANGE = true ]; then
  # Checkout local branch
  git checkout -b feature/${COLLECTION_MIGRATION_TICKET}-Create-GitHubActions-and-CODEOWNERS

  # Add files to git
  git add .github/workflows/_PfGitCommitChecker.yml &>/dev/null
  git add .github/workflows/ActionPRValidate_AnyJobRun.yaml &>/dev/null
  git add .github/workflows/MergeCommitNotify.yml &>/dev/null
  git add CODEOWNERS &>/dev/null

  # Commit changes
  COMMIT=$(git commit -m "${COLLECTION_MIGRATION_TICKET} Create GitHub Actions and CODEOWNERS")
  if [[ $(echo "$COMMIT" | grep 'nothing to commit' | wc -l | awk 'NF') -eq 1 ]]; then
    echo "No changes, nothing to commit"
  else
    # Changes detected, print commit info and do PR
    echo "$COMMIT"
  fi
fi
```

Once it decides that, it knows which files to copy over.

The CODEOWNERS file is a little more complicated. It needs to set the "code owners" for each repo, which means it needs to target the appropriate teams, of which there can be multiple. Different repo types have different logic, so we start with the repo name to identify the repo type, then based on that type, set the CODEOWNERS string to target the right teams using their lower-cased names/slugs in GitHub.

Then as a last step, we use sed to update the file in place, replacing a placeholder string with the real value that should exist for the CODEOWNERS. VoilĆ : a valid and specific-to-this-repo CODEOWNERS file.

```bash
if [[ $GH_REPO_NAME == *"database"* ]]; then
  # CODEOWNERS should contain all 4 leads groups for this project
  CODEOWNERS="@$GH_ORG/${SERVICES_LEADS_TEAM_SLUG} @$GH_ORG/${TEST_LEADS_TEAM_SLUG} @$GH_ORG/${UI_LEADS_TEAM_SLUG} @$GH_ORG/${DATA_LEADS_TEAM_SLUG}"
# If endpoint, api, or apiendpoint ends repo name
elif [[ $GH_REPO_NAME == *"endpoint" ]] || [[ $GH_REPO_NAME == *"api" ]] || [[ $GH_REPO_NAME == *"apiendpoint" ]]; then
  # CODEOWNERS should contain ServiceLeads, UILeads, TestLeads (but not DataLeads)
  CODEOWNERS="@$GH_ORG/${SERVICES_LEADS_TEAM_SLUG} @$GH_ORG/${TEST_LEADS_TEAM_SLUG} @$GH_ORG/${UI_LEADS_TEAM_SLUG}"
# All others assume ServicesLeads are owners
else
  # CODEOWNERS should contain ServicesLeads only
  CODEOWNERS="@$GH_ORG/${SERVICES_LEADS_TEAM_SLUG}"
fi

# Sed the CODEOWNERS var into the file, replacing the placeholder
sed -i '' "s#PF_CODEOWNER#$CODEOWNERS#g" CODEOWNERS
```

The entire code is here: https://gist.github.com/KyMidd/83cd77cab8588e0c7cb3cfb9c62b7d38

We’ve created ~400 PRs to deploy ~1k files in our tenant so far, and it’ll be well over 1k PRs to deploy ~5k files when we’re done.

Dear GitHub: wouldn’t it be easier if we could set this at the Org or Enterprise level? šŸ™ƒ

Update our Jenkins to Point at the New Code Source Location

When we initially scoped this project out, we decided we wanted to make it as atomic as possible. For us, that means separating moving our code to a different location from converting our build automation to GitHub. Jenkins does an incredible job for us, it’s open source, it’s amazing, and we didn’t want to create Actions as we went: the pipelines already exist in Jenkins, let’s use those.

The first problem presented itself right away: our Jenkins is INSIDE our company network. GitHub is OUTSIDE our company network. You see what I’m getting at here? We looked at creating an API gateway to bridge webhooks from GitHub to Jenkins to trigger jobs, but eventually threw it all away.

Rather, we can run Actions from inside our network on builders we host ourselves. That compute can talk directly to Jenkins without us having to expose Jenkins in any way. Sounds easier, so we did it. A full write-up of how we use Actions to send commitNotification calls to Jenkins and then track the jobs that are spun off to see which ones are linked to the commit we’re validating is here:

Let’s Do DevOps: GitHub to Jenkins Custom Integration using Actions, Bash, Curl for API Hacking (kymidd.medium.com)
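
The core trick, as a minimal sketch: a self-hosted runner can reach Jenkins directly, for example via the Git plugin’s notifyCommit endpoint, which triggers polling for jobs watching that repo (depending on your Jenkins version and config, this endpoint may also require an access token):

```bash
# Minimal sketch: from a self-hosted runner inside the network, tell Jenkins'
# Git plugin that a repo changed so jobs polling it will run. The full
# integration in the linked post does quite a bit more than this.
curl -s "https://$JENKINS_URL/git/notifyCommit?url=git@github.com:$GH_ORG/$GH_REPO.git"
```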

But now we need to update Jenkins. Our code is in a new place, and we need to tell Jenkins. I initially attempted to hook in using the REST API. Jenkins itself (Jenkins core) has amazing APIs; I can update and call anything I need.

However, Jenkins is built to be highly modular and pluggable. Even core functionality (cloning a repo to build the code) is part of a module, and those modules are very hit or miss with API support. Our particular modules didn’t have the API support I needed to update their configuration via API call. Lame!

However, we can issue a call to Jenkins to download ALL of a job’s config as an XML file. I don’t love that: I’ve become proficient at using jq to mess around with JSON files, and I don’t love working with XML.

```bash
# Download the job XML
echo "- Downloading job xml/${job}_original.xml"
curl -s -u "$JENKINS_USERNAME:$api_token" -X GET "https://$JENKINS_URL/job/${job}/config.xml" > xml/${job}_original.xml

# Check the git source
if [[ $(cat xml/${job}_original.xml | yq -p xml .project.scm.userRemoteConfigs | grep url) == *"github"* ]]; then
  echo "āœ… The pipeline already points at github, and requires no changes"
else
  # Changes needed
  echo "🚧 Pipeline needs changes"
fi
```

Our solution in the end was to iterate over Jenkins pipelines, download each job’s config.xml file (which contains all its information, even the parts used by pluggable modules), use sed to create a new version of the config that points at the proper location, then post that file back.

```bash
# all_the_sed.sh
# Changes needed
echo "🚧 Pipeline needs changes"

# Modify the job with sed and write to new file
cp xml/${job}_original.xml xml/${job}_updated.xml

# Rewrite scm clone URLs: with and without a .git suffix, with and without a
# leading space, for both the lowercase project slug and the uppercase project key
sed -i '' "s#<url>https://$STASH_URL/scm/${PROJECT_SLUG_LOWERCASE}/${REPO_SLUG_LOWERCASE}.git</url>#<url>git@github.com:practicefusion/${JENKINS_JOB_SEARCH_STRING}.git</url>#g" xml/${job}_updated.xml
sed -i '' "s#<url>https://$STASH_URL/scm/${PROJECT_KEY_UPPERCASE}/${REPO_SLUG_LOWERCASE}.git</url>#<url>git@github.com:practicefusion/${JENKINS_JOB_SEARCH_STRING}.git</url>#g" xml/${job}_updated.xml
sed -i '' "s#<url>https://$STASH_URL/scm/${PROJECT_SLUG_LOWERCASE}/${REPO_SLUG_LOWERCASE}</url>#<url>git@github.com:practicefusion/${JENKINS_JOB_SEARCH_STRING}.git</url>#g" xml/${job}_updated.xml
sed -i '' "s#<url>https://$STASH_URL/scm/${PROJECT_KEY_UPPERCASE}/${REPO_SLUG_LOWERCASE}</url>#<url>git@github.com:practicefusion/${JENKINS_JOB_SEARCH_STRING}.git</url>#g" xml/${job}_updated.xml
sed -i '' "s# <url>https://$STASH_URL/scm/${PROJECT_SLUG_LOWERCASE}/${REPO_SLUG_LOWERCASE}.git</url># <url>git@github.com:practicefusion/${JENKINS_JOB_SEARCH_STRING}.git</url>#g" xml/${job}_updated.xml
sed -i '' "s# <url>https://$STASH_URL/scm/${PROJECT_KEY_UPPERCASE}/${REPO_SLUG_LOWERCASE}.git</url># <url>git@github.com:practicefusion/${JENKINS_JOB_SEARCH_STRING}.git</url>#g" xml/${job}_updated.xml
sed -i '' "s# <url>https://$STASH_URL/scm/${PROJECT_SLUG_LOWERCASE}/${REPO_SLUG_LOWERCASE}</url># <url>git@github.com:practicefusion/${JENKINS_JOB_SEARCH_STRING}.git</url>#g" xml/${job}_updated.xml
sed -i '' "s# <url>https://$STASH_URL/scm/${PROJECT_KEY_UPPERCASE}/${REPO_SLUG_LOWERCASE}</url># <url>git@github.com:practicefusion/${JENKINS_JOB_SEARCH_STRING}.git</url>#g" xml/${job}_updated.xml

# Rewrite the browse URLs to point at the repo's GitHub page
sed -i '' "s#<url>https://$STASH_URL/projects/${PROJECT_SLUG_LOWERCASE}/repos/${REPO_SLUG_LOWERCASE}</url>#<url>https://github.com/$GH_ORG/${JENKINS_JOB_SEARCH_STRING}</url>#g" xml/${job}_updated.xml
sed -i '' "s#<url>https://$STASH_URL/projects/${PROJECT_SLUG_LOWERCASE}/repos/${REPO_SLUG_LOWERCASE}/browse</url>#<url>https://github.com/$GH_ORG/${JENKINS_JOB_SEARCH_STRING}</url>#g" xml/${job}_updated.xml
sed -i '' "s#<url>https://$STASH_URL/projects/${PROJECT_KEY_UPPERCASE}/repos/${REPO_SLUG_LOWERCASE}</url>#<url>https://github.com/$GH_ORG/${JENKINS_JOB_SEARCH_STRING}</url>#g" xml/${job}_updated.xml
sed -i '' "s#<url>https://$STASH_URL/projects/${PROJECT_KEY_UPPERCASE}/repos/${REPO_SLUG_LOWERCASE}/browse</url>#<url>https://github.com/$GH_ORG/${JENKINS_JOB_SEARCH_STRING}</url>#g" xml/${job}_updated.xml
```

That solution worked well after lots of iterating: some config.xmls are stored with \n returns, and some are returned all as one line, and they apparently need to be posted back in the same way, for no reason I understand. However, I eventually built out the logic to support it.
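
The post-back half is the same endpoint we downloaded from, just POSTed to instead. A minimal sketch, assuming the same variables as the snippets above (your Jenkins may also want a CSRF crumb):

```bash
# Minimal sketch: upload the modified config back to the same config.xml
# endpoint the job was downloaded from. Variables match the earlier snippets.
curl -s -u "$JENKINS_USERNAME:$api_token" \
  -X POST "https://$JENKINS_URL/job/${job}/config.xml" \
  -H "Content-Type: application/xml" \
  --data-binary @xml/${job}_updated.xml
```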

I’ve now updated more than 600 Jenkins pipelines and I’ll have updated more than 2k by the time this project ends.

Assign Repo Permissions

Now that repos are in GitHub, names are normalized, and teams are created in GitHub, we need to link the two together. I have an existing tool, built previously, that runs each night from an Action and massages all the GitHub repos to our settings standards. Full write-up:

Let’s Do DevOps: Set GitHub Repo Permissions on Hundreds of Repos using GitHub’s Rest API using a… (kymidd.medium.com)

That tool was a perfect place to add the code that links the two together. It runs each night, and if any new repos fit the existing naming patterns for ownership, permissions are assigned based on the collection.

We check whether the collection name is assigned. This isn’t populated by default, since our script can’t derive the collection from a repo name or anything: rather, it’s populated in the CSV(-ish) file in this repo that sets flags for any deviations from normal. If we find that info, we check the repo name and, same as the tools above, classify the ownership and assign permissions based on that classification.

Based on the classification, different teams are granted admin, maintain, or no rights at all.

```bash
# If the repo's collection is populated, assign those permissions
if [ ! -z "$COLLECTION_NAME" ]; then
  # Normalize capitalization of collection name into slug
  COLLECTION_SLUG=$(echo $COLLECTION_NAME | tr '[A-Z]' '[a-z]')
  GH_REPO_LOWERCASE=$(echo $GH_REPO | tr '[A-Z]' '[a-z]')

  # If Collection name is UI, grant UI team admin, granting merge rights
  if [[ "$COLLECTION_SLUG" == "ui" ]]; then
    echo "ā„¹ļø Repo classified as UI"
    # Admins
    rest_grant_repo_permissions TEAM_SLUG='uileads' PERMISSION='admin'
    # There are no other *Leads groups for UI collection

  # If data repo, promote data leads to admin, granting merge rights
  elif [[ "$GH_REPO" == *"database"* ]]; then
    echo "ā„¹ļø Repo classified as Database"
    # Admins
    rest_grant_repo_permissions TEAM_SLUG=$DATA_LEADS_TEAM_SLUG PERMISSION='admin'
    # Maintain
    rest_grant_repo_permissions TEAM_SLUG=$SERVICES_LEADS_TEAM_SLUG PERMISSION='maintain'
    rest_grant_repo_permissions TEAM_SLUG=$TEST_LEADS_TEAM_SLUG PERMISSION='maintain'
    rest_grant_repo_permissions TEAM_SLUG=$UI_LEADS_TEAM_SLUG PERMISSION='maintain'

  # If test repo, promote test leads to admin, granting merge rights
  elif [[ "$GH_REPO" == *"test"* ]]; then
    echo "ā„¹ļø Repo classified as Test"
    # Admins
    rest_grant_repo_permissions TEAM_SLUG=$TEST_LEADS_TEAM_SLUG PERMISSION='admin'
    # Maintain
    rest_grant_repo_permissions TEAM_SLUG=$SERVICES_LEADS_TEAM_SLUG PERMISSION='maintain'
    rest_grant_repo_permissions TEAM_SLUG=$UI_LEADS_TEAM_SLUG PERMISSION='maintain'
    rest_grant_repo_permissions TEAM_SLUG=$DATA_LEADS_TEAM_SLUG PERMISSION='maintain'

  # If repo doesn't match the others, classify as Services
  else
    echo "ā„¹ļø Repo classified as Services (default classification)"
    # Admins
    rest_grant_repo_permissions TEAM_SLUG=$SERVICES_LEADS_TEAM_SLUG PERMISSION='admin'
    # Maintain
    rest_grant_repo_permissions TEAM_SLUG=$TEST_LEADS_TEAM_SLUG PERMISSION='maintain'
    rest_grant_repo_permissions TEAM_SLUG=$UI_LEADS_TEAM_SLUG PERMISSION='maintain'
    rest_grant_repo_permissions TEAM_SLUG=$DATA_LEADS_TEAM_SLUG PERMISSION='maintain'
  fi
fi
```

We run this tool nightly, and it has assigned an unfathomable number of permissions: likely more than 10k? Hard to count.

More

These are just the tools I remember writing! I’ve had to write many dozens of scripts to learn things from Stash, Jenkins, and GitHub, and to tie them all together in useful ways.

Summary

I absolutely ā¤ large projects like this. The goal is clearly defined, and the means are left up to me. 10/10, would project again. These types of mandates permit a great deal of creativity and improvisation when building, something that I adore as an engineer.

An even greater challenge is to keep these tools organized and readable so others can use and run them when you’re away or after you’ve left your job. Write-ups (like this!) can help your team figure out what the heck you were doing with that code.

I hope the tools on this page help you see that even bash can be a useful language, that you don’t need one monolithic tool that does it all, and that mapping two data sources together and taking action on the result is absolutely a challenge, and so fun.

Or something like that? Programming is cool.

Good luck out there. 
kyler

