🔥Let’s Do DevOps: Passing data between GitHub Actions jobs, steps, and tasks (and make Matrices…

May 16, 2023

This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!

Hey all!

I recently published a series of articles focused on how to build a GitHub App that triggers on Org actions, like creating a new repo, how to use an AWS Lambda to access secrets and trigger GitHub Actions, and how to pass an input variable to a GitHub Action. But I realize that a lot of those very cool widgets don’t work without something that’s quite basic — passing information around inside a GitHub Action.

When a variable is set by a task (the smallest unit of work inside a GitHub Action), that variable is accessible within the same task. But the next task? Nope. It’s totally gone. So passing that var to the next step, or another job that’s potentially run on an entirely different builder? Definitely no (at least by default).

However, as you build GitHub Actions, you need to do all of this (and more). Let’s talk about what GitHub recommends, and the hacky, awesome approaches I personally use that work best for my Actions.

GitHub Actions: Sections

In the intro I used a lot of jargon that might not be familiar to you — Action, Job, Step, Task. Or maybe even Variable. If you know all those terms, skip to the next section! If you want to make sure you understand them, read on.

A GitHub Action is a workflow definition. It’s a YAML-encoded file that lives within the same Repo where the work is done. The logic in these files is very flexible (if a little unintuitive, especially around passing information). The file defines all the logic that the Action will use when it runs. It looks like this. You can see it defines a name for the workflow (line 1), a list of triggers for when it should run (line 2), and a list of jobs (line 3). Everything else is part of Job or Step syntax, which we’ll talk about next.

A GitHub Action defines one or more Jobs. These Jobs are a list of Tasks to run, and can be configured to run concurrently, or in series. These Jobs can be assigned to different builders or the same builder. They define a series of Tasks that are called Steps.

Steps are individual tasks that should be executed by the Job. My example picture shows each Job with 1 Step, but usually a Job has many Steps. Note how these are run blocks that execute a command-line command (in this case on an ubuntu-latest system, so running using bash).

Not shown: a Step can instead call another Action and run the steps there. There are lots of pre-built Actions from creators and tech companies that do complex stuff with minimal inputs, and you can find them all in the GitHub Actions Marketplace. Keep in mind that not all these Actions are trustworthy; if a task is simple, you may want to build it yourself. You can also filter for Actions from Verified Creators, which doesn’t guarantee they’re trustworthy, but means they’re from large tech companies with code testing and bug programs to address discovered security issues.

And the very basics of programming: a variable is a named box that can store a value. If you say FOO=BAR, a variable named FOO is declared with the value BAR. When you then say echo "$FOO is the best", it prints BAR is the best, because the value of FOO is substituted wherever $FOO appears.

Passing a Variable Between: Steps

Let’s start with an example that looks like it should obviously, obviously work. In the following example we define an Action with a single job, called pass-var, on line 4. Within it there are 2 steps: one in which we set a variable (line 7), and one in which we read it (line 13). These steps are ALWAYS executed on the same computer, because a Job is assigned to a single host, and then the Steps are run in order.

If you were to run the commands in order on your linux/mac computer, they would work fine. On line 10 we’re setting a variable named ENVIRONMENT, and then on line 11 we’re printing it to make sure it’s set correctly.

Then the next step starts, and we do a simple if statement, where we check if the variable is blank (line 16, the -z is short-hand logic for check if this variable is blank). If it is, we exit.

	name: Passing variable Fail
	on: workflow_dispatch
	jobs:
	  pass-var:
	    runs-on: ubuntu-latest
	    steps:
	      - name: Set deploy location
	        id: set-var
	        run: |
	          ENVIRONMENT=dev
	          echo "The ENVIRONMENT var is: $ENVIRONMENT"

	      - name: Read deploy location
	        id: read-var
	        run: |
	          if [ -z "$ENVIRONMENT" ]; then
	            echo "No ENVIRONMENT var set, exiting"
	            exit 1
	          else
	            echo "The ENVIRONMENT var is set to: $ENVIRONMENT"
	          fi

Let’s run the Action in the terminal and see how it goes. As you can see below, the Set deploy location step worked perfectly — the variable value dev is printed to the terminal.

However, the Read deploy location step failed — it says the variable is blank! Why did that happen?

The answer is unintuitive compared to some programming platforms. Each Step is intended to be a separate box in terms of environment values: every run block starts its own fresh shell process, so a plain shell variable disappears when its Step ends. This separation also matters because you can inject “secret” or “secure” environment values into a single Step, and you usually don’t want those secrets to leak to other Steps. This is a good architecture, but it means an extra step to do what would normally work on your computer.
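
If you’d like to see that isolation without pushing a workflow, here’s a minimal local sketch. It assumes (my simplification, not GitHub’s actual runner code) that each Step behaves like a separate bash -c process. The variable name matches the example above; the `<empty>` placeholder is just for display.

```shell
#!/usr/bin/env bash
# Simulate two workflow Steps as two separate shell processes.
# Assumption: this only approximates how a runner launches each `run` block.

unset ENVIRONMENT   # make sure nothing leaks in from your own terminal session

# "Step 1": set a variable and print it. It lives only inside this process.
step1_output=$(bash -c 'ENVIRONMENT=dev; echo "step 1 sees: ${ENVIRONMENT:-<empty>}"')

# "Step 2": a brand-new process, so ENVIRONMENT was never set here.
step2_output=$(bash -c 'echo "step 2 sees: ${ENVIRONMENT:-<empty>}"')

echo "$step1_output"
echo "$step2_output"
```

The first line printed shows dev and the second shows the placeholder, which is the same behavior the failed run demonstrates.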

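Let’s fix our Action and talk about why it works. The new logic is on line 12: we echo (print) the key-value pair ENVIRONMENT=dev into a special GitHub “environment file”, whose path GitHub provides in $GITHUB_ENV. Note the >>, which means append (> means overwrite). The runner reads this file after the Step finishes and sets those variables for the Steps that follow in the same Job.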

	name: Passing variable Fixed
	on: workflow_dispatch
	jobs:
	  pass-var:
	    runs-on: ubuntu-latest
	    steps:
	      - name: Set deploy location
	        id: set-var
	        run: |
	          ENVIRONMENT=dev
	          echo "The ENVIRONMENT var is: $ENVIRONMENT"
	          echo "ENVIRONMENT=$ENVIRONMENT" >> $GITHUB_ENV

	      - name: Read deploy location
	        id: read-var
	        run: |
	          if [ -z "$ENVIRONMENT" ]; then
	            echo "No ENVIRONMENT var set, exiting"
	            exit 1
	          else
	            echo "The ENVIRONMENT var is set to: $ENVIRONMENT"
	          fi

Now when we run it, we see the same output in the first step, and the second step (and any subsequent step in the same Job) can read that variable.
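
To make the hand-off less magical, here’s a rough local imitation of it. This is a sketch under my own assumptions: a temp file stands in for the path GitHub normally puts in $GITHUB_ENV, and a small read loop stands in for whatever the real runner does between Steps.

```shell
#!/usr/bin/env bash
# Local approximation of the $GITHUB_ENV hand-off (not the real runner code).

unset ENVIRONMENT
GITHUB_ENV=$(mktemp)   # on a real runner, GitHub sets this path for you
export GITHUB_ENV

# "Step 1": append KEY=VALUE to the environment file.
bash -c 'ENVIRONMENT=dev; echo "ENVIRONMENT=$ENVIRONMENT" >> "$GITHUB_ENV"'

# Stand-in for the runner: read each KEY=VALUE line and export it,
# so the next Step's process inherits it.
while IFS='=' read -r key value; do
  [ -n "$key" ] && export "$key=$value"
done < "$GITHUB_ENV"

# "Step 2": a new process that now sees the variable.
step2_output=$(bash -c 'echo "The ENVIRONMENT var is set to: $ENVIRONMENT"')
echo "$step2_output"

rm -f "$GITHUB_ENV"
```

Note that the Step that writes to the file doesn’t get anything new from it; only later processes do, which matches what we saw in the workflow.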

Passing a Variable Between: Jobs

Okay, so that’s passing a variable between Steps — those are executed on the same computer, that’s easy peasy. What about between Jobs — those are potentially executed on two different computers!

Let’s add an entirely new job and see if it works the same as steps (hint: no). You can see we’re doing the same thing in our first job, pass-var, starting on line 4. It has only 1 step now: it sets our ENVIRONMENT variable to dev, just like before, and writes it to $GITHUB_ENV, just like before. That worked great to pass info to downstream Steps.

And now we have an entirely new Job, creatively called job2-read-var (hey, I’m an engineer, not a fiction writer, eh?). This Job has a single task too, and all it does is check to see if the variable ENVIRONMENT is populated.

	name: Passing variable Jobs Broken
	on: workflow_dispatch
	jobs:
	  pass-var:
	    runs-on: ubuntu-latest
	    steps:
	      - name: Set deploy location
	        id: set-var
	        run: |
	          ENVIRONMENT=dev
	          echo "The ENVIRONMENT var is: $ENVIRONMENT"
	          echo "ENVIRONMENT=$ENVIRONMENT" >> $GITHUB_ENV

	  job2-read-var:
	    runs-on: ubuntu-latest
	    steps:
	      - name: Print the deploy location
	        run: |
	          if [ -z "$ENVIRONMENT" ]; then
	            echo "No ENVIRONMENT var set, exiting"
	            exit 1
	          else
	            echo "The ENVIRONMENT var is set to: $ENVIRONMENT"
	          fi

Let’s run it and see how it goes. Well, it failed, but that’s not the interesting part! These jobs aren’t configured to depend on one another, so they run concurrently on different builders. There are no instructions for the second job to wait for the first job to set a variable, so this would fail even if the variable passing were configured correctly. So let’s add some logic so they run in series: the first job sets the var, and then the second job reads it.

The new logic is on line 16 — needs. Needs tells a job what to wait on before it starts. You can pass a list with square brackets, like this, if you need to wait for several jobs to finish: [pass-var, set-var, foo-bar]. However, we have just a single job to wait on, so we just pass a string — the id of the job. Note that this isn’t the name of the job, it’s an id that’s relevant within the Action context itself — what the first job sets on line 4, to uniquely identify the job.

	name: Passing variable Jobs v2 (Broken)
	on: workflow_dispatch
	jobs:
	  pass-var:
	    runs-on: ubuntu-latest
	    steps:
	      - name: Set deploy location
	        id: set-var
	        run: |
	          ENVIRONMENT=dev
	          echo "The ENVIRONMENT var is: $ENVIRONMENT"
	          echo "ENVIRONMENT=$ENVIRONMENT" >> $GITHUB_ENV

	  job2-read-var:
	    runs-on: ubuntu-latest
	    needs: pass-var
	    steps:
	      - name: Print the deploy location
	        run: |
	          if [ -z "$ENVIRONMENT" ]; then
	            echo "No ENVIRONMENT var set, exiting"
	            exit 1
	          else
	            echo "The ENVIRONMENT var is set to: $ENVIRONMENT"
	          fi

Let’s run it again now that it’s waiting and see what happens. Check this out! It still failed, but it ran in sequence, that’s awesome! We need that to happen as a precursor to passing information between the Jobs — otherwise job2 starts right away before job1 can do anything.

To fix this, we need to implement an “output” from the first job and then we can reference it in the second job. There are a surprising number of changes we need to make for this to work, so let’s go over them.

First, on line 14, we need to send the output to an entirely new place — the $GITHUB_OUTPUT file, rather than the $GITHUB_ENV file, like this:

echo "ENVIRONMENT=$ENVIRONMENT" >> $GITHUB_OUTPUT

However, that means that the variable wouldn’t be available for subsequent Steps in the first Job, which is annoying. The shell offers a fix: the tee command, which can write piped output to multiple places. We tee -a (the -a means “append” mode) to several files at once, here both $GITHUB_ENV and $GITHUB_OUTPUT. I’m a huge fan of this, because it keeps you from writing duplicate output lines.
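
Here’s a small local sketch of that tee -a behavior, in case you want to poke at it outside of a workflow. The two temp files are stand-ins I made up for the paths a real runner provides in $GITHUB_ENV and $GITHUB_OUTPUT, and EXISTING=1 is a dummy line that’s there only to show that appending keeps earlier content.

```shell
#!/usr/bin/env bash
# Local demo of `tee -a`; temp files stand in for $GITHUB_ENV and $GITHUB_OUTPUT.

GITHUB_ENV=$(mktemp)
GITHUB_OUTPUT=$(mktemp)
echo "EXISTING=1" > "$GITHUB_ENV"   # dummy earlier content that must survive

ENVIRONMENT=dev
# tee copies stdin to stdout AND to every file listed;
# -a appends to those files instead of truncating them.
echo "ENVIRONMENT=$ENVIRONMENT" | tee -a "$GITHUB_ENV" "$GITHUB_OUTPUT"

env_file_contents=$(cat "$GITHUB_ENV")
output_file_contents=$(cat "$GITHUB_OUTPUT")
rm -f "$GITHUB_ENV" "$GITHUB_OUTPUT"
```

Without -a, tee would overwrite both files, which on a runner would wipe out anything earlier Steps had already written.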

Next, we need to create an output from the first job, using the syntax on lines 6-7. This makes the output available to other jobs within the same Actions run.

Then we need to update the job2-read-var Step with a special config to be able to access the output using GitHub’s syntax ${{ (stuff) }}. You can see an example on lines 20-21. This sets an environment variable within only that Step. If you wanted the value to be available for subsequent Steps in the same Job, you’d write that value to $GITHUB_ENV just like in our first example.

	name: Passing variable Jobs v3 (Fixed)
	on: workflow_dispatch
	jobs:
	  pass-var:
	    runs-on: ubuntu-latest
	    outputs:
	      ENVIRONMENT: ${{ steps.set-var.outputs.ENVIRONMENT }}
	    steps:
	      - name: Set deploy location
	        id: set-var
	        run: |
	          ENVIRONMENT=dev
	          echo "The ENVIRONMENT var is: $ENVIRONMENT"
	          echo "ENVIRONMENT=$ENVIRONMENT" | tee -a $GITHUB_ENV $GITHUB_OUTPUT
	  job2-read-var:
	    runs-on: ubuntu-latest
	    needs: pass-var
	    steps:
	      - name: Print the deploy location
	        env:
	          ENVIRONMENT: ${{ needs.pass-var.outputs.ENVIRONMENT }}
	        run: |
	          if [ -z "$ENVIRONMENT" ]; then
	            echo "No ENVIRONMENT var set, exiting"
	            exit 1
	          else
	            echo "The ENVIRONMENT var is set to: $ENVIRONMENT"
	          fi

Passing a (Variable) File Between: Jobs

You’re also able to pass a binary file between Jobs. This works by default between Steps because they operate on the same file system, so we’ll skip that use case. However, Jobs are on different computers, so passing a file between them is super useful.

You can pass any binary file, but you can also pass a file that stores the environment values map. This can be super useful when the environment values are dynamic somehow: maybe they are populated with different names, or with different prefixes or suffixes. Or just because I find this method quite a bit cleaner than the complex Step Output → Job Output → Task Input model we just talked about.

Let’s update our Action file and then walk through what we updated. New (working) file follows. First, on line 12, note how we’re writing our ENVIRONMENT variable to both $GITHUB_ENV (subsequent Steps in the same job) and also to a file called env.vars. The name of this file is totally arbitrary, and it’s a literal file on your builder’s disk. You can write n variables to this file — as long as you’re appending (tee -a), go nuts.

Then we do something neat: we call an action called actions/upload-artifact@v3 from the GitHub Marketplace. This is one from GitHub itself (they publish under the name actions), and it uploads a file as an artifact. Artifacts have a special meaning in CI/CD: they are compiled or otherwise binary files that are built by automation and accessible to downstream automation. We tell it to keep this file for 365 days (line 19) and store only a single file, env.vars. Note line 18: the name of this environment cache includes github.run_id, which is a unique number from the GitHub context that identifies this run of the Action. Any downstream Jobs will have the same github.run_id, which is valuable for Action Run-specific files, like variables and binaries.

	name: Passing File Jobs v1
	on: workflow_dispatch
	jobs:
	  pass-var:
	    runs-on: ubuntu-latest
	    steps:
	      - name: Set deploy location
	        id: set-var
	        run: |
	          ENVIRONMENT=dev
	          echo "The ENVIRONMENT var is: $ENVIRONMENT"
	          echo "ENVIRONMENT=$ENVIRONMENT" | tee -a $GITHUB_ENV env.vars

	      - name: Cache Envs
	        id: cache-envs
	        uses: actions/upload-artifact@v3
	        with:
	          name: env-cache-${{ github.run_id }}
	          retention-days: 365
	          path: env.vars

Job 1 ends, and Job 2 starts. The first thing it does is download the environment file (line 5). We pass it the same name that was used to store the file. Then, in the Step that starts on line 11, we cat (read) the file and append its contents to the $GITHUB_ENV special file. That means those environment variables are available to any subsequent Step in this Job.

	  job2-read-var:
	    runs-on: ubuntu-latest
	    needs: pass-var
	    steps:
	      - name: Download Env Vars
	        id: download-env-vars
	        uses: actions/download-artifact@v3
	        with:
	          name: env-cache-${{ github.run_id }}

	      - name: Read Env Vars
	        id: read-env-vars
	        run: |
	          cat env.vars >> $GITHUB_ENV

	      - name: Print the deploy location
	        run: |
	          if [ -z "$ENVIRONMENT" ]; then
	            echo "No ENVIRONMENT var set, exiting"
	            exit 1
	          else
	            echo "The ENVIRONMENT var is set to: $ENVIRONMENT"
	          fi

And check that out, all happy! And note something else cool with this model — the artifact is stored on the Action run page in the GitHub UI, and it’s available for downloading. This can be really useful for traceability with large, complex actions. Just make sure that you’re not writing sensitive variables here — they’re all readable to most folks inside the GitHub UI.
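
Since the downloaded file gets appended to $GITHUB_ENV verbatim, one optional precaution is to keep only the lines that actually look like NAME=value. This is my own hedged sketch, not part of the workflow above: the temp directory stands in for the builder’s workspace, and REGION=us-east-1 and the junk line are made-up sample data.

```shell
#!/usr/bin/env bash
# Sketch: filter a downloaded env.vars before appending it to $GITHUB_ENV.

workdir=$(mktemp -d)
GITHUB_ENV="$workdir/github_env"   # stand-in for the runner-provided file
: > "$GITHUB_ENV"

# Hypothetical artifact contents: two valid lines, one blank line, one junk line.
printf 'ENVIRONMENT=dev\n\nREGION=us-east-1\nnot a variable\n' > "$workdir/env.vars"

# Keep only lines shaped like NAME=value, where NAME is a valid shell identifier.
grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$workdir/env.vars" >> "$GITHUB_ENV"

filtered=$(cat "$GITHUB_ENV")
echo "$filtered"
rm -rf "$workdir"
```

In a workflow, the grep line would take the place of the plain cat env.vars >> $GITHUB_ENV in the Read Env Vars step.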

Set Job 2’s Environment Inside Job 1

Something cool that I’ve just started to play with and learn is to have a job itself set the run context of other jobs. For instance, the Action file itself sets some things, like the environment utilized to run job2 — batch-deploy.

However, we can use the same output syntax that we used earlier to set that value. That means the context of the second job isn’t calculated until job 1 runs, and can change based on all sorts of stuff you read from inside your Actions. That’s hugely powerful, and this very simple example doesn’t really show it off.
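
For a slightly less static flavor of the same idea, here’s a sketch of what the body of that first job’s run block could look like if it picked the environment from the branch name. GITHUB_REF_NAME is a variable GitHub sets on real runs; the branch-to-environment mapping (main, release/*, everything else) is purely hypothetical, and the temp file stands in for $GITHUB_OUTPUT so you can try it locally.

```shell
#!/usr/bin/env bash
# Sketch: derive the environment name from the branch and write it as a step output.

GITHUB_OUTPUT=$(mktemp)    # stand-in; a real runner provides this path
GITHUB_REF_NAME="main"     # set by GitHub on a real run; hard-coded here for the demo

# Hypothetical mapping: main -> prod, release branches -> staging, anything else -> dev.
case "$GITHUB_REF_NAME" in
  main)      env="prod" ;;
  release/*) env="staging" ;;
  *)         env="dev" ;;
esac

echo "ENVIRONMENT=$env" >> "$GITHUB_OUTPUT"

written_line=$(cat "$GITHUB_OUTPUT")
echo "$written_line"
rm -f "$GITHUB_OUTPUT"
```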

	name: Test passing environment
	on: workflow_dispatch
	jobs:
	  set-environment:
	    runs-on: ubuntu-latest
	    outputs:
	      ENVIRONMENT: ${{ steps.set-environment-task.outputs.ENVIRONMENT }}
	    steps:
	      - name: Define deploy locations
	        id: set-environment-task
	        run: |
	          env="dev"
	          echo "ENVIRONMENT=$env" >> $GITHUB_OUTPUT
	  batch-deploy:
	    needs: set-environment
	    runs-on: ubuntu-latest
	    environment: ${{ needs.set-environment.outputs.ENVIRONMENT }}
	    steps:
	      - name: Deploy to environment
	        run: echo "hi mom"

Set Job2’s Matrix from Within Job1

Let’s show off some really cool stuff. GitHub Actions support a concept called a Matrix, which lets several copies of a job run concurrently. It’s really useful for software building: say you want to build different versions of Chrome on different Operating Systems; matrices are perfect for that. I’ve used them primarily for Terraform validation across different workspaces.

But what if that Matrix of jobs was dynamic, and could be configured by your own scripts in Job1 — well, it can.

This is clearly a static example — we’re just setting variables (note the single quotes within the double quotes) to static values, but you could compute them however you want!

	name: Test passing matrix
	on:
	  - workflow_dispatch
	jobs:
	  set-environment:
	    runs-on: ubuntu-latest
	    outputs:
	      ENVIRONMENT: ${{ steps.set-environment-task.outputs.ENVIRONMENT }}
	    steps:
	      - uses: actions/checkout@v3

	      - name: Define deploy locations
	        id: set-environment-task
	        run: |
	          env1="'env1'"
	          env2="'env2'"
	          env3="'env3'"
	          echo "ENVIRONMENT=[$env1, $env2, $env3]" >> $GITHUB_OUTPUT

	  batch-deploy:
	    needs: set-environment
	    runs-on: ubuntu-latest
	    strategy:
	      matrix:
	        environment: ${{ fromJSON(needs.set-environment.outputs.ENVIRONMENT) }}
	    steps:
	      - name: Deploy to environment
	        run: echo "hi mom"

Here’s the matrix build. The second Job is executed 3 times, once for each “environment” value we passed to it from job 1.
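
One side note on the list format: the JSON specification only allows double-quoted strings, while the example in this section emits single-quoted ones. If you’d rather hand fromJSON strictly valid JSON, here’s a sketch of one way to build the list. The to_json_array helper is a name I made up, it assumes the values contain no double quotes or backslashes, and the temp file stands in for $GITHUB_OUTPUT.

```shell
#!/usr/bin/env bash
# Sketch: build a strict JSON array (double-quoted strings) for a matrix output.

GITHUB_OUTPUT=$(mktemp)   # stand-in; a real runner provides this path

# to_json_array NAME... prints ["NAME", ...]
# Assumption: names contain no double quotes or backslashes (no escaping is done).
to_json_array() {
  json=""
  for item in "$@"; do
    # Add a ", " separator only when the list already has an entry.
    json="${json:+$json, }\"$item\""
  done
  printf '[%s]' "$json"
}

ENVIRONMENT_LIST=$(to_json_array env1 env2 env3)
echo "ENVIRONMENT=$ENVIRONMENT_LIST" >> "$GITHUB_OUTPUT"

written_line=$(cat "$GITHUB_OUTPUT")
echo "$written_line"
rm -f "$GITHUB_OUTPUT"
```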

The Most Dynamic — On Demand Matrix

But remember, this builder has access to your whole Repo’s filesystem, so it can read the files in your directories. Say you have some number of data files — you can have 1 or 100, and you want each one to be built one time, and you want it to happen AS FAST AS POSSIBLE.

Well, you can construct an arbitrary matrix of jobs based on that list of files (see lines 15 and 18), and then write it as an output.

Then the next job will read that matrix and run one copy of the job per data file, concurrently. Note that line 32 runs a command, build.sh, and injects a value from the matrix context: for that run of the Matrix, it’s set to the single environment you want to build.

	name: Build and pass dynamic matrix
	on: workflow_dispatch
	jobs:
	  set-environment:
	    runs-on: ubuntu-latest
	    outputs:
	      CONFIG_LIST: ${{ steps.set-environment-task.outputs.CONFIG_LIST }}
	    steps:
	      - uses: actions/checkout@v3

	      - name: Define deploy locations
	        id: set-environment-task
	        run: |
	          # Read pods from a directory
	          ALL_DATA_FILES=$(ls -l data/ | grep -Ev 'total' | rev | cut -d " " -f1 | rev | cut -d "." -f 1)

	          # Format environments list
	          CONFIG_LIST=$(echo "$ALL_DATA_FILES" | awk 'NF' | sed "s/^/'/g" | sed "s/$/'/g" | tr '\n' ', ' | sed 's/,$//' | sed 's/^/[/g' | sed 's/$/]/g' | sed 's/,/, /g')

	          # Write output
	          echo "CONFIG_LIST=$CONFIG_LIST" >> $GITHUB_OUTPUT

	  batch-deploy:
	    needs: set-environment
	    runs-on: ubuntu-latest
	    strategy:
	      matrix:
	        environment: ${{ fromJSON(needs.set-environment.outputs.CONFIG_LIST) }}
	    steps:
	      - name: Build it
	        run: |
	          build.sh ${{ matrix.environment }}

In this way you can have a massively dynamic Actions infrastructure. 🔥
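
If the ls, rev, cut and sed pipeline feels fragile, here’s an alternative sketch (not what the workflow above uses) that builds the same kind of list with a plain glob loop. The temp directory and the alpha/beta/gamma file names are hypothetical stand-ins for your repo’s data/ folder. It emits double-quoted strings and assumes file names without double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Sketch: turn the file names in data/ into a JSON list without parsing `ls -l`.

workdir=$(mktemp -d)
mkdir "$workdir/data"
# Hypothetical data files, just for the demo.
touch "$workdir/data/alpha.json" "$workdir/data/beta.json" "$workdir/data/gamma.json"

CONFIG_LIST=""
for path in "$workdir"/data/*; do
  [ -e "$path" ] || continue   # skip the literal pattern if the directory is empty
  name=$(basename "$path")     # drop the directory part
  name=${name%%.*}             # drop everything from the first dot, like cut -d "." -f 1
  CONFIG_LIST="${CONFIG_LIST:+$CONFIG_LIST, }\"$name\""
done
CONFIG_LIST="[$CONFIG_LIST]"

echo "CONFIG_LIST=$CONFIG_LIST"
rm -rf "$workdir"
```

On a runner you would loop over data/* directly and append the echo line to $GITHUB_OUTPUT, exactly like the Write output step does.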

Summary

In this write-up we went over how GitHub sandboxes Steps by default, and doesn’t allow environment values to pass between them. Then we talked about how to pass information anyway between:

Steps in a Job
Jobs in an Action — using both outputs and files

We also talked about how we can influence how Actions themselves run, for instance by passing:

A string value from Job 1 to Job 2 to set an environment
A static matrix to Job 2
A massively dynamic matrix to Job 2

I hope this makes your Actions work better, and maybe even makes them more dynamic! Examples of cool stuff you built are welcome :)

Good luck out there!
kyler
