🔥Let’s Do DevOps: Passing data between GitHub Actions jobs, steps, and tasks (and make Matrices…
This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!
Hey all!
I recently published a series of articles focused on how to build a GitHub App that triggers on Org actions, like creating a new repo, how to use an AWS Lambda to access secrets and trigger GitHub Actions, and how to pass an input variable to a GitHub Action. But I realize that a lot of those very cool widgets don’t work without something that’s quite basic — passing information around inside a GitHub Action.
When a variable is set by a task (the smallest unit of work inside a GitHub Action), that variable is accessible within the same task. But the next task? Nope. It’s totally gone. So passing that var to the next step, or another job that’s potentially run on an entirely different builder? Definitely no (at least by default).
However, as you build GitHub Actions, you need to do all of this (and more). Let’s talk about what GitHub recommends, and the hacky, awesome methods I personally use that work best for my Actions.
GitHub Actions: Sections
In the intro I used a lot of jargon that might not be familiar to you — Action, Job, Step, Task. Or maybe even Variable. If you know all those terms, skip to the next section! If you want to make sure you understand them, read on.
A GitHub Action is a workflow definition. It’s a YAML-encoded file that lives within the same Repo where the work is done. The logic in these files is very flexible (if a little unintuitive, especially around passing information). The file defines all the logic that the Action will use when it runs. It looks like this. You can see it defines a name for the Action (line 1), a list of triggers for when it should run (line 2), and a list of jobs (line 3). Everything else is part of Job or Step syntax, which we’ll talk about next.
A GitHub Action defines one or more Jobs. These Jobs are a list of Tasks to run, and can be configured to run concurrently, or in series. These Jobs can be assigned to different builders or the same builder. They define a series of Tasks that are called Steps.
Steps are individual tasks that should be executed by the Job. My example picture shows each Job with 1 Step, but usually a Job has many Steps. Note how these are run blocks that execute a command-line command (in this case on an ubuntu-latest system, so running using bash).
Not shown: a Step can instead call another Action and run the steps there. There are lots of pre-built Actions from creators and tech companies that do complex stuff with minimal inputs, and you can find them all in the GitHub Actions Marketplace. Keep in mind that not all these Actions are trustworthy. If a task is simple, you may want to build it yourself. You can also filter for those Actions from Verified Creators, which doesn’t guarantee they’re trustworthy, but means they’re from large tech companies with code testing and bug programs to address discovered security issues.
And the very basics of programming: a variable is a named box that can store a value. If you say FOO=BAR, that means a variable named FOO is declared and holds the value BAR. When you say echo "$FOO is the best" it’ll print BAR is the best, because the value of FOO is used in place of $FOO in the output.
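If you want to try that on your own machine, here’s a tiny sketch. The message variable is just a helper name I made up so the result is easy to inspect.

```shell
# Declare a variable named FOO that holds the value BAR (no spaces around "=").
FOO=BAR

# The shell swaps $FOO for its value before echo ever sees the string.
message="$FOO is the best"
echo "$message"
```

Running it prints BAR is the best.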
Passing a Variable Between: Steps
Let’s start with an example that looks like it should obviously, obviously work. In the following example we define an Action with a single job, called pass-var, on line 4. Within it there are 2 steps: one in which we set a variable (line 7), and one in which we read it (line 13). These steps are ALWAYS executed on the same computer, because a Job is assigned to a single host, and then the Steps are run in order.
If you were to run the commands in order on your linux/mac computer, they would work fine. On line 10 we’re setting a variable named ENVIRONMENT, and then on line 11 we’re printing it to make sure it’s set correctly.
Then the next step starts, and we do a simple if statement, where we check if the variable is blank (line 16; the -z is shorthand for “check if this variable is blank”). If it is, we exit.
name: Passing variable Fail
on: workflow_dispatch
jobs:
  pass-var:
    runs-on: ubuntu-latest
    steps:
      - name: Set deploy location
        id: set-var
        run: |
          ENVIRONMENT=dev
          echo "The ENVIRONMENT var is: $ENVIRONMENT"

      - name: Read deploy location
        id: read-var
        run: |
          if [ -z "$ENVIRONMENT" ]; then
            echo "No ENVIRONMENT var set, exiting"
            exit 1
          else
            echo "The ENVIRONMENT var is set to: $ENVIRONMENT"
          fi
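The -z check from the second step is easy to poke at on its own. This is a minimal sketch; is_blank is a hypothetical helper name used only for this demo, not something GitHub provides.

```shell
# is_blank is a made-up helper: it reports whether its first argument is empty.
is_blank() {
  if [ -z "$1" ]; then
    echo "blank"
  else
    echo "set"
  fi
}

# -z is true for an empty string and false for anything else.
empty_result="$(is_blank "")"
dev_result="$(is_blank "dev")"

echo "empty string -> $empty_result"
echo "dev -> $dev_result"
```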
Let’s run the Action in the terminal and see how it goes. As you can see below, the Set deploy location step worked perfectly: the variable value dev is printed to the terminal.
However, the Read deploy location step failed. It says the variable is blank! Why did that happen?
The answer is unintuitive compared to some programming platforms. Each Step is intended to be a separate box in terms of environment values. Primarily this is because you can inject “secret” or “secure” environment values into a single Step, and GitHub assumes you don’t want those secrets to leak to other Steps. This is a good architecture, but makes for an extra step to do what would normally work on your computer.
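You can reproduce the same surprise locally. This sketch assumes each run step behaves like its own freshly started shell process, so I fake two steps with two separate bash -c calls. The step1_output and step2_output names are mine, purely for illustration.

```shell
# Make sure nothing from your own shell leaks into the demo.
unset ENVIRONMENT

# "Step 1": a child shell sets the variable, prints it, then exits.
step1_output="$(bash -c 'ENVIRONMENT=dev; echo "The ENVIRONMENT var is: $ENVIRONMENT"')"
echo "$step1_output"

# "Step 2": a brand-new child shell. Nothing set in step 1 survives.
step2_output="$(bash -c 'if [ -z "$ENVIRONMENT" ]; then echo "No ENVIRONMENT var set"; else echo "The ENVIRONMENT var is set to: $ENVIRONMENT"; fi')"
echo "$step2_output"
```

The first line printed shows dev, and the second reports that no variable is set, which is the same behavior we saw between the two Steps.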
Let’s fix our Action and talk about why it works. The new logic is on line 12: we echo (print) the map ENVIRONMENT=dev to a special GitHub “environment file”. Note the >>, which means append (> means overwrite).
name: Passing variable Fixed
on: workflow_dispatch
jobs:
  pass-var:
    runs-on: ubuntu-latest
    steps:
      - name: Set deploy location
        id: set-var
        run: |
          ENVIRONMENT=dev
          echo "The ENVIRONMENT var is: $ENVIRONMENT"
          echo "ENVIRONMENT=$ENVIRONMENT" >> $GITHUB_ENV

      - name: Read deploy location
        id: read-var
        run: |
          if [ -z "$ENVIRONMENT" ]; then
            echo "No ENVIRONMENT var set, exiting"
            exit 1
          else
            echo "The ENVIRONMENT var is set to: $ENVIRONMENT"
          fi
Now when we run it, we see the same in the first step, and the second step (and any subsequent step in the same Job) can read that environment variable.
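Here’s a rough local simulation of that hand-off. It is only a sketch of the idea: the temp file stands in for the file GitHub provides, and the while loop is my guess at the simplest possible stand-in for what the runner does between steps (it assumes plain single-line KEY=VALUE entries).

```shell
unset ENVIRONMENT

# Stand-in for the environment file; on a real runner GitHub picks the path.
GITHUB_ENV="$(mktemp)"
export GITHUB_ENV

# "Step 1": set the variable and append KEY=VALUE to the environment file.
bash -c 'ENVIRONMENT=dev; echo "ENVIRONMENT=$ENVIRONMENT" >> "$GITHUB_ENV"'

# Stand-in for the runner: export every KEY=VALUE line before the next step.
while IFS='=' read -r key value; do
  export "$key=$value"
done < "$GITHUB_ENV"

# "Step 2": a new shell that now inherits ENVIRONMENT.
step2_output="$(bash -c 'echo "The ENVIRONMENT var is set to: $ENVIRONMENT"')"
echo "$step2_output"

rm -f "$GITHUB_ENV"
```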
Passing a Variable Between: Jobs
Okay, so that’s passing a variable between Steps. Those are executed on the same computer, so that’s easy peasy. What about between Jobs? Those are potentially executed on two different computers!
Let’s add an entirely new job and see if it works the same as steps (hint: No). You can see we’re doing the same thing in our first job pass-var, starting on line 4. That job only has 1 step now: it’s setting our ENVIRONMENT variable to dev, just like before, and writing it to $GITHUB_ENV just like before, which worked great to pass info to downstream Steps.
And now we have an entirely new Job, creatively called job2-read-var (hey, I’m an engineer, not a fiction writer, eh?). This Job has a single task too, and all it does is check to see if the variable ENVIRONMENT is populated.
name: Passing variable Jobs Broken
on: workflow_dispatch
jobs:
  pass-var:
    runs-on: ubuntu-latest
    steps:
      - name: Set deploy location
        id: set-var
        run: |
          ENVIRONMENT=dev
          echo "The ENVIRONMENT var is: $ENVIRONMENT"
          echo "ENVIRONMENT=$ENVIRONMENT" >> $GITHUB_ENV
  job2-read-var:
    runs-on: ubuntu-latest
    steps:
      - name: Print the deploy location
        run: |
          if [ -z "$ENVIRONMENT" ]; then
            echo "No ENVIRONMENT var set, exiting"
            exit 1
          else
            echo "The ENVIRONMENT var is set to: $ENVIRONMENT"
          fi
Let’s run it and see how it goes. Well, it failed, but that’s not the interesting part! These jobs aren’t configured to depend on one another, so they run concurrently on different builders. There are no instructions for the second job to wait for the first job to set a variable, even if it was configured correctly. So let’s add some logic so they run in series: the first job sets the var, and then the second job reads it.
The new logic is on line 16: needs. Needs tells a job what to wait on before it starts. You can pass a list with square brackets if you need to wait for several jobs to finish, like this: [pass-var, set-var, foo-bar]. However, we have just a single job to wait on, so we just pass a string: the id of the job. Note that this isn’t the name of the job, it’s an id that’s relevant within the Action context itself (what the first job sets on line 4) to uniquely identify the job.
name: Passing variable Jobs v2 (Broken)
on: workflow_dispatch
jobs:
  pass-var:
    runs-on: ubuntu-latest
    steps:
      - name: Set deploy location
        id: set-var
        run: |
          ENVIRONMENT=dev
          echo "The ENVIRONMENT var is: $ENVIRONMENT"
          echo "ENVIRONMENT=$ENVIRONMENT" >> $GITHUB_ENV

  job2-read-var:
    runs-on: ubuntu-latest
    needs: pass-var
    steps:
      - name: Print the deploy location
        run: |
          if [ -z "$ENVIRONMENT" ]; then
            echo "No ENVIRONMENT var set, exiting"
            exit 1
          else
            echo "The ENVIRONMENT var is set to: $ENVIRONMENT"
          fi
Let’s run it again now that it’s waiting and see what happens. Check this out! It still failed, but it ran in sequence, that’s awesome! We need that to happen as a precursor to passing information between the Jobs — otherwise job2 starts right away before job1 can do anything.
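If the ordering part is hard to picture, here’s a loose local analogy using background processes. It only models who starts when; it does not model data sharing (real Jobs share nothing). The job1 and job2 functions, the sleep, and the log file are all made up for the demo.

```shell
log="$(mktemp)"

# Stand-ins for the two Jobs; the sleep fakes job1 doing real work.
job1() { sleep 1; echo "job1 finished" >> "$log"; }
job2() { echo "job2 started" >> "$log"; }

# Without needs: both are launched together, so job2 does not wait for job1.
job1 &
job2 &
wait
concurrent_first="$(head -n 1 "$log")"
echo "no needs, first event: $concurrent_first"

# With needs: job2 is only launched once job1 has finished.
: > "$log"
job1
job2
serial_first="$(head -n 1 "$log")"
echo "with needs, first event: $serial_first"

rm -f "$log"
```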
To fix this, we need to implement an “output” from the first job, and then we can reference it in the second job. There are a surprising number of changes we need to make for this to work, so let’s go over them.
First, on line 14, we need to send the output to an entirely new place, the $GITHUB_OUTPUT file, rather than the $GITHUB_ENV file, like this:
echo "ENVIRONMENT=$ENVIRONMENT" >> $GITHUB_OUTPUT
However, that means that the variable wouldn’t be available for subsequent Steps in the first Job, which is annoying. The shell offers a fix: the tee command, which can write piped output to multiple places. We tee -a (the -a means write this output in “append” mode) to several places. I’m a huge fan of this, because it keeps you from writing duplicate output lines.
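Here’s a small sketch of tee -a on its own, with two temp files standing in for the real $GITHUB_ENV and $GITHUB_OUTPUT. REGION is a made-up second variable, there only to show that appending keeps the earlier line.

```shell
# Temp files standing in for the two special files GitHub provides.
GITHUB_ENV="$(mktemp)"
GITHUB_OUTPUT="$(mktemp)"

ENVIRONMENT=dev

# tee copies its input to every file listed (and to the terminal).
# -a appends, so the second call does not wipe out the first line.
echo "ENVIRONMENT=$ENVIRONMENT" | tee -a "$GITHUB_ENV" "$GITHUB_OUTPUT"
echo "REGION=us-east-1" | tee -a "$GITHUB_ENV" "$GITHUB_OUTPUT"

# Both files now hold the same two lines.
env_contents="$(cat "$GITHUB_ENV")"
output_contents="$(cat "$GITHUB_OUTPUT")"

rm -f "$GITHUB_ENV" "$GITHUB_OUTPUT"
```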
Next, we need to create an output from the first job, using the syntax on lines 6-7. This makes the output available to other jobs within the same Actions run.
Then we need to update the Step in the job2-read-var Job with a special config to be able to access the output using GitHub’s syntax ${{ (stuff) }}. You can see an example on lines 20-21. This sets an environment variable within only that Step. If you wanted the value to be available for subsequent Steps in the same Job, you’d write that value to $GITHUB_ENV, just like in our first example.
name: Passing variable Jobs v3 (Fixed)
on: workflow_dispatch
jobs:
  pass-var:
    runs-on: ubuntu-latest
    outputs:
      ENVIRONMENT: ${{ steps.set-var.outputs.ENVIRONMENT }}
    steps:
      - name: Set deploy location
        id: set-var
        run: |
          ENVIRONMENT=dev
          echo "The ENVIRONMENT var is: $ENVIRONMENT"
          echo "ENVIRONMENT=$ENVIRONMENT" | tee -a $GITHUB_ENV $GITHUB_OUTPUT
  job2-read-var:
    runs-on: ubuntu-latest
    needs: pass-var
    steps:
      - name: Print the deploy location
        env:
          ENVIRONMENT: ${{ needs.pass-var.outputs.ENVIRONMENT }}
        run: |
          if [ -z "$ENVIRONMENT" ]; then
            echo "No ENVIRONMENT var set, exiting"
            exit 1
          else
            echo "The ENVIRONMENT var is set to: $ENVIRONMENT"
          fi
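To make the “only that Step” part concrete, here’s a local sketch of the chain. Everything here is a stand-in: the grep lookup is my rough imitation of GitHub resolving the step output, and prefixing a command with ENVIRONMENT=... imitates a step-level env: block.

```shell
unset ENVIRONMENT
GITHUB_OUTPUT="$(mktemp)"

# Job 1, step set-var: write the output.
echo "ENVIRONMENT=dev" >> "$GITHUB_OUTPUT"

# Stand-in for the job output lookup: read the value back by key.
job1_output="$(grep '^ENVIRONMENT=' "$GITHUB_OUTPUT" | tail -n 1 | cut -d '=' -f 2-)"

# Job 2, step WITH an env: block. The value exists for this one command only.
with_env="$(ENVIRONMENT="$job1_output" bash -c 'echo "${ENVIRONMENT:-<blank>}"')"

# Job 2, a later step WITHOUT an env: block. The value is gone again.
without_env="$(bash -c 'echo "${ENVIRONMENT:-<blank>}"')"

echo "step with env: $with_env"
echo "step without env: $without_env"

rm -f "$GITHUB_OUTPUT"
```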
Passing a (Variable) File Between: Jobs
You’re also able to pass a binary file between Jobs. This works by default between Steps because they operate on the same file system, so we’ll skip that use case. However, Jobs are on different computers, so passing a file between them is super useful.
You can pass any binary file, but you can also pass a file that stores the environment values map. This can be super useful when the environment values are dynamic somehow: maybe they are populated with different names, or with different prefixes or suffixes. Or just because I find this method quite a bit cleaner than using the complex Step Output → Job Output → Task Input model we just talked about.
Let’s update our Action file and then walk through what we updated. New (working) file follows. First, on line 12, note how we’re writing our ENVIRONMENT variable to both $GITHUB_ENV (for subsequent Steps in the same job) and also to a file called env.vars. The name of this file is totally arbitrary, and it’s a literal file on your builder’s disk. You can write n variables to this file. As long as you’re appending (tee -a), go nuts.
Then we do something neat: we call an action called actions/upload-artifact@v3 from the GitHub marketplace. This is one from GitHub itself (they publish under the name actions), and it uploads a file as an artifact. Artifacts have a special meaning in CI/CD: they are compiled or otherwise binary files that are built by automation and accessible to downstream automation. We tell it to keep this file for 365 days (line 19) and store only a single file, env.vars. Note line 18: the name of this environment cache includes github.run_id, which is a unique number from the GitHub context that identifies this run of the Action. Any downstream Jobs will have the same github.run_id, which is valuable for Action Run-specific files, like variables and binaries.
name: Passing File Jobs v1
on: workflow_dispatch
jobs:
  pass-var:
    runs-on: ubuntu-latest
    steps:
      - name: Set deploy location
        id: set-var
        run: |
          ENVIRONMENT=dev
          echo "The ENVIRONMENT var is: $ENVIRONMENT"
          echo "ENVIRONMENT=$ENVIRONMENT" | tee -a $GITHUB_ENV env.vars

      - name: Cache Envs
        id: cache-envs
        uses: actions/upload-artifact@v3
        with:
          name: env-cache-${{ github.run_id }}
          retention-days: 365
          path: env.vars
Job 1 ends, and Job 2 starts. The first thing it does is download the environment file (line 5). We pass it the same name as was used to store the file. Then on line 11 we read the file, which means we cat the file and append that information to the special $GITHUB_ENV file. That means those environment variables are available to any subsequent Step in this Job.
  job2-read-var:
    runs-on: ubuntu-latest
    needs: pass-var
    steps:
      - name: Download Env Vars
        id: download-env-vars
        uses: actions/download-artifact@v3
        with:
          name: env-cache-${{ github.run_id }}

      - name: Read Env Vars
        id: read-env-vars
        run: |
          cat env.vars >> $GITHUB_ENV

      - name: Print the deploy location
        run: |
          if [ -z "$ENVIRONMENT" ]; then
            echo "No ENVIRONMENT var set, exiting"
            exit 1
          else
            echo "The ENVIRONMENT var is set to: $ENVIRONMENT"
          fi
And check that out, all happy! And note something else cool with this model: the artifact is stored on the Action run page in the GitHub UI, and it’s available for downloading. This can be really useful for traceability with large, complex actions. Just make sure that you’re not writing sensitive variables here, because they’re all readable by most folks inside the GitHub UI.
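The whole file round-trip can be sketched locally too. This is only a simulation: three temp directories play the parts of the Job 1 builder, GitHub’s artifact storage, and the Job 2 builder, and plain cp stands in for the upload and download actions.

```shell
job1_dir="$(mktemp -d)"        # Job 1's builder
artifact_store="$(mktemp -d)"  # pretend artifact storage
job2_dir="$(mktemp -d)"        # Job 2's builder, a different machine

# Job 1: append the variable to env.vars on its own disk.
( cd "$job1_dir" && echo "ENVIRONMENT=dev" | tee -a env.vars > /dev/null )

# Stand-in for upload-artifact, then download-artifact.
cp "$job1_dir/env.vars" "$artifact_store/env.vars"
cp "$artifact_store/env.vars" "$job2_dir/env.vars"

# Job 2: append the downloaded file to its own environment file.
GITHUB_ENV="$job2_dir/github_env"
cat "$job2_dir/env.vars" >> "$GITHUB_ENV"

loaded="$(cat "$GITHUB_ENV")"
echo "Job 2 environment file now contains: $loaded"

rm -rf "$job1_dir" "$artifact_store" "$job2_dir"
```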
Set Job 2’s Environment Inside Job 1
Something cool that I’ve just started to play with is having a job itself set the run context of other jobs. For instance, the Action file itself sets some things, like the environment utilized to run job2, batch-deploy.
However, we can use the same output syntax that we used earlier to set that value. That means the context of the second job isn’t calculated until job 1 runs, and can change based on all sorts of stuff you read from inside your Actions. That’s hugely powerful, and this very simple example doesn’t really show it off.
name: Test passing environment
on: workflow_dispatch
jobs:
  set-environment:
    runs-on: ubuntu-latest
    outputs:
      ENVIRONMENT: ${{ steps.set-environment-task.outputs.ENVIRONMENT }}
    steps:
      - name: Define deploy locations
        id: set-environment-task
        run: |
          env="dev"
          echo "ENVIRONMENT=$env" >> $GITHUB_OUTPUT
  batch-deploy:
    needs: set-environment
    runs-on: ubuntu-latest
    environment: ${{ needs.set-environment.outputs.ENVIRONMENT }}
    steps:
      - name: Deploy to environment
        run: echo "hi mom"
Set Job2’s Matrix from Within Job1
Let’s show off some really cool stuff. GitHub Actions support a concept called a Matrix, which lets several jobs run concurrently. It’s really useful for software building: say you want to build different versions of Chrome on different Operating Systems. Matrices are perfect for that. I’ve used them primarily for Terraform validation across different workspaces.
But what if that Matrix of jobs was dynamic, and could be configured by your own scripts in Job1? Well, it can.
This is clearly a static example — we’re just setting variables (note the single quotes within the double quotes) to static values, but you could compute them however you want!
name: Test passing matrix
on:
  - workflow_dispatch
jobs:
  set-environment:
    runs-on: ubuntu-latest
    outputs:
      ENVIRONMENT: ${{ steps.set-environment-task.outputs.ENVIRONMENT }}
    steps:
      - uses: actions/checkout@v3
      - name: Define deploy locations
        id: set-environment-task
        run: |
          env1="'env1'"
          env2="'env2'"
          env3="'env3'"
          echo "ENVIRONMENT=[$env1, $env2, $env3]" >> $GITHUB_OUTPUT
  batch-deploy:
    needs: set-environment
    runs-on: ubuntu-latest
    strategy:
      matrix:
        environment: ${{ fromJSON(needs.set-environment.outputs.ENVIRONMENT) }}
    steps:
      - name: Deploy to environment
        run: echo "hi mom"
Here’s the matrix build. The second Job is executed 3 times, once for each “environment” value we passed to it from job 1.
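The workflow above writes its list with single-quoted strings. Strict JSON uses double quotes, so if you ever hand the same list to a pickier JSON parser, a variant like this sketch might be handy. It is an alternative I’m suggesting, not what the workflow above does, and the json variable name is mine.

```shell
# Build a strict-JSON array of strings: ["env1", "env2", "env3"]
json=""
for name in env1 env2 env3; do
  # Add a separator before every element except the first.
  if [ -n "$json" ]; then
    json="$json, "
  fi
  json="$json\"$name\""
done
json="[$json]"

# In a workflow this line would be appended to $GITHUB_OUTPUT instead.
echo "ENVIRONMENT=$json"
```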
The Most Dynamic — On Demand Matrix
But remember, this builder has access to your whole Repo’s filesystem, so it can read the files in your directories. Say you have some number of data files — you can have 1 or 100, and you want each one to be built one time, and you want it to happen AS FAST AS POSSIBLE.
Well, you can construct an arbitrary matrix of jobs based on that list of files (see lines 15 and 18), and then write it as an output.
Then the next job will read that matrix, and run one job per data file, concurrently. Note that line 32 runs a command, build.sh, and then injects a value from the matrix context. For that run of the Matrix, it’s set to the single environment you want to build.
name: Build and pass dynamic matrix
on: workflow_dispatch
jobs:
  set-environment:
    runs-on: ubuntu-latest
    outputs:
      CONFIG_LIST: ${{ steps.set-environment-task.outputs.CONFIG_LIST }}
    steps:
      - uses: actions/checkout@v3

      - name: Define deploy locations
        id: set-environment-task
        run: |
          # Read pods from a directory
          ALL_DATA_FILES=$(ls -l data/ | grep -Ev 'total' | rev | cut -d " " -f1 | rev | cut -d "." -f 1)

          # Format environments list
          CONFIG_LIST=$(echo "$ALL_DATA_FILES" | awk 'NF' | sed "s/^/'/g" | sed "s/$/'/g" | tr '\n' ', ' | sed 's/,$//' | sed 's/^/[/g' | sed 's/$/]/g' | sed 's/,/, /g')

          # Write output
          echo "CONFIG_LIST=$CONFIG_LIST" >> $GITHUB_OUTPUT

  batch-deploy:
    needs: set-environment
    runs-on: ubuntu-latest
    strategy:
      matrix:
        environment: ${{ fromJSON(needs.set-environment.outputs.CONFIG_LIST) }}
    steps:
      - name: Build it
        run: |
          build.sh ${{ matrix.environment }}
In this way you can have a massively dynamic Actions infrastructure. 🔥
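If the long ls/sed pipeline is hard to follow, here’s a sketch of the same idea written as a plain loop over the files. It is an alternative I’m offering, not the workflow’s own code. The temp directory and the alpha/beta/gamma file names are invented so the example runs anywhere, and it produces the same single-quoted list format the workflow emits.

```shell
# Fake data directory with three made-up data files.
data_dir="$(mktemp -d)"
touch "$data_dir/alpha.yml" "$data_dir/beta.yml" "$data_dir/gamma.yml"

config_list=""
for path in "$data_dir"/*; do
  [ -e "$path" ] || continue   # skip the raw pattern if the directory is empty
  file="${path##*/}"           # drop the directory part
  name="${file%%.*}"           # keep everything before the first dot
  # Add a separator before every element except the first.
  if [ -n "$config_list" ]; then
    config_list="$config_list, "
  fi
  config_list="$config_list'$name'"
done
config_list="[$config_list]"

# In a workflow this line would be appended to $GITHUB_OUTPUT instead.
echo "CONFIG_LIST=$config_list"

rm -rf "$data_dir"
```

With those three files it prints CONFIG_LIST=['alpha', 'beta', 'gamma'].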
Summary
In this write-up we went over how GitHub sandboxes Steps by default, and doesn’t allow environment values to pass between them. Then we talked about how to pass information anyway between:
Steps in a Job
Jobs in an Action, using both outputs and files
We also talked about how we can influence how Actions themselves run, for instance by passing:
A string value from Job 1 to Job 2 to set an environment
A static matrix to Job 2
A massively dynamic matrix to Job 2
I hope this makes your Actions work better, and maybe even makes them more dynamic! Examples of cool stuff you built are welcome :)
Good luck out there!
kyler