GitHub Actions is arguably the most widely used CI/CD system today, and for the majority of small-to-mid teams it is good enough that they will never need to replace it. But “good enough” is not the same as “well set up”. Most teams’ deploy workflows are stitched-together snippets — a checkout, a setup-node, an `npm ci`, an `scp`, a `systemctl restart` — that work on the happy path and become difficult to debug the moment something goes sideways.
This post is the deploy workflow I’d want every team to start with: fast, safe, observable, recoverable. It covers the mental model, the concrete patterns, the security knobs that actually matter (OIDC, environments, protected rules), and the handful of anti-patterns that catch every team once.
## TL;DR
- Workflows are event-triggered runs of jobs (parallel by default), each made of steps on a runner.
- The three production-critical patterns: build-in-CI → ship-artifacts, OIDC over long-lived secrets, Environments with required reviewers.
- Cache aggressively (`actions/cache`, `setup-*` built-in caching) — most deploy pipelines spend 60%+ of their time on repeatable work.
- Concurrency groups prevent two deploys from racing. Treat deploy as a singleton.
- Rollback before you ever deploy — have the “previous version” button wired before the first release.
- Force-push, ignore status checks, and self-hosted runners on the public internet are the three ways teams accidentally give away prod.
## The Mental Model
Before the YAML, four concepts that explain almost everything else.
| Concept | What it is |
|---|---|
| Workflow | A YAML file in .github/workflows/. Has triggers and one or more jobs. |
| Job | A unit of work that runs on a single runner. Jobs in the same workflow run in parallel by default; use needs: to chain them. |
| Step | A shell command or a reusable action (a unit of logic someone else published). |
| Runner | The machine that executes a job. GitHub-hosted (ubuntu-latest, macos-latest, windows-latest) or self-hosted. |
And three more that matter for production workflows:
| Concept | What it is |
|---|---|
| Environment | A named deployment target (e.g. production, staging) with its own secrets, required reviewers, protection rules. |
| OIDC | GitHub issues a short-lived identity token to your workflow. Cloud providers trust the issuer, so you never need a long-lived access key in your secrets. |
| Concurrency | Group of workflows that cannot run simultaneously. Use it to serialise deploys. |
## A Minimal Deploy Workflow
The shortest useful template. Triggered on push to main, runs tests, builds, deploys.
```yaml
name: Deploy

on:
  push:
    branches: [main]
  workflow_dispatch: # manual trigger from the Actions tab

# Serialise deploys — never two at once on the same env
concurrency:
  group: deploy-${{ github.ref }}
  cancel-in-progress: false

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm test -- --run

  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: build-output
          path: dist/
          retention-days: 7

  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment: production # enforces whatever protection rules are set on the env
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: build-output
          path: dist/
      - name: Deploy via SSH
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.DEPLOY_HOST }}
          username: ${{ secrets.DEPLOY_USER }}
          key: ${{ secrets.DEPLOY_SSH_KEY }}
          script: |
            set -euo pipefail
            cd /var/www/app
            # ... receive artifact, swap atomically, restart service ...
```
This is already better than most production setups:
- Jobs are split (`test → build → deploy`) so failures fail fast and the build isn’t wasted on a broken test.
- Artifacts pass cleanly between jobs (no building twice).
- Concurrency prevents two deploys colliding on the same branch.
- Environment routes through protection rules — optional reviewers, wait timer, whatever you configure on the env.
## Build in CI, Ship Artifacts to Prod
The single biggest win most teams miss. Building on the production host works for a month — until the host runs out of memory on a large TypeScript project, or a transient npm error leaves the site half-deployed, or a build step unintentionally reads production env vars.
The clean pattern:
| | Bad | Good |
|---|---|---|
| Where the build runs | On the prod server via SSH | On the CI runner |
| What gets shipped | Source + npm install --omit=dev | Pre-built artifact (dist/, .next/, standalone build, Docker image) |
| Prod server’s job | Compile + restart | Swap and restart |
| Prod server’s RAM | Must fit the build | Just needs to run the binary |
```yaml
# Build once on the runner
- name: Build
  run: npm run build

# Compress
- run: tar -czf dist.tar.gz dist/

# Ship to the server (rsync or scp)
- name: Upload
  uses: appleboy/scp-action@v0.1.7
  with:
    host: ${{ secrets.DEPLOY_HOST }}
    username: ${{ secrets.DEPLOY_USER }}
    key: ${{ secrets.DEPLOY_SSH_KEY }}
    source: "dist.tar.gz"
    target: "/tmp"

# Swap + restart
- name: Activate
  uses: appleboy/ssh-action@v1
  with:
    host: ${{ secrets.DEPLOY_HOST }}
    username: ${{ secrets.DEPLOY_USER }}
    key: ${{ secrets.DEPLOY_SSH_KEY }}
    script: |
      set -euo pipefail
      tar -xzf /tmp/dist.tar.gz -C /var/www/app/
      sudo systemctl restart my-app
      rm /tmp/dist.tar.gz
```
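The “swap” on the server deserves to be atomic. A minimal sketch, assuming a `releases/` directory and a `current` symlink (both hypothetical paths): unpack beside the live version, then flip the symlink in a single rename, so a half-extracted tarball can never be live.

```shell
# Sketch: atomic activation via symlink swap. Paths and the service name
# are examples; adjust to your layout.
activate_release() {
  local app_root="$1" tarball="$2" sha="$3"
  local release="$app_root/releases/$sha"

  # Unpack beside, not over, the live version
  mkdir -p "$release"
  tar -xzf "$tarball" -C "$release"

  # ln to a temp name, then mv -T: readers see either the old release
  # or the new one, never a partially extracted mix
  ln -sfn "$release" "$app_root/current.tmp"
  mv -Tf "$app_root/current.tmp" "$app_root/current"
  # then: sudo systemctl restart my-app
}
```

Serve from `current/` and the previous releases stay on disk, which is also what makes rollback cheap later.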
## Secrets: OIDC Beats Long-Lived Keys
Long-lived access keys stored in GitHub Secrets work, but they rot. Developers leave, keys get leaked, rotation is manual, and if someone runs `echo $AWS_SECRET` in a workflow debug step it can end up in the logs (masking only applies when GitHub recognises the exact secret value).
OIDC (OpenID Connect) solves this: GitHub mints a short-lived token for each workflow run, and your cloud provider trusts GitHub as an identity provider. No static secret.
```yaml
# AWS with OIDC
permissions:
  id-token: write # required for OIDC
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-deploy
          aws-region: us-east-1
      # Now standard AWS CLI / SDK calls authenticate automatically
      - run: aws s3 sync dist/ s3://my-bucket --delete
```
The IAM role’s trust policy names GitHub’s OIDC provider and restricts tokens to your specific repo and branch, so a workflow in any other repo can’t assume your role.
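For reference, a trust policy of that shape, sketched with placeholder account ID, org, and repo (the provider URL and the `aud`/`sub` condition keys are the ones AWS documents for GitHub OIDC):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
          "token.actions.githubusercontent.com:sub": "repo:my-org/my-repo:ref:refs/heads/main"
        }
      }
    }
  ]
}
```

The `sub` condition is the important line: it pins the role to one repo and one branch.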
GCP, Azure, and Vercel all support OIDC from GitHub Actions. Use it. The setup cost is one afternoon; the maintenance cost is zero.
| Provider | Action to use |
|---|---|
| AWS | aws-actions/configure-aws-credentials |
| GCP | google-github-actions/auth |
| Azure | azure/login with client-id (no client-secret) |
| Cloudflare | (still API token at time of writing) |
| Vercel | OIDC for AWS/GCP within your deploy; platform itself uses project tokens |
When OIDC isn’t an option, use Environment secrets over Repo secrets: they’re scoped to the env (prod vs staging), and access is recorded in the deployment log.
## Environments: Where Safety Lives
Environments are underrated. They give you, for free:
- Separate secrets per env. The `production` deploy can’t accidentally grab a staging URL.
- Required reviewers. The prod env refuses to start until N specific people click approve.
- Wait timers. “Prod deploy sits for 10 minutes before starting” gives you a window to cancel.
- Branch rules. The prod env will only run from `main`.
- Deployment history. Every deploy is a first-class event you can inspect, rerun, or roll back from.
```yaml
jobs:
  deploy-staging:
    environment: staging
    # ... deploy to staging ...

  deploy-production:
    needs: deploy-staging
    environment:
      name: production
      url: https://example.com # shown in deployment UI
    # ... deploy to prod ...
```
Then in the repo → Settings → Environments, configure the prod env with at least: required reviewers, allowed branch main, secrets scoped to prod. You now have a half-decent change-management system without any other tool.
## Caching: Most Workflows Are 60% Repeatable
The built-in `cache:` option in most `setup-*` actions handles 90% of cases.
```yaml
- uses: actions/setup-node@v4
  with:
    node-version: 22
    cache: npm # caches ~/.npm based on package-lock.json

- uses: actions/setup-python@v5
  with:
    python-version: "3.12"
    cache: pip # caches pip downloads
```
For anything else — Docker layer cache, Next.js `.next/cache`, Turbo cache, Jest cache — use `actions/cache`:

```yaml
- uses: actions/cache@v4
  with:
    path: |
      .next/cache
      node_modules/.cache
    key: ${{ runner.os }}-next-${{ hashFiles('**/package-lock.json') }}-${{ hashFiles('src/**') }}
    restore-keys: |
      ${{ runner.os }}-next-${{ hashFiles('**/package-lock.json') }}-
      ${{ runner.os }}-next-
```
The `restore-keys` fallback chain is important: if the exact key isn’t found, use the next best match. This gives you partial cache hits when dependencies change but your source mostly hasn’t.
Don’t cache `node_modules` directly. Cache the package manager’s download cache (`~/.npm`, `~/.pnpm-store`) and re-run the install. That way `npm ci` still runs but finishes in seconds.
## Concurrency: Don’t Race Yourself
Without a concurrency block, a flurry of pushes to main triggers a flurry of deploy workflows — and they’ll race to write to the same production server.
```yaml
concurrency:
  group: deploy-${{ github.ref }}
  cancel-in-progress: false # queue, don't cancel
```
Two real settings worth understanding:
| Setting | Effect |
|---|---|
| `cancel-in-progress: true` | New run cancels the one in progress. Good for PR checks where only the latest commit matters. |
| `cancel-in-progress: false` | New run waits for the current one to finish. Good for deploys where you want every commit to actually ship (in order). |
The group key is a string — use deploy-${{ github.ref }} to serialise per branch, or deploy-production to serialise across all branches that deploy there.
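The PR-checks variant from the table looks like this as a workflow-level block (the group name is arbitrary):

```yaml
# PR checks: only the latest commit matters, so cancel superseded runs
concurrency:
  group: ci-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```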
## Matrix Builds: Parallel by Default
When your pipeline naturally has multiple variants (Node versions, OSes, app folders in a monorepo), use a matrix.
```yaml
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        node: [20, 22]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
          cache: npm
      - run: npm ci
      - run: npm test
```
`fail-fast: false` keeps the other cells running when one fails — valuable for flaky tests or cross-platform bugs.
For monorepo deploys, matrix across the apps:
```yaml
jobs:
  detect-changes:
    # ... outputs an array of apps that changed ...

  deploy:
    needs: detect-changes
    runs-on: ubuntu-latest
    strategy:
      matrix:
        app: ${{ fromJson(needs.detect-changes.outputs.apps) }}
    steps:
      - run: ./scripts/deploy.sh ${{ matrix.app }}
```
Combined with path filters on the trigger (on.push.paths), this means a change in apps/api/** triggers only the API deploy, not everything.
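One way the elided `detect-changes` job could compute its output is a shell step that diffs the push range; this sketch assumes an `apps/<name>/` layout and emits a JSON array usable with `fromJson`:

```shell
# Sketch: list the top-level apps/ folders touched between two commits,
# formatted as a JSON array for a matrix. The apps/<name>/ layout is an
# assumption; adapt the grep/cut to your repo structure.
changed_apps() {
  local base="$1" head="$2"
  git diff --name-only "$base" "$head" \
    | grep '^apps/' | cut -d/ -f2 | sort -u \
    | awk 'BEGIN { printf "[" }
           { printf "%s\"%s\"", (NR > 1 ? "," : ""), $0 }
           END { print "]" }'
}
# In the workflow step, something like:
#   echo "apps=$(changed_apps "${{ github.event.before }}" "${{ github.sha }}")" >> "$GITHUB_OUTPUT"
```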
## Deployment Strategies
Three strategies most teams want eventually. GitHub Actions doesn’t implement these itself — but it orchestrates them.
| Strategy | What it is | How Actions does it |
|---|---|---|
| Rolling | Replace instances one at a time. | matrix over instances, with small batch size + health checks between. |
| Blue/Green | Run two prod stacks; cut traffic to the new one after smoke tests pass. | Two environments (blue, green); a “swap” step flips the DNS/load-balancer target. |
| Canary | Route a small % of traffic to the new version; graduate if metrics are good. | Deploy action marks new version; a monitored step (or external tool) graduates or rolls back. |
A blue/green example sketched out:
```yaml
jobs:
  deploy-to-inactive:
    runs-on: ubuntu-latest
    steps:
      - name: Determine inactive color
        id: color
        run: |
          ACTIVE=$(./scripts/get-active-color.sh) # 'blue' or 'green'
          echo "inactive=$([ "$ACTIVE" = "blue" ] && echo green || echo blue)" >> $GITHUB_OUTPUT
      - name: Deploy to inactive
        run: ./scripts/deploy.sh ${{ steps.color.outputs.inactive }}
      - name: Smoke test inactive
        run: ./scripts/smoke.sh https://${{ steps.color.outputs.inactive }}.example.com
      - name: Swap LB
        run: ./scripts/swap-lb.sh ${{ steps.color.outputs.inactive }}
```
The swap step is what makes it a real blue/green: until that step runs, old traffic still flows to the active side, and a failed smoke test never affects users.
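The helper scripts in that job are placeholders. One minimal way to implement the colour tracking is a marker file that the swap step updates (a real setup might query the load balancer instead):

```shell
# Sketch of hypothetical get-active-color.sh / swap-lb.sh internals:
# track the active colour in a marker file that the swap step rewrites.
STATE_FILE="${STATE_FILE:-/etc/myapp/active-color}"

get_active_color() {
  # Default to blue on the very first deploy, before the file exists
  cat "$STATE_FILE" 2>/dev/null || echo blue
}

swap_to() {
  local new="$1"
  # Repoint the LB / reverse proxy at $new here (nginx upstream,
  # ALB target group, DNS weight, ...), then record the new state
  echo "$new" > "$STATE_FILE"
}
```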
## Rollback — Wire It Before the First Deploy
The most common reason teams “can’t roll back” is that nobody planned for it: what shipped is implicit (a git commit, an npm build output), so the previous version isn’t addressable.
Rule: every deploy produces an immutable, re-deployable artifact — with the git SHA or version tag as the primary key.
- Container images: tag with `:${{ github.sha }}` (never `:latest` for prod), push to the registry, never overwrite.
- Static sites: upload to a versioned path (`s3://bucket/releases/${{ github.sha }}/`) and update the active symlink/alias.
- VPS tarballs: name them `app-${{ github.sha }}.tar.gz`, keep the last 10 on the server.
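The “keep last 10” rule needs a prune step to stay true. A sketch, assuming the tarball naming above:

```shell
# Sketch: keep only the newest N release tarballs on the server.
# Run after a successful activation; N=10 matches the rule of thumb above.
prune_releases() {
  local dir="$1" keep="${2:-10}"
  # Newest first; everything past position $keep gets deleted
  ls -1t "$dir"/app-*.tar.gz 2>/dev/null | tail -n +"$((keep + 1))" | xargs -r rm -f
}
```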
Rollback then becomes a manual workflow that takes a version and points prod at it:
```yaml
name: Rollback

on:
  workflow_dispatch:
    inputs:
      version:
        description: Git SHA or tag to roll back to
        required: true
        type: string

jobs:
  rollback:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - run: ./scripts/activate-version.sh ${{ inputs.version }}
```
No rebuild, no git revert, no drama — just swap the active version. This is the workflow you want to have and never need.
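`activate-version.sh` is a placeholder. For the versioned-directory variant it can be a validated symlink flip; this sketch (paths are examples) refuses versions that were never shipped:

```shell
# Sketch of a hypothetical activate-version.sh: refuse unknown versions,
# then atomically repoint `current` at an already-deployed release.
activate_version() {
  local app_root="$1" version="$2"
  local release="$app_root/releases/$version"

  # Fail loudly if the version was never shipped: rollback must not guess
  [ -d "$release" ] || { echo "unknown version: $version" >&2; return 1; }

  ln -sfn "$release" "$app_root/current.tmp"
  mv -Tf "$app_root/current.tmp" "$app_root/current"
  # then: sudo systemctl restart my-app
}
```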
## Observability: You Will Regret Flying Blind
Two observability things matter in the workflow itself:
- Write the deploy events to your observability platform. Datadog, Honeycomb, Grafana Cloud — whatever you use. A deploy marker on a latency graph has saved more debugging sessions than any post-mortem.
- Set up job-failure notifications. Default is email; most teams want Slack. The `slackapi/slack-github-action` action posts to a channel on failure.
```yaml
- name: Notify on failure
  if: failure()
  uses: slackapi/slack-github-action@v1
  with:
    payload: |
      {
        "text": "🚨 Deploy failed on ${{ github.ref_name }} (${{ github.sha }})",
        "blocks": [ ... ]
      }
  env:
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
```
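Deploy markers can be a single curl. A sketch against Datadog’s v1 events API (the `DATADOG_API_KEY` secret name and the tags are assumptions; other platforms have equivalent event endpoints):

```yaml
- name: Emit deploy marker
  if: success()
  run: |
    curl -sf -X POST "https://api.datadoghq.com/api/v1/events" \
      -H "DD-API-KEY: ${{ secrets.DATADOG_API_KEY }}" \
      -H "Content-Type: application/json" \
      -d '{
        "title": "Deploy: ${{ github.repository }}",
        "text": "Shipped ${{ github.sha }} to production",
        "tags": ["service:my-app", "env:production", "deploy"]
      }'
```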
Also worth knowing: `$GITHUB_STEP_SUMMARY` lets you write rich Markdown to the job summary page. Useful for deploy timings, artifact sizes, and release notes auto-generated from commits.
```yaml
- name: Summary
  run: |
    echo "## Deploy summary" >> $GITHUB_STEP_SUMMARY
    echo "- Version: \`${{ github.sha }}\`" >> $GITHUB_STEP_SUMMARY
    echo "- Duration: $(date -d@$SECONDS -u +%M:%S)" >> $GITHUB_STEP_SUMMARY
```
## Common Anti-Patterns (and the Fix)
| Anti-pattern | Why it bites | Fix |
|---|---|---|
| Building on the prod server | Runs out of RAM / CPU, affects traffic, slow deploys. | Build on the runner, ship the artifact. |
| Long-lived cloud access keys in secrets | Rotation is manual; leaks are permanent. | OIDC. |
| `uses: actions/checkout@master` or `@main` | Breaks silently when upstream refactors. | Pin to a version tag (`@v4`) or a SHA for higher-security paths. |
| Shared secret for staging + prod | A compromise in either is a compromise in both. | Environment-scoped secrets. |
| `continue-on-error: true` to “ship anyway” | Hides real failures; gives teams a false sense of CI coverage. | Fix the test, or mark it as expected-to-fail with an issue number. |
| Deploy from any branch | Developers accidentally push prod at 2am. | Environment → Deployment branches → main only. |
| No concurrency group | Two deploys race; last-write-wins on prod. | Concurrency block. |
| Self-hosted runners on open internet | Runs untrusted PR code; attacker pivots to your network. | Private/ephemeral runners, and never run untrusted PRs on self-hosted. |
| Logging secrets | `echo $TOKEN` leaks to CI logs. | Never echo secrets; masking only applies when GitHub recognises the value as a secret. |
## Real-World Shape: SSH-to-VPS Deploy
A complete SSH deploy workflow for a Node.js/Next.js app (loosely based on the one running palakorn.com):
```yaml
name: Deploy

on:
  push:
    branches: [main]
    paths: ['apps/admin/**']
  workflow_dispatch:

concurrency:
  group: deploy-admin
  cancel-in-progress: false

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment:
      name: production
      url: https://example.com/admin
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
          cache-dependency-path: apps/admin/package-lock.json
      - name: Install, build
        working-directory: apps/admin
        run: |
          npm ci
          npm run db:generate
          npm run build
      - name: Package artifact
        run: tar -czf admin.tar.gz -C apps/admin .next package.json package-lock.json prisma public
      - name: Upload artifact
        uses: appleboy/scp-action@v0.1.7
        with:
          host: ${{ secrets.DEPLOY_HOST }}
          username: ${{ secrets.DEPLOY_USER }}
          port: ${{ secrets.DEPLOY_SSH_PORT }}
          key: ${{ secrets.DEPLOY_SSH_KEY }}
          source: "admin.tar.gz"
          target: /tmp
      - name: Activate
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.DEPLOY_HOST }}
          username: ${{ secrets.DEPLOY_USER }}
          port: ${{ secrets.DEPLOY_SSH_PORT }}
          key: ${{ secrets.DEPLOY_SSH_KEY }}
          script: |
            set -euo pipefail
            TARGET=/var/www/apps/admin
            tar -xzf /tmp/admin.tar.gz -C "$TARGET"
            cd "$TARGET" && npm ci --omit=dev && npx prisma db push --accept-data-loss
            sudo systemctl restart app-admin
            rm /tmp/admin.tar.gz
            curl -sf http://127.0.0.1:3003/admin > /dev/null # smoke
```
Notes:
- Path filter `apps/admin/**` means unrelated changes don’t trigger this deploy.
- Concurrency group is per-app (`deploy-admin`), so a web deploy and an admin deploy can still run in parallel.
- Environment URL makes the deployment page show the live URL with a one-click “view”.
- Smoke check at the end fails the deploy if the service didn’t come back healthy, so you find out immediately, not from your users.
## Closing Checklist
Before declaring your deploy workflow “done”:
- Triggered by `push` to `main` + `workflow_dispatch` (manual)
- Jobs split: test / build / deploy; deploy only runs if tests pass
- Build on the runner, ship an immutable artifact
- Concurrency group set — two deploys never race
- Environment configured with required reviewers + `main`-only branch rule
- Secrets scoped to environment, not repo-wide
- OIDC used where the cloud provider supports it
- Version artifacts with `github.sha` or a tag — never overwrite
- Rollback workflow exists as a separate `workflow_dispatch` job
- Cache the package-manager store (not `node_modules`)
- Smoke check at the end of deploy — fail fast on bad releases
- Notifications on failure (Slack/email) with commit SHA + run link
- Actions pinned to version tags, not `@main`
## Further Reading
- GitHub Docs — Workflow syntax — canonical reference; keep it in a tab.
- GitHub Docs — OIDC with AWS / GCP / Azure — setup guides for the major clouds.
- GitHub Actions Toolkit — the node libs actions are built from; useful when writing custom actions.
- Act — run Actions locally. Invaluable for debugging.
- The Unicorn Project — Gene Kim (novel). Not about Actions specifically, but the best framing I know for why deploy automation matters.
The goal of this setup is not to ship more often — it’s to make shipping so cheap and so boring that teams stop treating a deploy as an event. When deploys are boring, you deploy small, you deploy often, and you spend the time you saved on work that actually moves the product.