GitHub Actions is arguably the most widely used CI/CD system today, and for the majority of small-to-mid teams it is good enough that they will never need to replace it. But “good enough” is not the same as “well set up”. Most teams’ deploy workflows are stitched-together snippets — a checkout, a setup-node, an `npm ci`, an `scp`, a `systemctl restart` — that work on the happy path and become difficult to debug the moment something goes sideways.
This post is the deploy workflow I’d want every team to start with: fast, safe, observable, recoverable. It covers the mental model, the concrete patterns, the security knobs that actually matter (OIDC, environments, protected rules), and the handful of anti-patterns that catch every team once.
## TL;DR
- Workflows are event-triggered runs of jobs (parallel by default), each made of steps on a runner.
- The three production-critical patterns: build-in-CI → ship-artifacts, OIDC over long-lived secrets, Environments with required reviewers.
- Cache aggressively (`actions/cache`, `setup-*` built-in caching) — most deploy pipelines spend 60%+ of their time on repeatable work.
- Concurrency groups prevent two deploys from racing. Treat deploy as a singleton.
- Rollback before you ever deploy — have the “previous version” button wired before the first release.
- Force-push, ignore status checks, and self-hosted runners on the public internet are the three ways teams accidentally give away prod.
## The Mental Model
Before the YAML, four concepts that explain almost everything else.
| Concept | What it is |
|---|---|
| Workflow | A YAML file in .github/workflows/. Has triggers and one or more jobs. |
| Job | A unit of work that runs on a single runner. Jobs in the same workflow run in parallel by default; use needs: to chain them. |
| Step | A shell command or a reusable action (a unit of logic someone else published). |
| Runner | The machine that executes a job. GitHub-hosted (ubuntu-latest, macos-latest, windows-latest) or self-hosted. |
And three more that matter for production workflows:
| Concept | What it is |
|---|---|
| Environment | A named deployment target (e.g. production, staging) with its own secrets, required reviewers, protection rules. |
| OIDC | GitHub issues a short-lived identity token to your workflow. Cloud providers trust the issuer, so you never need a long-lived access key in your secrets. |
| Concurrency | Group of workflows that cannot run simultaneously. Use it to serialise deploys. |
## A Minimal Deploy Workflow
The shortest useful template. Triggered on push to main, runs tests, builds, deploys.
```yaml
name: Deploy

on:
  push:
    branches: [main]
  workflow_dispatch: # manual trigger from the Actions tab

# Serialise deploys — never two at once on the same env
concurrency:
  group: deploy-${{ github.ref }}
  cancel-in-progress: false

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm test -- --run

  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: build-output
          path: dist/
          retention-days: 7

  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment: production # enforces whatever protection rules are set on the env
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: build-output
          path: dist/
      - name: Deploy via SSH
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.DEPLOY_HOST }}
          username: ${{ secrets.DEPLOY_USER }}
          key: ${{ secrets.DEPLOY_SSH_KEY }}
          script: |
            set -euo pipefail
            cd /var/www/app
            # ... receive artifact, swap atomically, restart service ...
```
This is already better than most production setups:
- Jobs are split (`test → build → deploy`) so failures fail fast and the build isn’t wasted on a broken test.
- Artifacts pass cleanly between jobs (no building twice).
- Concurrency prevents two deploys colliding on the same branch.
- Environment routes through protection rules — optional reviewers, wait timer, whatever you configure on the env.
## Build in CI, Ship Artifacts to Prod
The single biggest win most teams miss. Building on the production host works for a month — until the host runs out of memory on a large TypeScript project, or a transient npm error leaves the site half-deployed, or a build step unintentionally reads production env vars.
The clean pattern:
| | Bad | Good |
|---|---|---|
| Where the build runs | On the prod server via SSH | On the CI runner |
| What gets shipped | Source + npm install --omit=dev | Pre-built artifact (dist/, .next/, standalone build, Docker image) |
| Prod server’s job | Compile + restart | Swap and restart |
| Prod server’s RAM | Must fit the build | Just needs to run the binary |
```yaml
# Build once on the runner
- name: Build
  run: npm run build

# Compress
- run: tar -czf dist.tar.gz dist/

# Ship to the server (rsync or scp)
- name: Upload
  uses: appleboy/scp-action@v0.1.7
  with:
    host: ${{ secrets.DEPLOY_HOST }}
    username: ${{ secrets.DEPLOY_USER }}
    key: ${{ secrets.DEPLOY_SSH_KEY }}
    source: "dist.tar.gz"
    target: "/tmp"

# Swap + restart
- name: Activate
  uses: appleboy/ssh-action@v1
  with:
    host: ${{ secrets.DEPLOY_HOST }}
    username: ${{ secrets.DEPLOY_USER }}
    key: ${{ secrets.DEPLOY_SSH_KEY }}
    script: |
      set -euo pipefail
      tar -xzf /tmp/dist.tar.gz -C /var/www/app/
      sudo systemctl restart my-app
      rm /tmp/dist.tar.gz
```
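The “swap” on the server deserves to be atomic. A minimal sketch, assuming a `releases/` directory and a `current` symlink (both hypothetical paths): unpack beside the live version, then flip the symlink in a single rename, so a half-extracted tarball can never be live.

```shell
# Sketch: atomic activation via symlink swap. Paths and the service name
# are examples; adjust to your layout.
activate_release() {
  local app_root="$1" tarball="$2" sha="$3"
  local release="$app_root/releases/$sha"

  # Unpack beside, not over, the live version
  mkdir -p "$release"
  tar -xzf "$tarball" -C "$release"

  # ln to a temp name, then mv -T: readers see either the old release
  # or the new one, never a partially extracted mix
  ln -sfn "$release" "$app_root/current.tmp"
  mv -Tf "$app_root/current.tmp" "$app_root/current"
  # then: sudo systemctl restart my-app
}
```

Serve from `current/` and the previous releases stay on disk, which is also what makes rollback cheap later.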
## Secrets: OIDC Beats Long-Lived Keys
Long-lived access keys stored in GitHub Secrets work, but they rot. Developers leave, keys get leaked, rotation is manual, and if someone runs `echo $AWS_SECRET` in a workflow debug step it can end up in the logs (masking only applies when GitHub recognises the exact secret value).
OIDC (OpenID Connect) solves this: GitHub mints a short-lived token for each workflow run, and your cloud provider trusts GitHub as an identity provider. No static secret.
```yaml
# AWS with OIDC
permissions:
  id-token: write # required for OIDC
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-deploy
          aws-region: us-east-1
      # Now standard AWS CLI / SDK calls authenticate automatically
      - run: aws s3 sync dist/ s3://my-bucket --delete
```
The IAM role’s trust policy names GitHub’s OIDC provider and restricts tokens to your specific repo and branch, so a workflow in any other repo can’t assume your role.
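For reference, a trust policy of that shape, sketched with placeholder account ID, org, and repo (the provider URL and the `aud`/`sub` condition keys are the ones AWS documents for GitHub OIDC):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
          "token.actions.githubusercontent.com:sub": "repo:my-org/my-repo:ref:refs/heads/main"
        }
      }
    }
  ]
}
```

The `sub` condition is the important line: it pins the role to one repo and one branch.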
GCP, Azure, and Vercel all support OIDC from GitHub Actions. Use it. The setup cost is one afternoon; the maintenance cost is zero.
| Provider | Action to use |
|---|---|
| AWS | aws-actions/configure-aws-credentials |
| GCP | google-github-actions/auth |
| Azure | azure/login with client-id (no client-secret) |
| Cloudflare | (still API token at time of writing) |
| Vercel | OIDC for AWS/GCP within your deploy; platform itself uses project tokens |
When OIDC isn’t an option, use Environment secrets over Repo secrets: they’re scoped to the env (prod vs staging), and access is recorded in the deployment log.
## Environments: Where Safety Lives
Environments are underrated. They give you, for free:
- Separate secrets per env. The `production` deploy can’t accidentally grab a staging URL.
- Required reviewers. The prod env refuses to start until N specific people click approve.
- Wait timers. “Prod deploy sits for 10 minutes before starting” gives you a window to cancel.
- Branch rules. The prod env will only run from `main`.
- Deployment history. Every deploy is a first-class event you can inspect, rerun, or roll back from.
```yaml
jobs:
  deploy-staging:
    environment: staging
    # ... deploy to staging ...

  deploy-production:
    needs: deploy-staging
    environment:
      name: production
      url: https://example.com # shown in deployment UI
    # ... deploy to prod ...
```
Then in the repo → Settings → Environments, configure the prod env with at least: required reviewers, allowed branch main, secrets scoped to prod. You now have a half-decent change-management system without any other tool.
## Caching: Most Workflows Are 60% Repeatable
The built-in `cache:` option in most `setup-*` actions handles 90% of cases.
```yaml
- uses: actions/setup-node@v4
  with:
    node-version: 22
    cache: npm # caches ~/.npm based on package-lock.json

- uses: actions/setup-python@v5
  with:
    python-version: "3.12"
    cache: pip # caches pip downloads
```
For anything else — Docker layer cache, Next.js `.next/cache`, Turbo cache, Jest cache — use `actions/cache`:

```yaml
- uses: actions/cache@v4
  with:
    path: |
      .next/cache
      node_modules/.cache
    key: ${{ runner.os }}-next-${{ hashFiles('**/package-lock.json') }}-${{ hashFiles('src/**') }}
    restore-keys: |
      ${{ runner.os }}-next-${{ hashFiles('**/package-lock.json') }}-
      ${{ runner.os }}-next-
```
The `restore-keys` fallback chain is important: if the exact key isn’t found, use the next best match. This gives you partial cache hits when dependencies change but your source mostly hasn’t.
Don’t cache `node_modules` directly. Cache the package manager’s download cache (`~/.npm`, `~/.pnpm-store`) and re-run the install. That way `npm ci` still runs but finishes in seconds.
## Concurrency: Don’t Race Yourself
Without a concurrency block, a flurry of pushes to main triggers a flurry of deploy workflows — and they’ll race to write to the same production server.
```yaml
concurrency:
  group: deploy-${{ github.ref }}
  cancel-in-progress: false # queue, don't cancel
```
Two real settings worth understanding:
| Setting | Effect |
|---|---|
| `cancel-in-progress: true` | New run cancels the one in progress. Good for PR checks where only the latest commit matters. |
| `cancel-in-progress: false` | New run waits for the current one to finish. Good for deploys where you want every commit to actually ship (in order). |
The group key is a string — use deploy-${{ github.ref }} to serialise per branch, or deploy-production to serialise across all branches that deploy there.
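The PR-checks variant from the table looks like this as a workflow-level block (the group name is arbitrary):

```yaml
# PR checks: only the latest commit matters, so cancel superseded runs
concurrency:
  group: ci-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```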
## Matrix Builds: Parallel by Default
When your pipeline naturally has multiple variants (Node versions, OSes, app folders in a monorepo), use a matrix.
```yaml
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        node: [20, 22]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
          cache: npm
      - run: npm ci
      - run: npm test
```
`fail-fast: false` keeps the other cells running when one fails — valuable for flaky tests or cross-platform bugs.
For monorepo deploys, matrix across the apps:
```yaml
jobs:
  detect-changes:
    # ... outputs an array of apps that changed ...

  deploy:
    needs: detect-changes
    runs-on: ubuntu-latest
    strategy:
      matrix:
        app: ${{ fromJson(needs.detect-changes.outputs.apps) }}
    steps:
      - run: ./scripts/deploy.sh ${{ matrix.app }}
```
Combined with path filters on the trigger (on.push.paths), this means a change in apps/api/** triggers only the API deploy, not everything.
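One way the elided `detect-changes` job could compute its output is a shell step that diffs the push range; this sketch assumes an `apps/<name>/` layout and emits a JSON array usable with `fromJson`:

```shell
# Sketch: list the top-level apps/ folders touched between two commits,
# formatted as a JSON array for a matrix. The apps/<name>/ layout is an
# assumption; adapt the grep/cut to your repo structure.
changed_apps() {
  local base="$1" head="$2"
  git diff --name-only "$base" "$head" \
    | grep '^apps/' | cut -d/ -f2 | sort -u \
    | awk 'BEGIN { printf "[" }
           { printf "%s\"%s\"", (NR > 1 ? "," : ""), $0 }
           END { print "]" }'
}
# In the workflow step, something like:
#   echo "apps=$(changed_apps "${{ github.event.before }}" "${{ github.sha }}")" >> "$GITHUB_OUTPUT"
```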
## Deployment Strategies
Three strategies most teams want eventually. GitHub Actions doesn’t implement these itself — but it orchestrates them.
| Strategy | What it is | How Actions does it |
|---|---|---|
| Rolling | Replace instances one at a time. | matrix over instances, with small batch size + health checks between. |
| Blue/Green | Run two prod stacks; cut traffic to the new one after smoke tests pass. | Two environments (blue, green); a “swap” step flips the DNS/load-balancer target. |
| Canary | Route a small % of traffic to the new version; graduate if metrics are good. | Deploy action marks new version; a monitored step (or external tool) graduates or rolls back. |
A blue/green example sketched out:
```yaml
jobs:
  deploy-to-inactive:
    runs-on: ubuntu-latest
    steps:
      - name: Determine inactive color
        id: color
        run: |
          ACTIVE=$(./scripts/get-active-color.sh) # 'blue' or 'green'
          echo "inactive=$([ "$ACTIVE" = "blue" ] && echo green || echo blue)" >> $GITHUB_OUTPUT
      - name: Deploy to inactive
        run: ./scripts/deploy.sh ${{ steps.color.outputs.inactive }}
      - name: Smoke test inactive
        run: ./scripts/smoke.sh https://${{ steps.color.outputs.inactive }}.example.com
      - name: Swap LB
        run: ./scripts/swap-lb.sh ${{ steps.color.outputs.inactive }}
```
The swap step is what makes it a real blue/green: until that step runs, old traffic still flows to the active side, and a failed smoke test never affects users.
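The helper scripts in that job are placeholders. One minimal way to implement the colour tracking is a marker file that the swap step updates (a real setup might query the load balancer instead):

```shell
# Sketch of hypothetical get-active-color.sh / swap-lb.sh internals:
# track the active colour in a marker file that the swap step rewrites.
STATE_FILE="${STATE_FILE:-/etc/myapp/active-color}"

get_active_color() {
  # Default to blue on the very first deploy, before the file exists
  cat "$STATE_FILE" 2>/dev/null || echo blue
}

swap_to() {
  local new="$1"
  # Repoint the LB / reverse proxy at $new here (nginx upstream,
  # ALB target group, DNS weight, ...), then record the new state
  echo "$new" > "$STATE_FILE"
}
```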
## Rollback — Wire It Before the First Deploy
The most common reason teams “can’t roll back” is that nobody planned for it: what shipped is implicit (a git commit, an npm build output), so the previous version isn’t addressable.
Rule: every deploy produces an immutable, re-deployable artifact — with the git SHA or version tag as the primary key.
- Container images: tag with `:${{ github.sha }}` (never `:latest` for prod), push to the registry, never overwrite.
- Static sites: upload to a versioned path (`s3://bucket/releases/${{ github.sha }}/`) and update the active symlink/alias.
- VPS tarballs: name them `app-${{ github.sha }}.tar.gz`, keep the last 10 on the server.
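The “keep last 10” rule needs a prune step to stay true. A sketch, assuming the tarball naming above:

```shell
# Sketch: keep only the newest N release tarballs on the server.
# Run after a successful activation; N=10 matches the rule of thumb above.
prune_releases() {
  local dir="$1" keep="${2:-10}"
  # Newest first; everything past position $keep gets deleted
  ls -1t "$dir"/app-*.tar.gz 2>/dev/null | tail -n +"$((keep + 1))" | xargs -r rm -f
}
```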
Rollback then becomes a manual workflow that takes a version and points prod at it:
```yaml
name: Rollback

on:
  workflow_dispatch:
    inputs:
      version:
        description: Git SHA or tag to roll back to
        required: true
        type: string

jobs:
  rollback:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - run: ./scripts/activate-version.sh ${{ inputs.version }}
```
No rebuild, no git revert, no drama — just swap the active version. This is the workflow you want to have and never need.
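`activate-version.sh` is a placeholder. For the versioned-directory variant it can be a validated symlink flip; this sketch (paths are examples) refuses versions that were never shipped:

```shell
# Sketch of a hypothetical activate-version.sh: refuse unknown versions,
# then atomically repoint `current` at an already-deployed release.
activate_version() {
  local app_root="$1" version="$2"
  local release="$app_root/releases/$version"

  # Fail loudly if the version was never shipped: rollback must not guess
  [ -d "$release" ] || { echo "unknown version: $version" >&2; return 1; }

  ln -sfn "$release" "$app_root/current.tmp"
  mv -Tf "$app_root/current.tmp" "$app_root/current"
  # then: sudo systemctl restart my-app
}
```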
## Observability: You Will Regret Flying Blind
Two observability things matter in the workflow itself:
- Write the deploy events to your observability platform. Datadog, Honeycomb, Grafana Cloud — whatever you use. A deploy marker on a latency graph has saved more debugging sessions than any post-mortem.
- Set up job-failure notifications. Default is email; most teams want Slack. The `slackapi/slack-github-action` action posts to a channel on failure.
```yaml
- name: Notify on failure
  if: failure()
  uses: slackapi/slack-github-action@v1
  with:
    payload: |
      {
        "text": "🚨 Deploy failed on ${{ github.ref_name }} (${{ github.sha }})",
        "blocks": [ ... ]
      }
  env:
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
```
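Deploy markers can be a single curl. A sketch against Datadog’s v1 events API (the `DATADOG_API_KEY` secret name and the tags are assumptions; other platforms have equivalent event endpoints):

```yaml
- name: Emit deploy marker
  if: success()
  run: |
    curl -sf -X POST "https://api.datadoghq.com/api/v1/events" \
      -H "DD-API-KEY: ${{ secrets.DATADOG_API_KEY }}" \
      -H "Content-Type: application/json" \
      -d '{
        "title": "Deploy: ${{ github.repository }}",
        "text": "Shipped ${{ github.sha }} to production",
        "tags": ["service:my-app", "env:production", "deploy"]
      }'
```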
Also worth knowing: `$GITHUB_STEP_SUMMARY` lets you write rich Markdown to the job summary page. Useful for deploy timings, artifact sizes, and release notes auto-generated from commits.
```yaml
- name: Summary
  run: |
    echo "## Deploy summary" >> $GITHUB_STEP_SUMMARY
    echo "- Version: \`${{ github.sha }}\`" >> $GITHUB_STEP_SUMMARY
    echo "- Duration: $(date -d@$SECONDS -u +%M:%S)" >> $GITHUB_STEP_SUMMARY
```
## Common Anti-Patterns (and the Fix)
| Anti-pattern | Why it bites | Fix |
|---|---|---|
| Building on the prod server | Runs out of RAM / CPU, affects traffic, slow deploys. | Build on the runner, ship the artifact. |
| Long-lived cloud access keys in secrets | Rotation is manual; leaks are permanent. | OIDC. |
| `uses: actions/checkout@master` or `@main` | Breaks silently when upstream refactors. | Pin to a version tag (`@v4`) or a SHA for higher-security paths. |
| Shared secret for staging + prod | A compromise in either is a compromise in both. | Environment-scoped secrets. |
| `continue-on-error: true` to “ship anyway” | Hides real failures; gives teams a false sense of CI coverage. | Fix the test, or mark it as expected-to-fail with an issue number. |
| Deploy from any branch | Developers accidentally push prod at 2am. | Environment → Deployment branches → main only. |
| No concurrency group | Two deploys race; last-write-wins on prod. | Concurrency block. |
| Self-hosted runners on open internet | Runs untrusted PR code; attacker pivots to your network. | Private/ephemeral runners, and never run untrusted PRs on self-hosted. |
| Logging secrets | `echo $TOKEN` leaks to CI logs. | Never echo secrets; masking only applies when GitHub recognises the value as a secret. |
## Real-World Shape: SSH-to-VPS Deploy
A complete SSH deploy workflow for a Node.js/Next.js app (loosely based on the one running palakorn.com):
```yaml
name: Deploy

on:
  push:
    branches: [main]
    paths: ['apps/admin/**']
  workflow_dispatch:

concurrency:
  group: deploy-admin
  cancel-in-progress: false

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment:
      name: production
      url: https://example.com/admin
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
          cache-dependency-path: apps/admin/package-lock.json
      - name: Install, build
        working-directory: apps/admin
        run: |
          npm ci
          npm run db:generate
          npm run build
      - name: Package artifact
        run: tar -czf admin.tar.gz -C apps/admin .next package.json package-lock.json prisma public
      - name: Upload artifact
        uses: appleboy/scp-action@v0.1.7
        with:
          host: ${{ secrets.DEPLOY_HOST }}
          username: ${{ secrets.DEPLOY_USER }}
          port: ${{ secrets.DEPLOY_SSH_PORT }}
          key: ${{ secrets.DEPLOY_SSH_KEY }}
          source: "admin.tar.gz"
          target: /tmp
      - name: Activate
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.DEPLOY_HOST }}
          username: ${{ secrets.DEPLOY_USER }}
          port: ${{ secrets.DEPLOY_SSH_PORT }}
          key: ${{ secrets.DEPLOY_SSH_KEY }}
          script: |
            set -euo pipefail
            TARGET=/var/www/apps/admin
            tar -xzf /tmp/admin.tar.gz -C "$TARGET"
            cd "$TARGET" && npm ci --omit=dev && npx prisma db push --accept-data-loss
            sudo systemctl restart app-admin
            rm /tmp/admin.tar.gz
            curl -sf http://127.0.0.1:3003/admin > /dev/null # smoke
```
Notes:
- Path filter `apps/admin/**` means unrelated changes don’t trigger this deploy.
- Concurrency group is per-app (`deploy-admin`), so a web deploy and an admin deploy can still run in parallel.
- Environment URL makes the deployment page show the live URL with a one-click “view”.
- Smoke check at the end fails the deploy if the service didn’t come back healthy, so you find out immediately, not from your users.
## Closing Checklist
Before declaring your deploy workflow “done”:
- Triggered by `push` to `main` + `workflow_dispatch` (manual)
- Jobs split: test / build / deploy; deploy only runs if tests pass
- Build on the runner, ship an immutable artifact
- Concurrency group set — two deploys never race
- Environment configured with required reviewers + `main`-only branch rule
- Secrets scoped to environment, not repo-wide
- OIDC used where the cloud provider supports it
- Version artifacts with `github.sha` or a tag — never overwrite
- Rollback workflow exists as a separate `workflow_dispatch` job
- Cache the package-manager store (not `node_modules`)
- Smoke check at the end of deploy — fail fast on bad releases
- Notifications on failure (Slack/email) with commit SHA + run link
- Actions pinned to version tags, not `@main`
## Further Reading
- GitHub Docs — Workflow syntax — canonical reference; keep it in a tab.
- GitHub Docs — OIDC with AWS / GCP / Azure — setup guides for the major clouds.
- GitHub Actions Toolkit — the node libs actions are built from; useful when writing custom actions.
- Act — run Actions locally. Invaluable for debugging.
- The Unicorn Project — Gene Kim (novel). Not about Actions specifically, but the best framing I know for why deploy automation matters.
The goal of this setup is not to ship more often — it’s to make shipping so cheap and so boring that teams stop treating a deploy as an event. When deploys are boring, you deploy small, you deploy often, and you spend the time you saved on work that actually moves the product.