Tests exist for one reason: so you can change a production system without fear. Any test that doesn't build that confidence is a tax on every future change. The key is knowing which kinds of bugs each layer of the pyramid catches, and being honest with yourself about what each layer really costs.

TL;DR

The test pyramid still holds, but unit tests tend to be over-prescribed and integration tests under-invested.

Containers make real databases cheap. Stop mocking Postgres and Redis; you are mostly testing the mock.

Contract tests (Pact) are the right tool for two services that release on different cadences. E2E tests are not.

Keep the e2e suite small and deterministic, and seed through an API. Don't log in through the UI in every test.

Property-based tests catch the bugs your examples never thought of: serialization, math, round-trips.

Have a written flaky-test policy: quarantine on the first flake, find the root cause within a week, or delete the test.

Coverage is a useful diagnostic but a bad target. Track flake rate and mutation score instead.

Why the Classic Pyramid Still Matters

Mike Cohn sketched the test pyramid around 2009: a broad base of unit tests, a narrower band of service/integration tests, and a thin cap of UI tests. Seventeen years on, it is still the best one-page mental model for deciding where to spend testing effort, and it is still where most teams go wrong.

The pyramid is about cost versus confidence. Unit tests are cheap to write and cheap to run, but give only weak confidence that the system as a whole works. End-to-end tests give strong confidence about user-visible behavior, but each one is slow, brittle, and expensive to maintain. The middle is the layer where most real defects hide.

Where the pyramid misleads teams in 2026:

It was drawn before containers made real dependencies cheap. "Integration" used to mean "slow and hard." It no longer does.
It predates microservices, so it has nothing to say about contract tests between services.
It treats "unit" as one lump, when there are really two species: pure-function unit tests (fast, valuable) and heavy-mock unit tests (expensive to write, mostly testing the mock).
It implies you should write lots of unit tests. What you actually want is enough unit tests, the right ones, plus a strong middle layer that the original pyramid underweights.

So keep the shape, but update the contents. This article covers what each layer actually catches, where to invest by service type, and how to run it all in CI without the team dreading Mondays.

Test Types as a Spectrum

Treat these labels as a spectrum, not separate buckets. From smallest and fastest to largest and slowest:

Type	Catches	Cost	Worth it when
Unit (pure)	Logic bugs in functions, edge cases, math	Milliseconds; easy to write	Always, for non-trivial pure logic
Unit (with mocks)	The shape of interactions between components	Cheap to run, expensive to maintain	Sparingly: only when the real collaborator truly can't be used
Integration	Bugs at the DB, cache, and queue boundaries	Seconds; needs containers or a test DB	Any service that talks to stateful infra (nearly all of them)
Contract	Drift between producer and consumer APIs	Cheap once set up; needs a broker	Two or more services shipping on different cadences
E2E	User-visible flows, cross-service orchestration	Tens of seconds each; flaky under load	A small, curated set of "golden paths"
Visual regression	Unintended CSS / layout changes	Snapshots are brittle to anti-aliasing	UI component libraries, marketing pages
Load / performance	Latency regressions, connection leaks	Minutes to hours; needs a target env	Before each release of a latency-sensitive service
Chaos / fault injection	Correctness of recovery, retries, timeouts	Needs staging; carries operational cost	Services where downtime translates to real money

The useful question is not "do we have enough unit tests?" but "for the kinds of bugs we actually ship, which layer would most likely have caught them?" Log every production incident for a quarter and count which layer should have caught each one. The answer is usually an integration test you never wrote, or an e2e test you once had and deleted because it flaked.
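The quarterly tally described above can be as simple as a counter over an incident log. The sketch below is only an illustration: the record shape, the `should_have_caught` field, and the incident IDs are all made up, and in practice the data would come from your incident tracker.

```python
from collections import Counter

# Hypothetical incident log. Each entry names the layer that, in hindsight,
# should have caught the bug. These records are illustrative, not real data.
INCIDENTS = [
    {"id": "INC-101", "should_have_caught": "integration"},
    {"id": "INC-102", "should_have_caught": "e2e"},
    {"id": "INC-103", "should_have_caught": "integration"},
    {"id": "INC-104", "should_have_caught": "contract"},
    {"id": "INC-105", "should_have_caught": "integration"},
    {"id": "INC-106", "should_have_caught": "unit"},
]


def tally_by_layer(incidents):
    """Count incidents per test layer, most frequent layer first."""
    counts = Counter(incident["should_have_caught"] for incident in incidents)
    return counts.most_common()


if __name__ == "__main__":
    for layer, count in tally_by_layer(INCIDENTS):
        print(f"{layer}: {count}")
```

The layer at the top of the list is where the next testing investment probably belongs.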

Where to Invest, by Service Type

Each kind of service has a different bug profile. A blanket rule of "80% coverage everywhere" is how you end up writing tests nobody reads.

Stateless API / business-logic service

Heavy on unit tests for domain logic: validators, calculators, state machines
Heavy on integration tests against a real database in a container
A few e2e tests that hit the HTTP surface through the real framework (request in, JSON out)
Contract tests if other services call it

Data-heavy services (ETL, reporting, analytics)

Few unit tests: most of the complexity lives in SQL or in the shape of the data, not in functions
Very heavy on integration tests with representative fixture datasets
Property-based tests for transform logic: "whatever comes in, the sum of amounts in equals the sum of amounts out"
One golden-file test per report, diffing the full output against a committed expected file
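A golden-file test, as mentioned in the last item above, needs very little machinery. Here is a minimal sketch using only the standard library. `render_report` is a hypothetical stand-in for a real report job, and the explicit `update` flag is one possible convention for refreshing the committed file, not a standard one.

```python
import difflib
from pathlib import Path


def render_report(rows):
    """Hypothetical report renderer: one 'name,total' line per row, sorted by name."""
    lines = [f"{name},{total:.2f}" for name, total in sorted(rows)]
    return "\n".join(lines) + "\n"


def check_golden(actual: str, golden_path: Path, update: bool = False) -> list:
    """Return a unified diff against the committed golden file (empty list = match).

    With update=True the golden file is overwritten instead, which should only
    happen deliberately and show up in code review as a diff of the golden file.
    """
    if update:
        golden_path.write_text(actual, encoding="utf-8")
        return []
    expected = golden_path.read_text(encoding="utf-8")
    return list(
        difflib.unified_diff(
            expected.splitlines(keepends=True),
            actual.splitlines(keepends=True),
            fromfile=str(golden_path),
            tofile="actual",
        )
    )
```

In a test, assert that `check_golden(...)` returns an empty list and print the diff on failure, so the reviewer sees exactly which report lines changed.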

UI application

Unit tests for hooks, reducers, and non-trivial utilities
Component tests (real render, no mocks) for anything with branching logic
A thin e2e suite (Playwright) for the five or six flows you cannot afford to ship broken: login, checkout, search, and so on
Visual regression only on surfaces that have settled, because it is a maintenance tax on anything still under design iteration

Legacy monolith you inherited

Do not target coverage. Target characterization tests around every change.
Use Michael Feathers' "test-before-you-change" pattern: before modifying calculateInvoice, pin its current behavior in a test, then change the code, then update the test.
Add integration tests at the architectural seams you can identify. Leave alone the 40,000 lines nobody is touching this quarter.
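A characterization test pins what the code does today, not what a spec says it should do. The sketch below assumes a made-up `calculate_invoice` with a tier discount and a baked-in tax. The expected values are what this stand-in returns when run, which is exactly how you would capture them for real legacy code.

```python
def calculate_invoice(items, customer_tier):
    """Stand-in for a legacy function whose rules nobody fully remembers."""
    subtotal = sum(qty * unit_price for qty, unit_price in items)
    if customer_tier == "gold" and subtotal > 100:
        subtotal *= 0.9  # undocumented gold discount
    return round(subtotal * 1.07, 2)  # 7% tax baked in


# Expected values were recorded by running the current code, not derived from
# a spec. Surprising entries (like the exact-100 case) are the point.
CHARACTERIZATION_CASES = [
    (([(1, 50.0)], "standard"), 53.5),
    (([(3, 50.0)], "gold"), 144.45),
    (([(2, 50.0)], "gold"), 107.0),  # exactly 100: no discount today
    (([], "gold"), 0.0),
]


def test_calculate_invoice_current_behaviour():
    for args, expected in CHARACTERIZATION_CASES:
        assert calculate_invoice(*args) == expected, args
```

Once these pass, change the code. Any case that starts failing is either the change you intended (update the expected value) or a regression you just avoided shipping.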

Integration Tests That Don't Lie

The highest-leverage change most teams can make: stop mocking your database and your Redis. Mocks drift; the real thing doesn't. Testcontainers (or an equivalent) makes this nearly free.

An integration test against real Postgres catches SQL bugs, migration bugs, transaction bugs, serialization bugs, and timezone bugs. A unit test that mocks the database catches whether you wrote the mock correctly. Those are different things.

Example: TypeScript + pytest

import { PostgreSqlContainer, StartedPostgreSqlContainer } from "@testcontainers/postgresql";
import { Pool } from "pg";
import { UserRepository } from "../src/users/repository";

describe("UserRepository (integration)", () => {
  let container: StartedPostgreSqlContainer;
  let pool: Pool;
  let repo: UserRepository;

  beforeAll(async () => {
    container = await new PostgreSqlContainer("postgres:16-alpine").start();
    pool = new Pool({ connectionString: container.getConnectionUri() });
    await pool.query(`
      CREATE TABLE users (
        id UUID PRIMARY KEY,
        email TEXT UNIQUE NOT NULL,
        created_at TIMESTAMPTZ NOT NULL DEFAULT now()
      );
    `);
    repo = new UserRepository(pool);
  }, 60_000);

  afterAll(async () => {
    await pool.end();
    await container.stop();
  });

  it("enforces unique emails", async () => {
    await repo.create({ email: "a@example.com" });
    await expect(
      repo.create({ email: "a@example.com" })
    ).rejects.toThrow(/duplicate key/);
  });

  it("round-trips timestamps in UTC", async () => {
    const user = await repo.create({ email: "b@example.com" });
    const fetched = await repo.findById(user.id);
    expect(fetched?.createdAt.toISOString()).toBe(user.createdAt.toISOString());
  });
});

import pytest
from testcontainers.postgres import PostgresContainer
from sqlalchemy import create_engine, text
from app.users.repository import UserRepository


@pytest.fixture(scope="module")
def pg():
    with PostgresContainer("postgres:16-alpine") as container:
        engine = create_engine(container.get_connection_url())
        with engine.begin() as conn:
            conn.execute(text("""
                CREATE TABLE users (
                    id UUID PRIMARY KEY,
                    email TEXT UNIQUE NOT NULL,
                    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
                );
            """))
        yield engine


@pytest.fixture
def repo(pg):
    return UserRepository(pg)


def test_unique_email(repo):
    repo.create(email="a@example.com")
    with pytest.raises(Exception, match="duplicate key"):
        repo.create(email="a@example.com")


def test_timestamps_roundtrip_utc(repo):
    user = repo.create(email="b@example.com")
    fetched = repo.find_by_id(user.id)
    assert fetched.created_at == user.created_at

Two practical rules for keeping integration tests fast:

Start the container once per test file, not once per test. Spinning up Postgres takes a few seconds; doing it per test is how a 40-test file ends up taking five minutes.
Truncate, don't recreate. Between tests, TRUNCATE the tables or roll back an enclosing transaction. Dropping and recreating the schema kills your throughput.
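The rollback variant of the second rule can be sketched as follows. To stay self-contained, this uses the standard library's `sqlite3` as a stand-in for Postgres, and the `rolled_back` helper is a made-up name. With a real Postgres container the same shape would typically live in a pytest fixture wrapped around a connection.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def rolled_back(conn):
    """Run a test body inside a transaction that is always rolled back."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")


# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN/ROLLBACK above are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)")

# "Test 1" inserts a row and sees it.
with rolled_back(conn) as tx:
    tx.execute("INSERT INTO users (email) VALUES ('a@example.com')")
    assert tx.execute("SELECT count(*) FROM users").fetchone()[0] == 1

# "Test 2" starts from an empty table: the same email does not violate UNIQUE.
with rolled_back(conn) as tx:
    tx.execute("INSERT INTO users (email) VALUES ('a@example.com')")
```

The schema is created once and survives; only the rows disappear. One caveat: if the code under test commits its own transactions, the rollback trick stops working and TRUNCATE between tests is the fallback.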

What Integration Tests Should Not Do

They should not reach external third-party APIs. Stripe, SendGrid, OpenAI, whatever: those belong behind an interface, and your integration tests stub that interface with a fake that implements the same contract. Which leads to the next section.
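One way to put a third-party API behind an interface and swap in a fake is sketched below. Every name here (`PaymentGateway`, `FakePaymentGateway`, `CheckoutService`, the decline threshold) is hypothetical; the point is only that the fake honours the same method signature and the same failure modes the production adapter would expose.

```python
from dataclasses import dataclass, field
from typing import Protocol


class PaymentGateway(Protocol):
    """The contract both the real adapter and the fake must satisfy."""

    def charge(self, customer_id: str, amount_cents: int) -> str: ...


@dataclass
class FakePaymentGateway:
    """In-memory fake: same contract, no network, inspectable state."""

    charges: list = field(default_factory=list)
    decline_over_cents: int = 1_000_000  # assumption: lets tests trigger a decline

    def charge(self, customer_id: str, amount_cents: int) -> str:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        if amount_cents > self.decline_over_cents:
            raise RuntimeError("card declined")
        self.charges.append((customer_id, amount_cents))
        return f"ch_{len(self.charges)}"


class CheckoutService:
    """Code under test depends on the interface, never on a vendor SDK."""

    def __init__(self, gateway: PaymentGateway):
        self._gateway = gateway

    def pay(self, customer_id: str, amount_cents: int) -> dict:
        charge_id = self._gateway.charge(customer_id, amount_cents)
        return {"status": "paid", "charge_id": charge_id}
```

Unlike a per-test mock, a fake like this is written once, has real behavior, and can be checked against the vendor's sandbox in a separate, infrequent test.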

Contract Tests for Microservices

If you have two services, one calling the other, shipping on different cadences, you have a contract problem. Integration tests can't solve it because the consumer and producer are tested in isolation. E2E tests catch it, but they are expensive and slow to run.

Consumer-driven contract testing (Pact is the popular tool) offers a cheap middle path. The consumer writes down "here is the request I send, and here is the response I expect." That contract is published to a broker. The producer then, in its own pipeline, runs the contract against its real implementation. Whichever side drifts breaks the build.

// Consumer side — orders-service expects a shape from users-service
import { PactV3, MatchersV3 } from "@pact-foundation/pact";
import { UserClient } from "../src/clients/user-client";

const provider = new PactV3({
  consumer: "orders-service",
  provider: "users-service",
});

describe("UserClient", () => {
  it("fetches a user by id", async () => {
    provider
      .given("a user with id 42 exists")
      .uponReceiving("a request for user 42")
      .withRequest({ method: "GET", path: "/users/42" })
      .willRespondWith({
        status: 200,
        body: {
          id: MatchersV3.integer(42),
          email: MatchersV3.email("jane@example.com"),
          tier: MatchersV3.regex(/^(free|pro|enterprise)$/, "pro"),
        },
      });

    await provider.executeTest(async (mockServer) => {
      const client = new UserClient(mockServer.url);
      const user = await client.getById(42);
      expect(user.tier).toBe("pro");
    });
  });
});

# Producer side: users-service verifies the contract
from pact import Verifier

def test_honours_orders_service_contract():
    verifier = Verifier(provider="users-service", provider_base_url="http://localhost:8080")

    # verify_with_broker returns (return_code, logs); 0 means every pact verified.
    return_code, _ = verifier.verify_with_broker(
        broker_url="https://pact.internal.example.com",
        publish_version="1.42.0",
        provider_states_setup_url="http://localhost:8080/_pact/provider-states",
    )
    assert return_code == 0

Contract tests replace the worst kind of integration test: the cross-service kind that needs both services running, a real network, and careful orchestration. They do not replace your internal integration tests. They are a different tool for a different problem: keeping two independently deployed services honest with each other.

Where contract tests over-promise: they cover only the interactions the consumer actually uses. If the producer adds a new field, the consumer doesn't know. If the producer removes a field that someone reads but no contract asserts on, it breaks silently in production. A contract is a floor, not a ceiling.
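The "floor, not a ceiling" point is easy to see with a toy checker. This is not how Pact works internally. It is a deliberately tiny sketch in which a contract is just a mapping of field name to expected type, and every name and payload is invented for illustration.

```python
def verify_contract(contract: dict, response: dict) -> list:
    """Return violations of the fields the consumer declared, and nothing else."""
    problems = []
    for field_name, expected_type in contract.items():
        if field_name not in response:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(response[field_name], expected_type):
            problems.append(f"wrong type for field: {field_name}")
    return problems


# What orders-service declared it reads from users-service (hypothetical).
ORDERS_CONTRACT = {"id": int, "email": str, "tier": str}

response_v1 = {"id": 42, "email": "jane@example.com", "tier": "pro", "display_name": "Jane"}
response_v2 = {"id": 42, "email": "jane@example.com", "tier": "pro"}  # display_name removed
response_v3 = {"id": 42, "email": "jane@example.com"}  # tier removed

print(verify_contract(ORDERS_CONTRACT, response_v2))  # [] : removal goes unnoticed
print(verify_contract(ORDERS_CONTRACT, response_v3))  # ['missing field: tier']
```

Dropping `display_name` passes verification even if some dashboard quietly depended on it, because no consumer ever wrote that dependency into a contract. Only fields that appear in a contract are protected.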

E2E: Keep the List Small, Keep It Alive

The e2e suite is where teams lose the most time and the most credibility. Every company I have worked with has, at some point, had an e2e suite that was "temporarily" disabled in CI and then stayed off for good.

Two rules that fix nearly every e2e problem:

One golden path per feature, not a matrix of variations. Variations belong in unit and integration tests. E2E exists to prove that the wiring of the whole system holds.
Deterministic fixtures, deterministic clocks, deterministic IDs. A test that seeds a random user, walks a random path, and asserts on a random timestamp is a test that will flake.

import { test, expect } from "@playwright/test";

test("customer can place an order", async ({ page }) => {
  // Seed through an API, not the UI. The UI part is what we're testing.
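  const { orderSeedToken } = await fetch("http://localhost:3000/test/seed", {
    method: "POST",
    // Without this header most servers will not parse the body as JSON.
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ scenario: "logged-in-customer-with-cart" }),
  }).then((r) => r.json());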

  await page.goto(`/?seed=${orderSeedToken}`);
  await page.getByRole("button", { name: "Checkout" }).click();
  await page.getByLabel("Card number").fill("4242 4242 4242 4242");
  await page.getByLabel("Expiry").fill("12/30");
  await page.getByLabel("CVC").fill("123");
  await page.getByRole("button", { name: "Pay" }).click();

  await expect(page.getByRole("heading", { name: "Order confirmed" })).toBeVisible();
  await expect(page).toHaveURL(/\/orders\/[a-f0-9-]+$/);
});

import re

from playwright.sync_api import Page, expect
import requests

def test_customer_can_place_an_order(page: Page):
    # Seed through an API, not the UI. The UI part is what we're testing.
    seed = requests.post(
        "http://localhost:3000/test/seed",
        json={"scenario": "logged-in-customer-with-cart"},
    ).json()

    page.goto(f"/?seed={seed['orderSeedToken']}")
    page.get_by_role("button", name="Checkout").click()
    page.get_by_label("Card number").fill("4242 4242 4242 4242")
    page.get_by_label("Expiry").fill("12/30")
    page.get_by_label("CVC").fill("123")
    page.get_by_role("button", name="Pay").click()

    expect(page.get_by_role("heading", name="Order confirmed")).to_be_visible()
    # A plain string is not treated as a regex; pass a compiled pattern.
    expect(page).to_have_url(re.compile(r"/orders/[a-f0-9-]+$"))

Notes on what lets this test survive six months:

Seed through an API, not the UI. Logging in through the login page in every test is the single biggest source of e2e slowness and flakiness.
Role-based selectors, not CSS. getByRole("button", { name: "Checkout" }) survives a class rename; .btn-primary-2 does not.
Assert on the URL or a heading, not a toast message. Toasts disappear; URLs don't.
Payments are stubbed at the infrastructure edge: your test environment points at a fake Stripe, not the real one. Accepting 4242... as a card number is a test-double convention, not a real charge.

Property-Based Testing

Property-based testing generates hundreds of inputs from a specification and checks that an invariant holds for every one. It finds bugs that example-based tests miss, because humans pick examples from a distribution heavily biased toward "cases I already thought of."

The main use is for anything that claims an algebraic property: inverse, idempotence, commutativity, round-trip. Code that does serialization, sorting, merging, or deduping is the sweet spot for property tests.

import fc from "fast-check";
import { serialize, deserialize } from "../src/codec";

describe("codec", () => {
  it("round-trips any valid order", () => {
    fc.assert(
      fc.property(
        fc.record({
          id: fc.uuid(),
          items: fc.array(
            fc.record({
              sku: fc.string({ minLength: 1 }),
              quantity: fc.integer({ min: 1, max: 1000 }),
              price: fc.integer({ min: 0, max: 10_000_000 }),
            }),
            { minLength: 1, maxLength: 50 },
          ),
          placedAt: fc.date({ min: new Date("2000-01-01"), max: new Date("2100-01-01") }),
        }),
        (order) => {
          const decoded = deserialize(serialize(order));
          expect(decoded).toEqual(order);
        },
      ),
      { numRuns: 500 },
    );
  });

  it("serialize is deterministic", () => {
    fc.assert(
      fc.property(fc.anything(), (x) => serialize(x) === serialize(x)),
    );
  });
});

from hypothesis import given, strategies as st
from app.codec import serialize, deserialize


@given(
    st.fixed_dictionaries({
        "id": st.uuids().map(str),
        "items": st.lists(
            st.fixed_dictionaries({
                "sku": st.text(min_size=1),
                "quantity": st.integers(min_value=1, max_value=1000),
                "price": st.integers(min_value=0, max_value=10_000_000),
            }),
            min_size=1, max_size=50,
        ),
        "placed_at": st.datetimes(),
    })
)
def test_roundtrips_any_valid_order(order):
    assert deserialize(serialize(order)) == order

The first time you run a property test against a serializer, it will find things like "oh, surrogate pairs in a string break our UTF-8 handling," or "an amount of exactly 2^31 overflows," or "the empty string is a valid SKU and we never considered it." These are the bugs that ship and then surface in an incident retro three months later.

When not to use it: wherever the property you are trying to state is about as complex as the implementation. If stating the invariant means rewriting the code, you are not testing, you are duplicating.

Handling Flaky Tests

Every long-lived test suite accumulates flakes. The question is not whether you will have flaky tests (you will) but what you do when they appear. The worst and most common answer is "just merge it, it's probably fine." That trains the whole team to ignore CI, which is the beginning of the end.

A policy that works:

Quarantine on the first flake. A test that fails intermittently moves to a quarantine suite that runs but does not block merges. This keeps the main suite green and the signal honest.
Auto-retry, but sparingly and never silently. At most one automatic retry, and the fact of the retry must be logged and tracked. If a test passes only on retry for three builds in a row, it goes to quarantine.
Find the root cause within a week, or delete. Quarantined tests have an expiry date. Either someone fixes the underlying race condition, timing assumption, or shared-state leak, or the test is deleted. A test you don't trust is worse than no test, because it burns CI time and teaches people to ignore failures.
Track flake rate as a metric: the percentage of runs in which a test failed on the first attempt but passed on retry. If it climbs, stop feature work and fix it.
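Both numbers in this policy, the flake rate and the "three builds in a row" rule, fall out of a small amount of bookkeeping. The sketch below assumes your CI can export, per run, the set of tests that failed first and then passed on retry. The `CiRun` record and function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CiRun:
    run_id: int
    passed_only_on_retry: frozenset  # test ids that failed first, then passed


def flake_rate(runs) -> float:
    """Share of CI runs in which at least one test needed a retry to pass."""
    if not runs:
        return 0.0
    return sum(1 for run in runs if run.passed_only_on_retry) / len(runs)


def quarantine_candidates(runs, streak: int = 3) -> set:
    """Tests that needed a retry in each of the most recent `streak` runs."""
    recent = sorted(runs, key=lambda run: run.run_id)[-streak:]
    if len(recent) < streak:
        return set()
    return set.intersection(*(set(run.passed_only_on_retry) for run in recent))


runs = [
    CiRun(1, frozenset()),
    CiRun(2, frozenset({"test_checkout"})),
    CiRun(3, frozenset({"test_checkout", "test_search"})),
    CiRun(4, frozenset({"test_checkout"})),
]
print(flake_rate(runs))             # 0.75
print(quarantine_candidates(runs))  # {'test_checkout'}
```

Plot `flake_rate` per week rather than looking at single values; the trend is the signal.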

A root-cause template that catches most flakes:

Time: does the test assume Date.now() returns a specific value, or that two timestamps are equal across a DB write?
Order: does the test pass in isolation but fail when run after another test? Look for shared state: module singletons, DB rows that were not cleaned up, cached connections.
Concurrency: un-awaited async operations, swallowed promises, races between seed and read.
External dependency: DNS, clock skew, a real API you forgot to stub, rate limits.
Resource pressure: the test passes on your laptop and fails on an underpowered CI runner. Usually a timeout set too tight.

Nine flakes out of ten are one of these five.
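For the Time category, the usual cure is to stop reading the wall clock inside the code under test and pass a clock in instead. A minimal sketch, with hypothetical names (`FixedClock`, `is_expired`) and a made-up 15-minute TTL:

```python
from datetime import datetime, timedelta, timezone


class FixedClock:
    """Test clock: time only moves when the test says so."""

    def __init__(self, start: datetime):
        self._now = start

    def now(self) -> datetime:
        return self._now

    def advance(self, **delta) -> None:
        self._now += timedelta(**delta)


def is_expired(issued_at: datetime, clock, ttl: timedelta = timedelta(minutes=15)) -> bool:
    """Code under test asks the injected clock, never datetime.now() directly."""
    return clock.now() - issued_at >= ttl


clock = FixedClock(datetime(2026, 1, 1, tzinfo=timezone.utc))
issued_at = clock.now()

assert not is_expired(issued_at, clock)
clock.advance(minutes=14, seconds=59)
assert not is_expired(issued_at, clock)  # one second before the boundary
clock.advance(seconds=1)
assert is_expired(issued_at, clock)      # exactly at the boundary
```

The boundary assertions here are exact and repeatable, which no sleep-based test can promise. In production the same function receives a clock whose `now()` returns the real time.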

What Belongs Where in CI

Not every test should run on every commit. A good pipeline layers suites by cost and by the value of the feedback.

# .github/workflows/ci.yml — excerpt
name: CI

on:
  pull_request:
  push:
    branches: [main]
  schedule:
    - cron: "0 3 * * *"   # nightly

jobs:
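  fast:
    # Runs on every PR and on pushes to main. Target: under 3 minutes.
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # pnpm has to be installed before setup-node can cache its store.
      # Assumes the pnpm version is pinned via "packageManager" in package.json.
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with: { node-version: "22", cache: "pnpm" }
      - run: pnpm install --frozen-lockfile
      - run: pnpm lint
      - run: pnpm typecheck
      - run: pnpm test:unit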

  integration:
    # Runs on every PR. Target: under 10 minutes.
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16-alpine
        env: { POSTGRES_PASSWORD: test }
        ports: ["5432:5432"]
        options: >-
          --health-cmd "pg_isready -U postgres"
          --health-interval 5s --health-retries 10
    steps:
      - uses: actions/checkout@v4
      # Install pnpm first so setup-node's pnpm cache can find it.
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with: { node-version: "22", cache: "pnpm" }
      - run: pnpm install --frozen-lockfile
      - run: pnpm test:integration
      - run: pnpm test:contract

  e2e:
    # Runs on PRs touching app code, and on main. Target: under 15 minutes.
    if: contains(github.event.pull_request.labels.*.name, 'app') || github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
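    steps:
      - uses: actions/checkout@v4
      # Same toolchain setup as the other jobs; pnpm may not be on the runner otherwise.
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with: { node-version: "22", cache: "pnpm" }
      - run: pnpm install --frozen-lockfile
      - run: pnpm exec playwright install --with-deps chromium
      - run: pnpm test:e2e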

  nightly:
    # Heavy suites: full e2e matrix, load tests, property tests with extra runs.
    if: github.event_name == 'schedule'
    runs-on: ubuntu-latest
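    steps:
      - uses: actions/checkout@v4
      # Same toolchain setup as the other jobs; pnpm may not be on the runner otherwise.
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with: { node-version: "22", cache: "pnpm" }
      - run: pnpm install --frozen-lockfile
      - run: pnpm test:e2e --project=all-browsers
      - run: pnpm test:property --runs=10000
      - run: pnpm test:load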

The principle: feedback speed should be proportional to how often you want developers to see it. Lint and unit tests run on every push because a ten-second failure teaches you something. Load tests run nightly because a twenty-minute failure at commit time teaches you to stop committing.

One more underused technique: affected-project filtering. If your monorepo has twelve services and a PR touches one, you should not be running the integration suites of the other eleven. Tools like Nx, Turborepo, and Bazel solve this completely, but a crude git diff in a shell script gets you 80% of the way for free.
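The crude version of affected-project filtering amounts to mapping changed paths to service names. The sketch below is in Python rather than shell. It assumes a `services/<name>/...` layout and treats a few shared paths as "run everything"; both the layout and the `SHARED_PREFIXES` list are assumptions to adapt to your repository.

```python
import subprocess
from pathlib import PurePosixPath

# Assumption: changes under these paths can affect every service.
SHARED_PREFIXES = ("libs/", "package.json", "pnpm-lock.yaml")


def affected_services(changed_paths, all_services) -> set:
    """Map changed file paths to the services whose suites need to run."""
    affected = set()
    for path in changed_paths:
        if path.startswith(SHARED_PREFIXES):
            return set(all_services)  # shared code changed: run everything
        parts = PurePosixPath(path).parts
        if len(parts) >= 2 and parts[0] == "services" and parts[1] in all_services:
            affected.add(parts[1])
    return affected


def changed_paths_since(base_ref: str = "origin/main") -> list:
    """Files changed on this branch relative to its merge base with base_ref."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base_ref}...HEAD"],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

In CI you would call `affected_services(changed_paths_since(), services)` and start integration jobs only for the returned names. The blunt "shared path means everything" rule errs on the side of running too much, which is the safe direction for a heuristic.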

Coverage Numbers That Matter and Those That Don't

Coverage is a bad target and a useful diagnostic.

As a target, it produces tests written to move a percentage. They execute code without asserting on behavior. Reviewers rubber-stamp them because the bar is coverage, not quality. The team ends up with high coverage and low confidence, the worst combination.

As a diagnostic, coverage is useful in two specific ways:

Files with zero coverage. Any file at 0% is either untested (fix it) or dead code (delete it). Both outcomes are good.
Coverage drop on a PR. Not "coverage is below 80%" but "this PR lowers coverage by 3%." That is a signal worth a review comment.
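Both diagnostics are a few lines once you have per-file numbers for the base branch and the PR head. The sketch below assumes each report is a mapping of file path to `(covered_lines, total_lines)`. That shape, the function names, and the one-point default threshold are illustrative, not the output format of any particular coverage tool.

```python
def total_pct(report: dict) -> float:
    """Overall line coverage in percent for a {path: (covered, total)} report."""
    covered = sum(c for c, _ in report.values())
    total = sum(t for _, t in report.values())
    return 100.0 * covered / total if total else 100.0


def coverage_findings(base: dict, head: dict, max_drop_pct: float = 1.0) -> list:
    """Review comments worth making: zero-coverage files and a drop vs. base."""
    findings = [
        f"{path}: 0% covered, untested or dead code?"
        for path, (covered, total) in sorted(head.items())
        if total > 0 and covered == 0
    ]
    drop = total_pct(base) - total_pct(head)
    if drop > max_drop_pct:
        findings.append(f"total coverage dropped by {drop:.1f} points")
    return findings


base = {"pricing.py": (80, 100), "auth.py": (50, 100)}
head = {"pricing.py": (80, 100), "auth.py": (50, 100), "export.py": (0, 100)}
for finding in coverage_findings(base, head):
    print(finding)
```

Note that there is no absolute threshold anywhere in this check. It comments only on what the PR changed.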

The numbers I recommend caring about for most services:

Branch coverage on domain / business logic: high. Not because the number matters, but because if branches in pure logic are hard to cover, the shape of that logic is probably wrong.
Overall line coverage: whatever it is. Do not set a target.
Mutation-testing score on critical modules (Stryker, mutmut, PIT): this is the number that actually tells you whether your tests assert on behavior. It is expensive to run, so reserve it for the modules that matter: pricing, auth, payment.
Flake rate: watch the trend. A rising flake rate is the earliest signal that your suite is decaying.
P50 and P95 of CI duration: if P95 creeps up, developers start working around CI instead of with it.
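Why mutation score says more than coverage can be shown with a hand-rolled toy. Real tools such as Stryker, mutmut, and PIT generate mutants automatically. Here two mutants are written by hand, and every function name is invented for illustration.

```python
def qualifies_for_free_shipping(total_cents: int) -> bool:
    return total_cents >= 5000


# Hand-written mutants: small deliberate bugs a good suite should detect.
def mutant_boundary(total_cents: int) -> bool:
    return total_cents > 5000  # ">=" mutated to ">"


def mutant_always_true(total_cents: int) -> bool:
    return True  # condition replaced by a constant


def coverage_only_suite(fn) -> bool:
    """Executes every line of fn (100% coverage) but checks a single easy case."""
    return fn(10_000) is True


def behavioural_suite(fn) -> bool:
    """Checks both sides of the boundary."""
    return fn(10_000) is True and fn(5000) is True and fn(4999) is False


def mutation_score(suite, original, mutants) -> float:
    """Fraction of mutants the suite fails on (that is, 'kills')."""
    if not suite(original):
        raise ValueError("suite must pass on the original code")
    killed = sum(1 for mutant in mutants if not suite(mutant))
    return killed / len(mutants)


MUTANTS = [mutant_boundary, mutant_always_true]
print(mutation_score(coverage_only_suite, qualifies_for_free_shipping, MUTANTS))  # 0.0
print(mutation_score(behavioural_suite, qualifies_for_free_shipping, MUTANTS))    # 1.0
```

Both suites report full line coverage of the function. Only one of them would notice if the free-shipping threshold were off by one.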

Checklist Summary

When you review a test suite, your own or one you just inherited, walk this list. Any "no" is an opportunity.

Does the unit-test layer cover real business logic, not just the glue code around it?
Do integration tests use real infrastructure via containers, rather than mocked doubles that drift?
For every cross-service call, is there a contract test or an e2e test that would catch a breaking change?
Is the e2e suite small, deterministic, and actively maintained, rather than a graveyard of .skip?
Is there at least one property-based test on every module that has algebraic properties?
When a test flakes, is there a written policy for what happens next, or does everyone just retry?
Does the pipeline run the right suites in the right phase: fast feedback at commit time, heavy checks overnight?
Is someone watching coverage drops and flake rate as real metrics, not vanity numbers?
Can a new engineer run the full suite locally in under fifteen minutes? If not, they won't run it, and neither will you.

Tests exist so you can change the system without fear. Any test that doesn't serve that goal (every trivial HTML snapshot, every mock of a mock, every e2e that tests the login page for the hundredth time) is a tax on every future change. The healthiest test suites I have worked with were not the biggest. They were the ones where each test was there because someone decided it deserved to be, and the ones that stopped deserving it were deleted. Write tests as if you will be the one maintaining them for two years, because you will be.

Further Reading

Working Effectively with Legacy Code, Michael Feathers (2004). The primary reference for characterization tests and seams in untested code.
Growing Object-Oriented Software, Guided by Tests, Steve Freeman & Nat Pryce (2009). Where the "outside-in" style of testing was established.
xUnit Test Patterns, Gerard Meszaros (2007). A comprehensive vocabulary for the smells and patterns that show up in real test suites.
Accelerate, Forsgren, Humble & Kim (2018). Empirical evidence for why fast, reliable test suites correlate with high-performing teams.
Martin Fowler's TestPyramid and ContractTest articles. Short, dense, still current.

Testing Strategies for Production Systems