
Caching Strategies Beyond 'Just Add Redis'

Palakorn Voramongkol
March 4, 2026

“Cache-aside, write-through, write-behind, read-replicas, edge caches, HTTP ETags, and stale-while-revalidate — how to pick the right caching pattern and how to invalidate without tears.”

There Are Only Two Hard Problems in Computer Science…

…cache invalidation, naming things, and off-by-one errors. The joke lands because every backend engineer who has shipped a cache has eventually been paged at 3am because somebody is seeing yesterday’s pricing, a deleted comment, or a logged-out user’s dashboard.

The instinctive fix for a slow endpoint is “put Redis in front of Postgres.” Sometimes that’s the right answer. More often it’s the laziest answer — it trades a latency problem for a correctness problem, and in most systems correctness bugs are more expensive than slow pages.

This post is a map of the caching landscape for engineers designing a caching layer from scratch, or tearing down one that has grown into a liability. We’ll walk the locations a cache can live in, the write patterns that move data in and out, the invalidation strategies that keep it honest, and the observability you need to know whether it’s helping at all. No vendors, no benchmarks — just the shapes and trade-offs.

TL;DR

  • A cache can live in seven places — browser, CDN, edge worker, gateway, in-process, distributed store, read-replica — and most production systems layer several of them.
  • Five canonical write patterns (cache-aside, read-through, write-through, write-behind, write-around) trade latency, durability, and consistency in different mixes — pick by read/write ratio and staleness tolerance.
  • Combine event-based invalidation with a short TTL safety net, and give every cache key a single documented owner.
  • HTTP Cache-Control, ETag, and stale-while-revalidate are the cheapest cache you own; CDN cache tags extend the same idea to dynamic pages.
  • Protect hot keys with negative caching and singleflight to survive stampedes; route read-after-write paths back to the primary to survive replica lag.
  • Export hit rate, eviction rate, byte rate, and key cardinality, then alert on deviation from your own baseline — not invented “good” numbers.
  • Caches buy latency by spending consistency; the craft is keeping that trade visible in the code.

A Taxonomy of Cache Locations

Before choosing a pattern, decide where the cache lives. Each location is faster and narrower than the one below it. A production system usually has several layered together.

  • Browser cache — per-user, free, invisible to your infra. Controlled via HTTP headers. Fastest possible “hit” because the request never leaves the device.
  • CDN / edge cache — shared across users, located geographically close to them. Good for static assets and increasingly for dynamic HTML. Invalidation is the hard part.
  • Edge worker cache — programmable cache in the same runtime as edge code. Sits between CDN and origin. Lets you cache per-route with custom keys.
  • API gateway cache — coarse-grained response cache at the ingress. Convenient but inflexible: the gateway rarely understands your domain.
  • In-process / app memory cache — per-instance, nanosecond reads, zero network. Lost on restart. Inconsistent across instances.
  • Distributed cache (Redis, memcached) — shared across all app instances, single millisecond reads. Needs a network hop, needs eviction and persistence policies.
  • Read-replica — a database used as a cache. Eventually consistent. Transparent to the application if you route reads carefully.

A request travelling from browser to database can hit five of these before it reaches a disk. The job of a caching strategy is deciding which layer owns which piece of data, and what happens when that data changes.

The Five Write Patterns

Caches differ less in their read path (look up by key, return value) and more in how writes are handled. There are five canonical patterns.

1. Cache-aside (lazy loading)

The application reads from the cache first. On miss, it reads the source of truth, populates the cache, and returns. Writes go directly to the source of truth — the cache is either updated or invalidated afterwards.

In TypeScript, with an ioredis-style client:

async function getUser(id: string): Promise<User> {
  const cached = await redis.get(`user:${id}`);
  if (cached) return JSON.parse(cached);

  const user = await db.users.findById(id);
  await redis.set(`user:${id}`, JSON.stringify(user), "EX", 300);
  return user;
}

async function updateUser(id: string, patch: Partial<User>): Promise<void> {
  await db.users.update(id, patch);
  await redis.del(`user:${id}`); // invalidate, let next read repopulate
}

The same shape in Python:

async def get_user(user_id: str) -> User:
    cached = await redis.get(f"user:{user_id}")
    if cached:
        return User.parse_raw(cached)

    user = await db.users.find_by_id(user_id)
    await redis.set(f"user:{user_id}", user.json(), ex=300)
    return user


async def update_user(user_id: str, patch: dict) -> None:
    await db.users.update(user_id, patch)
    await redis.delete(f"user:{user_id}")  # invalidate, let next read repopulate

Cache-aside is the default for a reason: it’s simple, it tolerates cache outages (fall through to the database), and the code is obvious. The trade-off is a stampede on cold keys — the first N concurrent requests after a miss all query the database before any of them populate the cache. We’ll fix that later.

2. Read-through

The cache client itself fetches from the source on miss. The application only ever calls cache.get(key). The cache layer knows how to populate itself — usually via a loader function registered at startup.

Read-through looks cleaner than cache-aside but moves complexity into the cache library. You pay for it when the loader needs context that used to live in the caller: tenant ID, feature flags, auth scope. Most teams that start with read-through eventually reinvent cache-aside for “special” keys.
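The shape, sketched in Python with a plain dict standing in for the cache store (the class and names are illustrative, not a real library API):

```python
from typing import Callable, Dict

class ReadThroughCache:
    """Minimal read-through sketch; expiry and eviction are elided."""

    def __init__(self, loader: Callable[[str], object]):
        self._loader = loader                 # fetches from the source of truth
        self._store: Dict[str, object] = {}   # stand-in for the real cache store
        self.loads = 0                        # how many times we hit the source

    def get(self, key: str) -> object:
        if key in self._store:
            return self._store[key]           # hit: the loader is never called
        self.loads += 1
        value = self._loader(key)             # miss: the cache populates itself
        self._store[key] = value
        return value

# The caller only ever sees cache.get(key); the loader was registered once.
countries = ReadThroughCache(loader=lambda code: {"code": code.upper()})
first = countries.get("th")    # miss: loader runs
second = countries.get("th")   # hit: served from the store
```

Notice the loader receives only the key: any extra context (tenant ID, auth scope) has to be baked in at registration time, which is exactly the rigidity described above.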

3. Write-through

Every write goes to the cache first, and the cache writes through to the source of truth synchronously. Reads always hit a populated cache.

Pros: reads are always fast, and the cache and source are never out of sync by more than one failed write. Cons: every write pays the latency of two systems, and the cache now sits on the critical path of every write — if it’s down, writes fail.

Use write-through when reads massively dominate writes and cache staleness is unacceptable — pricing tables, entitlement flags, feature configs.
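A minimal write-through sketch, with dicts standing in for Redis and the database; the invariant to notice is that the write is not acknowledged until both stores have it:

```python
class WriteThroughCache:
    """Write-through sketch: dicts stand in for the cache and the database."""

    def __init__(self, source: dict):
        self._cache: dict = {}
        self._source = source

    def write(self, key: str, value: object) -> None:
        self._cache[key] = value     # cache first...
        self._source[key] = value    # ...then synchronously through to the source
        # If the second write raises, the caller sees a failed write and the
        # two stores differ by at most this one entry.

    def read(self, key: str) -> object:
        return self._cache[key]      # reads always hit a populated cache

db: dict = {}
prices = WriteThroughCache(db)
prices.write("price:sku-1", 100)
```

The cost is visible in `write`: two systems on every call, and no way to serve a write if the cache is down.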

4. Write-behind (write-back)

The application writes to the cache. The cache asynchronously flushes to the source of truth some time later — batched, coalesced, or scheduled.

This is the fastest write pattern and the most dangerous one. A cache crash before flush means lost writes. A misconfigured flush interval means read replicas stay stale for minutes. Use only when the source of truth is genuinely downstream (analytics, metrics, logs) and you can accept data loss on crash.
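A write-behind sketch with an explicit flush; in production the flush would run on a timer or a buffer-size threshold, and the pending buffer is exactly the data you lose on a crash:

```python
class WriteBehindCache:
    """Write-behind sketch: writes land in the cache immediately and are
    flushed to the source later, batched and coalesced."""

    def __init__(self, source: dict):
        self._cache: dict = {}
        self._pending: dict = {}   # coalesces repeated writes to the same key
        self._source = source

    def write(self, key: str, value: object) -> None:
        self._cache[key] = value
        self._pending[key] = value  # overwrites any earlier pending value

    def flush(self) -> int:
        """Runs on a timer or size threshold in a real system."""
        n = len(self._pending)
        self._source.update(self._pending)  # one batched write downstream
        self._pending.clear()
        return n

db: dict = {}
metrics = WriteBehindCache(db)
metrics.write("views:42", 1)
metrics.write("views:42", 2)   # coalesced: only the latest value flushes
```

Until `flush()` runs, the source sees nothing; a crash here loses both writes.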

5. Write-around

Writes skip the cache and go directly to the source of truth. The cache is populated only on read (via cache-aside). Useful when write data is rarely read — audit logs, event records, cold archives. Writing them to cache just pollutes it with bytes nobody will request.

Which to pick

The quick mental model:

  • Read-heavy, tolerates brief staleness → cache-aside.
  • Read-heavy, cannot tolerate staleness → write-through.
  • Write-heavy to a downstream system → write-behind, accepting durability risk.
  • Write-once-read-rarely → write-around.
  • Want clean code at the cost of flexibility → read-through.

Time-based vs Event-based Invalidation

Every cache eventually needs to expire or invalidate entries. Two strategies, each with its failure mode.

Time-based (TTL). Each entry expires after a fixed duration. Simple, bounded, and self-healing — a bug in invalidation logic only causes staleness for at most TTL seconds. The downside is obvious: you’re either too aggressive (high miss rate, hammered origin) or too lenient (users see stale data for minutes).

Event-based (explicit invalidation). On every write, the application explicitly invalidates the affected keys. Freshness is near-instant. The downside is the stale key trap: every write path must remember every key that might hold derived data, including denormalized shapes, paginated lists, search indexes, and aggregate counters. Miss one and the cache lies.

The pragmatic answer is both: event-based invalidation as the primary mechanism, short TTL as a safety net. If your invalidation code has a bug, the damage is capped at TTL. If TTL alone would be too long for users, the event path delivers freshness when it matters.

A rule I’ve found useful: every cache key should have an owner. A function, a module, or a service that is the sole writer. If three different code paths write to user:123, you will eventually have a stale-key incident, and bisecting which path wrote wrong is a miserable afternoon.

HTTP Caching Done Right

The HTTP cache is the cheapest cache you own — it’s the browser and every proxy between it and your origin, and you pay nothing for it. It’s also the one most commonly misconfigured.

The three headers that matter:

  • Cache-Control — the directive. Who may cache, for how long, and under what conditions.
  • ETag — a content fingerprint. The client echoes it in If-None-Match on the next request; if unchanged, the server returns 304 Not Modified with no body.
  • Last-Modified — a timestamp version of the same idea, paired with If-Modified-Since. Weaker than ETag but cheaper to compute.

A typical response header block for a cacheable dynamic resource looks like this:

Cache-Control: public, max-age=60, s-maxage=600, stale-while-revalidate=86400
ETag: "v3-8f2a91c"
Last-Modified: Tue, 03 Mar 2026 14:22:10 GMT
Vary: Accept-Language, Accept-Encoding

Read it line by line:

  • public — any cache may store this, including shared CDNs.
  • max-age=60 — browsers treat it fresh for 60 seconds.
  • s-maxage=600 — shared caches (CDN) treat it fresh for 10 minutes. Overrides max-age for shared caches only.
  • stale-while-revalidate=86400 — for the next 24 hours, after freshness expires, a cache may serve the stale version while revalidating in the background. The user sees old bytes instantly; the next user sees fresh bytes. This single directive removes most “cache miss = slow response” cliffs.
  • ETag — content fingerprint. Cheaper than resending the body.
  • Vary: Accept-Language — the cache must store separate variants for each language. Forget this on a localized page and English users will see Thai content.

A common trap: returning Cache-Control: no-cache when you meant no-store. no-cache allows caching but requires revalidation on every use (still useful — revalidation with ETag is free if the content is unchanged). no-store forbids caching entirely. Conflating them either leaks sensitive data or destroys performance.
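The server side of ETag revalidation is a few lines in any framework. A framework-agnostic sketch (the hashing scheme is illustrative; any stable content fingerprint works):

```python
import hashlib
from typing import Optional, Tuple

def make_etag(body: bytes) -> str:
    # A strong ETag derived from a content hash.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, bytes]:
    """Return (status, body); a 304 carries no body."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, b""    # client's copy is current: skip the body entirely
    return 200, body       # full response; the client stores the new ETag
```

Note the body still has to be rendered to compute the hash, so a 304 saves bandwidth, not server CPU; frameworks that cache the ETag alongside the rendered content save both.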

CDN Caching of Dynamic Pages

The frontier of the last five years has been CDNs caching HTML, not just assets. Two mechanisms make this practical:

Incremental static regeneration. The framework (Next.js, Astro, Nuxt, SvelteKit) pre-renders pages and caches them at the edge. A stale page serves instantly; a background job regenerates it. The user-visible behaviour is stale-while-revalidate applied to full pages.

Cache tags and on-demand invalidation. Instead of relying on TTL, the framework tags each rendered page with the data it depends on — product:123, author:45, category:shoes. When that data changes, a write path calls revalidateTag("product:123") and every cached page carrying that tag is purged globally within seconds.

Here is the shape of on-demand revalidation from an API route:

// app/api/products/[id]/route.ts
import { revalidateTag } from "next/cache";

export async function PATCH(req: Request, { params }: { params: { id: string } }) {
  const patch = await req.json();
  await db.products.update(params.id, patch);

  // Every page that rendered with this tag is purged at the edge.
  revalidateTag(`product:${params.id}`);
  revalidateTag("product:list"); // the index page listing all products

  return Response.json({ ok: true });
}

The mental model is still write-through: the write is not done until the cache has been invalidated. Skip the invalidation call and every CDN in the world will happily serve the old page for an hour.

The failure mode to watch for: tag explosion. A page tagged with user:<every_user_who_commented> might carry hundreds of tags. If your CDN charges per tag or has a tag cardinality limit, design tags at the aggregate level (post:42:comments) rather than per-entity.

Local / In-process Caches

Redis is not free. Every call is a network hop, a serialization, a deserialization, and a potential TCP reconnect. For data that is hot, small, and shared across all instances identically — country lists, currency tables, feature flags that change once a day — a local cache is dramatically faster.

In TypeScript, using lru-cache:

import { LRUCache } from "lru-cache";

const countryCache = new LRUCache<string, Country>({
  max: 500,
  ttl: 1000 * 60 * 10, // 10 minutes
});

export async function getCountry(code: string): Promise<Country> {
  const hit = countryCache.get(code);
  if (hit) return hit;

  const country = await db.countries.findByCode(code);
  countryCache.set(code, country);
  return country;
}

And the same idea in Python, using cachetools:

from cachetools import TTLCache

country_cache: TTLCache[str, Country] = TTLCache(maxsize=500, ttl=600)

async def get_country(code: str) -> Country:
    if code in country_cache:
        return country_cache[code]

    country = await db.countries.find_by_code(code)
    country_cache[code] = country
    return country

The trade-off is consistency: each instance has its own copy, so invalidation is hard. The usual answer is a short TTL plus a pub/sub channel: when data changes, the writer publishes an event, every instance subscribes and clears its local cache. Combining the two gives you the speed of local caching with bounded staleness.
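The pattern, sketched with an in-memory bus standing in for the Redis pub/sub channel (the Bus class is a test double, not a real client API):

```python
from collections import defaultdict
from typing import Callable, Dict, List

class Bus:
    """In-memory stand-in for a shared pub/sub channel."""
    def __init__(self) -> None:
        self._subs: Dict[str, List[Callable[[str], None]]] = defaultdict(list)
    def subscribe(self, channel: str, fn: Callable[[str], None]) -> None:
        self._subs[channel].append(fn)
    def publish(self, channel: str, message: str) -> None:
        for fn in self._subs[channel]:
            fn(message)

class LocalCache:
    """Per-instance cache that drops an entry when a change is announced."""
    def __init__(self, bus: Bus) -> None:
        self.store: dict = {}
        bus.subscribe("invalidate", lambda key: self.store.pop(key, None))

# Two app instances sharing one bus:
bus = Bus()
a, b = LocalCache(bus), LocalCache(bus)
a.store["country:TH"] = {"name": "Thailand"}
b.store["country:TH"] = {"name": "Thailand"}
bus.publish("invalidate", "country:TH")  # the writer, after its DB update
```

With a real broker the publish is asynchronous, so the TTL still matters: it caps the staleness window for any instance that missed the message.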

Don’t put user-specific data in a local cache unless instances are sticky — otherwise hit rate collapses and RAM fills with duplicates.

Distributed Cache Patterns

Redis and memcached are the two common choices. Both are key-value stores with expiry; the differences matter at scale.

Pipelines. Sending N commands in one round trip instead of N. A 20-call cache lookup that takes 20ms serial becomes ~1ms pipelined. Most clients support it; most code does not use it. Audit your hottest endpoint — you will almost certainly find unnecessary serial round-trips.

Cluster and hash tags. Redis Cluster shards keys across nodes by hash. Commands that touch multiple keys (MGET, transactions) only work if all keys live on the same shard. The workaround is hash tags: wrap part of the key in {…}, and only that part hashes. user:{42}:profile and user:{42}:settings land on the same shard.

Memcached trade-offs. Simpler, faster on pure GET/SET, no persistence, no data structures beyond opaque blobs, no pub/sub. Choose it when you truly need a cache and nothing else. Redis has outgrown it in most places because the same binary doubles as a queue, a rate limiter, and a pub/sub bus — one less piece of infra.

Eviction policy. Under memory pressure, Redis evicts according to the configured policy. The dangerous default on some setups is noeviction — new writes fail when memory fills. Choose allkeys-lru for a pure cache, volatile-lru if you’re also storing non-cache data (session tokens, locks) with TTLs.
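The corresponding redis.conf fragment for a pure-cache instance might look like this (the size is a placeholder; tune it to your workload):

```conf
maxmemory 2gb
maxmemory-policy allkeys-lru    # pure cache: any key may be evicted, LRU first

# If the same instance also holds sessions or locks, restrict eviction
# to keys that carry a TTL instead:
# maxmemory-policy volatile-lru
```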

Read-replicas as a Cache

A read replica is arguably the cheapest “cache” you can add — the database already understands your schema, your types, your ACLs, and your query planner. It also comes with replication lag: the replica is always behind the primary by some amount of time, from microseconds to seconds depending on load.

Two hazards:

Read-after-write inconsistency. The user edits their profile, the write goes to the primary, the subsequent GET is routed to a replica that hasn’t caught up, and the user sees their old data. Fixes: pin a session’s reads to the primary for a few seconds after its last write, or route known read-after-write paths to the primary explicitly.

Silent replica drift. A replica can fall behind by minutes under heavy write load without generating alerts. Monitor replica_lag_seconds and alert on it. If your app’s tolerance is five seconds, alert at two.
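A sketch of the primary-pinning fix, with dicts standing in for the primary and a lagging replica (the routing policy is the point, not the storage):

```python
import time

class RoutingReader:
    """Route a session's reads to the primary for a short window after that
    session's last write; otherwise read from the replica."""

    def __init__(self, primary: dict, replica: dict, pin_seconds: float = 5.0):
        self._primary, self._replica = primary, replica
        self._pin = pin_seconds
        self._last_write: dict = {}   # session_id -> time of last write

    def write(self, session_id: str, key: str, value: object) -> None:
        self._primary[key] = value    # writes always go to the primary
        self._last_write[session_id] = time.monotonic()
        # (replication to the replica happens asynchronously, with lag)

    def read(self, session_id: str, key: str) -> object:
        wrote_recently = (
            time.monotonic() - self._last_write.get(session_id, -1e9) < self._pin
        )
        store = self._primary if wrote_recently else self._replica
        return store.get(key)
```

A session that just wrote always sees its own write; every other session tolerates the lag, which is the deal a replica-as-cache offers.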

Read-replicas are a terrible cache for anything that must be immediately consistent — balances, inventory counts, authorization checks — and a great cache for anything read far more often than written and tolerant of modest staleness (dashboards, listings, search results).

Negative Caching and Stampede Protection

Two patterns that show up once a cache is under real load.

Negative caching. When the source of truth returns “not found”, cache that too. Otherwise every request for a nonexistent key falls through to the database. Attackers love unbounded keyspaces — they’ll generate a million random IDs and turn your cache into a drill through to Postgres. Cache the miss with a short TTL.
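A cache-aside sketch with negative caching; a sentinel object distinguishes “we cached a not-found” from “we have not looked yet” (in Redis you would store a marker string with a short EX instead):

```python
_MISS = object()   # sentinel: "the source said not-found, and we cached that"

class NegativeCache:
    """Cache-aside with negative caching; TTLs are elided for brevity.
    In production the _MISS entries get a SHORT ttl (e.g. 30s)."""

    def __init__(self, source: dict):
        self._store: dict = {}
        self._source = source
        self.source_hits = 0   # how often we fell through to the source

    def get(self, key: str) -> object:
        if key in self._store:
            cached = self._store[key]
            return None if cached is _MISS else cached
        self.source_hits += 1
        value = self._source.get(key)
        self._store[key] = _MISS if value is None else value  # cache the miss too
        return value

users = NegativeCache({"user:1": {"name": "Nok"}})
```

Without the sentinel, every probe for a nonexistent key drills straight through to the database.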

Thundering herd / cache stampede. A popular key expires. A thousand requests arrive in the next millisecond. All thousand miss the cache, all thousand query the database, the database collapses.

The fix is request coalescing, also called singleflight: the first miss acquires a lock, fetches, populates the cache, releases the lock. The other 999 wait on the lock and read from the cache when it’s released.

Here is the Redis-backed shape of it in pseudocode:

function get_with_singleflight(key):
    value = cache.get(key)
    if value is not None:
        return value

    lock_key = "lock:" + key
    acquired = cache.SET(lock_key, instance_id, NX=True, EX=5)

    if acquired:
        try:
            value = source_of_truth.get(key)
            cache.set(key, value, EX=300)
            return value
        finally:
            cache.delete(lock_key)
    else:
        # Another instance is fetching. Wait briefly, then read.
        for _ in range(10):
            sleep(50ms)
            value = cache.get(key)
            if value is not None:
                return value
        # Fallback: fetch ourselves rather than fail.
        return source_of_truth.get(key)

The details matter. SET with NX and EX is a single atomic command (plain SETNX takes no expiry) — if the holding instance dies, the lock expires after five seconds. The waiting instances poll the cache, not the lock — they want the value, not the lock. And the fallback on timeout is to fetch directly rather than to fail; a slow request is better than an error.

An in-process variant (per-instance singleflight) is even cheaper when the hot key is genuinely global — only one local fetch per instance instead of one per request. Most mature codebases end up with both layers.
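A minimal in-process singleflight for asyncio: concurrent callers for the same key all await one shared task instead of each issuing a fetch (names are illustrative):

```python
import asyncio

class Singleflight:
    """Per-instance request coalescing: one in-flight fetch per key."""

    def __init__(self) -> None:
        self._inflight: dict = {}   # key -> asyncio.Task

    async def do(self, key: str, fetch):
        task = self._inflight.get(key)
        if task is None:
            task = asyncio.create_task(fetch())
            self._inflight[key] = task
            # Drop the entry when the fetch finishes, success or failure.
            task.add_done_callback(lambda _: self._inflight.pop(key, None))
        return await task

calls = 0

async def slow_fetch():
    global calls
    calls += 1
    await asyncio.sleep(0.01)   # simulate a database round trip
    return "value"

async def main():
    sf = Singleflight()
    # 100 concurrent callers for the same hot key share ONE fetch.
    return await asyncio.gather(*(sf.do("hot", slow_fetch) for _ in range(100)))

results = asyncio.run(main())
```

If the fetch raises, all waiters see the exception, which is usually what you want: they retry together rather than silently caching a failure.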

Observing Your Cache

A cache you can’t see is a cache you can’t trust. Four signals to emit and watch:

  • Hit rate. Percentage of reads served from cache. A cache with a 20% hit rate is costing you money — it adds a network hop to 80% of requests in exchange for a modest saving on the other 20%. Consider tightening scope or raising TTL.
  • Byte rate / memory pressure. How fast the cache fills. Tracks eviction risk. A sudden jump usually means a bug wrote giant values (a full result set instead of a single row, a serialized Map-of-everything).
  • Eviction rate. Keys evicted per second. Non-zero eviction in an LRU means you’re at capacity — hit rate will start degrading. Either add memory or tighten what you cache.
  • Key cardinality. Number of distinct keys. Unbounded growth means keys contain user-controlled input without a cap — a classic path to a cache-as-DoS-vector.

Alert on eviction rate spiking and hit rate dropping below baseline. Both are early warnings that something just changed in a way the cache wasn’t designed for — a new endpoint, a buggy key template, a traffic shift.

Resist the temptation to invent numbers about what these rates “should” be. They depend entirely on your workload. Baseline yours when the system is healthy, alert on deviation from that baseline.
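The raw inputs for these signals can live in a thin wrapper around whatever client you use; a sketch (the wrapper class and names are illustrative):

```python
class InstrumentedCache:
    """Wraps a get/set store and counts hits, misses, bytes written, and
    key cardinality, so the four signals can be exported and baselined."""

    def __init__(self) -> None:
        self._store: dict = {}
        self.hits = 0
        self.misses = 0
        self.bytes_written = 0

    def get(self, key: str):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        return None

    def set(self, key: str, value: bytes) -> None:
        self._store[key] = value
        self.bytes_written += len(value)   # byte rate = delta over time

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    @property
    def key_cardinality(self) -> int:
        return len(self._store)
```

In practice you would emit these counters to your metrics pipeline on a timer; eviction rate comes from the cache server itself (e.g. Redis INFO stats).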

A Decision Tree: This Endpoint Is Slow

When someone reports a slow endpoint and your reflex is to reach for Redis, walk this tree first:

  1. Is the bottleneck actually data retrieval, or is it computation, serialization, or a downstream API call? Profile first. Caching the wrong layer is a nice way to hide the real problem.
  2. Can the browser cache it? If the response is per-user but doesn’t change often, Cache-Control: private, max-age=N, stale-while-revalidate=M + an ETag gives you a free first-class cache.
  3. Can a CDN cache it? If the response is shared across users (or users grouped by language/region), CDN caching with tags is cheaper and faster than anything you run yourself.
  4. Is the data derivable at build/deploy time? Static rendering is the fastest cache — it’s bytes on disk at the edge.
  5. Is the data hot, small, and identical for all users? An in-process LRU with a short TTL beats Redis on latency.
  6. Is the data shared across instances and larger than fits locally? Now you reach for Redis with cache-aside.
  7. Is the source of truth itself the problem? A read-replica with careful routing is often simpler than another cache.
  8. Does write-consistency matter more than latency? Write-through into Redis, not cache-aside.

Each layer you skip is a layer you don’t have to invalidate.

Closing Checklist

Before you ship a cache, walk through this list:

  • Every cache key has a documented owner — one writer, many readers.
  • Every write path that touches cached data either invalidates or updates the cache.
  • TTL is set as a safety net, not as the primary freshness mechanism.
  • Cache-Control, ETag, and Vary are set on every HTTP response that can be cached, and deliberately omitted on ones that cannot.
  • Dynamic pages cached at the CDN are tagged, and every write path carries the tag in its invalidation call.
  • Singleflight / request coalescing protects every expensive key.
  • Negative results are cached with a short TTL.
  • Hit rate, eviction rate, and byte rate are exported as metrics with alerts on deviation.
  • Local in-process caches subscribe to a cross-instance invalidation channel, or use TTLs short enough that divergence doesn’t matter.
  • Read-replica routing has a fallback to the primary for read-after-write paths.
  • Someone on the team can draw the full cache layering — browser, CDN, edge, app, Redis, replica — on a whiteboard without hesitating.

A caching layer designed this way survives traffic spikes, data migrations, and 3am pages. One that skips these steps works fine in staging and then teaches you, expensively, which step you skipped.

Caches are not a performance feature. They’re a consistency liability you accept in exchange for latency. The craft is knowing exactly which consistency you’re giving up, for exactly which latency you’re buying, at exactly which layer — and keeping that trade visible in the code long after you’ve forgotten why you made it.

Further Reading

  • Designing Data-Intensive Applications — Martin Kleppmann (2017). Chapters on replication, consistency, and derived data are the foundation under every caching decision in this post.
  • MDN: HTTP caching — the canonical reference for Cache-Control, ETag, Vary, and stale-while-revalidate semantics.
  • RFC 9111: HTTP Caching — the spec itself; surprisingly readable, and the source of truth when MDN and your CDN disagree.
  • Redis documentation: eviction policies and clustering — the short page that prevents most production Redis incidents.
  • Caching at Netflix: The Hidden Microservice — talks and posts on EVCache and tiered caching at extreme scale, a useful contrast to the patterns above.

