The Monolith Trap
Every B2B platform I've ever inherited started as a monolith. That's not a criticism — a well-structured monolith ships faster in year one than any distributed system ever will. The problem comes in year two, when a three-person payments team needs to deploy six times a day while your five-person reporting team is still mid-sprint on a quarterly release cycle. Suddenly you're coordinating deployments like a military operation, and every hotfix turns into a board-level conversation.
The solution isn't to rewrite everything. It's to extract boundaries where they hurt most.
Identifying Service Boundaries
Conway's Law is real: your architecture will mirror your org chart whether you plan for it or not. I've stopped fighting that and started using it deliberately.
Before drawing any service boundary, I run three questions past the team:
- Who owns this data exclusively? If two teams argue about who writes to a table, that table belongs in a shared data service — not two separate ones.
- What's the blast radius of a deploy here? Billing and user-auth deserve isolation. Static content rendering doesn't.
- What are the scale characteristics? A PDF-generation service that spikes at month-end has totally different resource needs than a real-time websocket gateway.
For most B2B platforms, I end up with roughly the same core split: an auth/identity service, a billing/subscription service, a core domain service (the thing the product actually does), a notification service, and a reporting/analytics service that reads from an event stream rather than the primary DB.
The Communication Layer
The choice between synchronous REST/gRPC and asynchronous event streaming is where most teams make their first big mistake. They go all-in on one or the other.
Synchronous calls (I prefer gRPC for internal service-to-service) are correct when the caller needs a response to proceed — like creating a checkout session. Async events are correct for everything that can tolerate eventual consistency — like sending a welcome email or updating a denormalized reporting table.
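The distinction can be sketched in a few lines. This is a minimal illustration with in-memory stubs standing in for a real gRPC client and message broker; `createCheckout`, `emit`, and the `outbox` array are illustrative names, not a real API:

```typescript
type CheckoutSession = { id: string; url: string };

// Synchronous: the caller cannot proceed without the response.
async function createCheckout(customerId: string): Promise<CheckoutSession> {
  // In production this would be a gRPC call to the billing service.
  return { id: `cs_${customerId}`, url: `https://pay.example/cs_${customerId}` };
}

// Asynchronous: fire-and-forget; consumers catch up eventually.
const outbox: Array<{ topic: string; payload: unknown }> = [];
function emit(topic: string, payload: unknown): void {
  outbox.push({ topic, payload }); // in production: broker.publish(...)
}

async function signUp(customerId: string): Promise<CheckoutSession> {
  const session = await createCheckout(customerId); // must block on this
  emit('user.signed_up', { customerId });           // welcome email can wait
  return session;
}
```

The litmus test is the `await`: if removing it would break the caller, the call must be synchronous; if not, it belongs on the event stream.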
The pattern I use is a saga for multi-step business operations that cross service boundaries. When a new customer signs up, the flow goes: auth-service creates the identity → emits user.created → billing-service provisions the free tier → emits subscription.created → notification-service fires the welcome email. No service calls another service directly. Each step is independently retryable.
// Example: Publishing a domain event with metadata
interface DomainEvent&lt;T = unknown&gt; {
  id: string;
  type: string;
  aggregateId: string;
  payload: T;
  occurredAt: string;
  correlationId: string;
}

async function publishEvent&lt;T&gt;(
  topic: string,
  event: Omit&lt;DomainEvent&lt;T&gt;, 'id' | 'occurredAt'&gt;
) {
  const fullEvent: DomainEvent&lt;T&gt; = {
    ...event,
    id: randomUUID(),                     // from 'node:crypto'
    occurredAt: new Date().toISOString(),
  };
  // `broker` is whatever client your stack provides (Kafka, NATS, SQS, ...)
  await broker.publish(topic, JSON.stringify(fullEvent));
}
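The consuming side of the saga can be sketched the same way. Below is a minimal in-memory stand-in for a broker — `on`, `dispatch`, and the handler registry are illustrative, not a real library API — showing the signup flow as independent, retryable steps:

```typescript
type Handler = (payload: { userId: string }) => Promise<void>;
const handlers = new Map<string, Handler[]>();

function on(type: string, handler: Handler): void {
  const list = handlers.get(type) ?? [];
  list.push(handler);
  handlers.set(type, list);
}

async function dispatch(type: string, payload: { userId: string }): Promise<void> {
  for (const h of handlers.get(type) ?? []) {
    await h(payload); // a real bus would retry on failure and dead-letter
  }
}

const log: string[] = [];

// billing-service: reacts to user.created, emits subscription.created
on('user.created', async (p) => {
  log.push(`provisioned free tier for ${p.userId}`);
  await dispatch('subscription.created', p);
});

// notification-service: reacts to subscription.created
on('subscription.created', async (p) => {
  log.push(`welcome email to ${p.userId}`);
});
```

Dispatching `user.created` walks the whole chain, yet no handler knows about any other — which is exactly what makes each step independently retryable.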
Deployment and Observability
This is where the second mistake happens: teams think you need sophisticated tooling to manage microservices. You don't. You need boring, observable, repeatable deployment. Kubernetes is the industry standard, but you can build this with Docker Swarm, ECS, or even hand-rolled deployments as long as you have:
- Immutable deployments: Each artifact is built once, versioned, tested, and promoted. Never rebuilt or re-tagged.
- Blue-green deploys: Keep two full environments running. Switch traffic atomically. Roll back in seconds if needed.
- Health checks and graceful shutdown: Services need to signal readiness before receiving traffic and drain existing requests during shutdown.
- Logging to stdout: No files. Push logs to a centralized system like DataDog or ELK. Blame the 12-factor app manifesto if anyone complains.
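The readiness and graceful-shutdown requirement is small enough to show in full. This is a sketch using Node's built-in `http` module; the `/healthz` path and the ephemeral port are assumptions, and your orchestrator's probe config would point at whatever path you choose:

```typescript
import http from 'node:http';

let ready = false;

const server = http.createServer((req, res) => {
  if (req.url === '/healthz') {
    // Signal readiness to the load balancer / orchestrator probe.
    res.statusCode = ready ? 200 : 503;
    res.end(ready ? 'ok' : 'starting');
    return;
  }
  res.end('hello');
});

server.listen(0, () => {
  ready = true; // flip only after warm-up (DB pools, caches, ...)
});

process.on('SIGTERM', () => {
  ready = false;        // fail health checks so traffic stops routing here
  server.close(() => {  // stop accepting; let in-flight requests drain
    process.exit(0);
  });
});
```

The ordering matters: fail the health check first, then drain, then exit. Exiting before the drain completes is how you drop requests mid-deploy.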
For B2B platforms, I've found a simple pattern works: run a cron job every 5 minutes that compares the desired state (defined in git) against the actual state (what's running). If they don't match, apply the changes. Boring, but it's survived numerous outages and late-night deployments without incident.
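The reconcile loop itself is a few dozen lines. This sketch models state as a service-to-image-tag map; in production "desired" would be parsed from git and "actual" queried from your orchestrator's API — the `diff`/`reconcile` names and the string-command output are illustrative:

```typescript
type State = Record<string, string>; // service name -> deployed image tag

// Compute what has to change to make `actual` match `desired`.
function diff(desired: State, actual: State): string[] {
  const changes: string[] = [];
  for (const [svc, tag] of Object.entries(desired)) {
    if (actual[svc] !== tag) changes.push(`deploy ${svc}:${tag}`);
  }
  for (const svc of Object.keys(actual)) {
    if (!(svc in desired)) changes.push(`remove ${svc}`);
  }
  return changes;
}

// The cron entry point: apply whatever the diff says, then exit.
function reconcile(
  desired: State,
  actual: State,
  apply: (cmd: string) => void
): void {
  for (const change of diff(desired, actual)) apply(change);
}
```

Because the loop is idempotent — matching states produce an empty diff — running it every 5 minutes is safe even if a previous run crashed halfway through.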
Growing Beyond This
As your platform scales, you'll face new problems: distributed tracing, rate limiting across service boundaries, circuit breakers, and compensation logic for failed sagas. These are all solvable with mature frameworks (I like NestJS for Node.js teams). But they're problems you want to have—they mean your product is growing fast enough to hit them.
The architecture I've described will take you to seven figures in annual revenue without strain. Beyond that, the platform usually splits: the API layer, the core business logic, and the reporting layer each become their own "universe" that scales independently. That's a future problem. For now, focus on organizational clarity and deployment safety.