
Kafka for Beginners: What It Is, When to Use It, and Producer/Consumer Examples in Java, Python, Node.js, and Go

Palakorn Voramongkol
April 15, 2026 · 14 min read

“A ground-up introduction to Apache Kafka — what it actually is, the handful of features that matter, the use cases it's genuinely good at, and working producer/consumer snippets in four languages so you can run your first topic in under ten minutes.”

If you’ve heard “Kafka” thrown around in architecture discussions and walked away with the impression that it’s some kind of message queue, a log, a database, and a firehose all at once — you weren’t wrong. Kafka is genuinely all of those things, and that’s why the explanations get confusing. This post ignores the buzzwords and starts from a simple question: what problem does Kafka actually solve, and how do you use it?

We’ll end with working producer-and-consumer code in four languages, so you can actually try it — not just read about it.

TL;DR

  • Kafka is a distributed append-only log that many producers write to and many consumers read from, independently.
  • It is not a queue in the RabbitMQ sense — consumers don’t “pop” messages; they read from an offset they control.
  • Core primitives you actually need to know: topic, partition, offset, consumer group.
  • Great fit for: event-driven architectures, analytics pipelines, audit logs, decoupling microservices, replayable processing.
  • Not a fit for: RPC, tiny low-volume apps, anything where you need a single-machine in-memory queue.
  • The quickest way to learn it is to run it locally and write a producer + consumer — we’ll do both below.

What Kafka Actually Is

Strip away the marketing and Kafka is three ideas:

  1. An append-only log. Producers append records to the end of a log. The log is immutable — nothing overwrites or deletes records mid-stream. Records are kept for a configured retention window (hours, days, weeks, or forever).
  2. Partitioned, so it scales. Each topic (a named log) is split into partitions that can live on different brokers (servers). You add throughput by adding partitions and brokers.
  3. Consumer-driven reads. Consumers track their own position (an offset) in the log. They can replay from an old offset, jump forward, or read in parallel as a group — the broker doesn’t decide for them.

That’s the whole trick. Everything else — exactly-once semantics, Kafka Streams, Schema Registry, Connect — is built on top of those three ideas.
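Those three ideas fit in a few lines of Python. This is a toy, in-memory sketch for intuition only; the `Topic` class and its hashing are my own illustration, not how the broker is actually implemented:

```python
# Toy in-memory model of Kafka's three core ideas. Illustration only;
# the real broker persists to disk and replicates across machines.

class Topic:
    def __init__(self, name, num_partitions=3):
        self.name = name
        # Idea 2: a topic is split into partitions; each one here is
        # just an append-only list.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Records with the same key always hash to the same partition.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append((key, value))
        # Idea 1: append-only. A record's offset is its index in the partition.
        return p, len(self.partitions[p]) - 1

    def read(self, partition, offset):
        # Idea 3: the consumer chooses its offset; reading removes nothing,
        # so any consumer can replay from any point.
        return self.partitions[partition][offset:]

orders = Topic("orders")
for i in range(1, 4):
    p, off = orders.append(f"customer-{i}", {"orderId": i})
    print(f"orders-{p}@{off}")
```

Notice that `read` never mutates anything. That one property is what every Kafka feature builds on.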

The Five Concepts You Actually Need

You can get productive with Kafka knowing only five things.

1. Topic

A named stream. Think of it like a channel or a table. You publish messages to a topic; you consume messages from a topic. Topics are created ahead of time and configured with a replication factor and partition count.

2. Partition

A topic is split into one or more partitions. Each partition is an ordered, immutable sequence of records. Order is guaranteed within a partition, not across partitions.

The rule of thumb:

  • 1 partition → all messages are globally ordered, but only one consumer per group can read it, capping throughput.
  • N partitions → N consumers can read in parallel; order is only preserved per-partition.

Pick your partition count based on throughput and parallelism, not on how many topics you have.

3. Offset

Every record gets a monotonically-increasing ID within its partition — its offset. A consumer’s position is just an offset per partition. Commit the offset, and a restart resumes where you left off. Reset the offset, and you replay history.
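Committed offsets really are that simple: a number stored per group and partition. A stdlib-only sketch (the `consume` helper is hypothetical, not a client API):

```python
# One partition's records, and a per-group "committed offset" table.
log = [f"record-{i}" for i in range(10)]
committed = {}  # group id -> next offset to read

def consume(group, batch=4):
    start = committed.get(group, 0)
    records = log[start:start + batch]
    committed[group] = start + len(records)  # commit after processing
    return records

print(consume("billing"))  # record-0 .. record-3
print(consume("billing"))  # record-4 .. record-7: resumed from the commit
committed["billing"] = 0   # reset the offset...
print(consume("billing"))  # ...and history replays from record-0
```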

4. Consumer Group

Multiple consumers with the same group.id cooperate: Kafka assigns each partition to exactly one consumer in the group. Add consumers, you scale read throughput. Add more consumers than partitions, the extras sit idle.

This is the single most misunderstood Kafka concept. Within a group, partitions are the unit of parallelism; separate groups are independent subscribers that each receive the full stream.
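The assignment rule is easy to see in code. A round-robin-style sketch (Kafka's real assignors, such as range and cooperative-sticky, differ in detail):

```python
# Each partition goes to exactly one consumer in the group;
# consumers beyond the partition count get nothing.

def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

print(assign([0, 1, 2], ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1]}
print(assign([0, 1, 2], ["c1", "c2", "c3", "c4"]))
# c4 is assigned nothing and sits idle
```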

5. Broker

A Kafka server. A production cluster typically runs three or more brokers. Topics are replicated across brokers for durability. Producers and consumers don’t care which broker holds which partition; they discover the cluster layout automatically from any bootstrap broker.

Features Worth Knowing (But Not Reading About First)

These are features you’ll reach for once the basics click — don’t worry about them on day one.

  • Replication. Each partition has a leader replica and one or more followers. If the leader dies, a follower takes over.
  • Retention. Time-based (e.g., 7 days) or size-based (e.g., 10 GB per partition). Also compacted topics, which keep only the latest value per key.
  • Acks. Producer config: acks=0 (fire and forget), acks=1 (leader ack), acks=all (all in-sync replicas ack). Start with acks=all in production.
  • Keys. Records can have a key. Kafka hashes the key to pick a partition — so all records with the same key land in the same partition and stay in order.
  • Schema Registry. Stores Avro/Protobuf/JSON schemas so producers and consumers can evolve independently without breaking each other.
  • Kafka Connect. Pre-built connectors to pipe data in and out of databases, S3, Elasticsearch, etc., without writing code.
  • Kafka Streams / ksqlDB. In-process stream processing libraries — think “SQL over a live topic” or a lightweight Flink.

Skip all of these on your first Kafka project. Topic + producer + consumer gets you surprisingly far.
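Log compaction (the compacted-topics bullet above) is also easy to picture in code. A sketch of the idea only, not the broker's actual algorithm:

```python
# Compaction keeps only the most recent record per key,
# preserving the offset order of the survivors.

def compact(log):
    latest = {}  # key -> (offset, value) of the newest record seen
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)
    survivors = sorted(latest.items(), key=lambda kv: kv[1][0])
    return [(key, value) for key, (offset, value) in survivors]

log = [("user-1", "a@x.com"), ("user-2", "b@x.com"), ("user-1", "a@new.com")]
print(compact(log))
# [('user-2', 'b@x.com'), ('user-1', 'a@new.com')]
```

This is why a compacted topic can serve as a changelog: replaying it from the start rebuilds the latest state per key.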

When to Use Kafka — and When Not To

Good fits

  • Event-driven microservices. One service emits OrderPlaced; several services react independently. No point-to-point HTTP spaghetti.
  • Analytics / CDC pipelines. Stream database changes into a warehouse; feed dashboards; train models on live data.
  • Audit logs. Append-only by design; retention is configurable; replay is built in.
  • Decoupling producers from consumers. Producers don’t need to know who the consumers are, and consumers can be added or removed freely.
  • Backpressure tolerance. The log absorbs bursts; consumers process at their own pace.

Bad fits

  • RPC. You want a reply? Use gRPC or HTTP. Kafka is one-way.
  • Sub-millisecond latency. Kafka is fast, but the p99 is milliseconds, not microseconds. For ultra-low-latency, use a dedicated in-memory broker.
  • Small-volume apps. A Kafka cluster and its tooling are operational overhead. If you have one service emitting a few events per second, a lightweight queue (Redis, RabbitMQ, or even a DB table) is often enough.
  • Priority queues or complex routing. Kafka is topics and partitions, not exchanges and headers. If you need RabbitMQ’s routing semantics, use RabbitMQ.

A Mental Model: Kafka as a Newspaper

A useful analogy: Kafka is a newspaper.

  • The topic is the paper (e.g., “Finance Daily”).
  • The partition is a section (Business, Markets, Tech).
  • The offset is the page number.
  • The producer is a journalist submitting articles.
  • The consumer is a reader who keeps a bookmark on the page they’re currently reading.
  • The consumer group is a reading club — they divide the sections among themselves so no one re-reads what someone else already read.
  • Retention is how long the archive keeps back issues.

Most questions about Kafka dissolve if you hold this picture in your head.

flowchart LR
  P1[Producer A] --> T
  P2[Producer B] --> T
  subgraph T[Topic: orders]
    direction TB
    P0[Partition 0<br/>offsets 0,1,2,...]
    P1p[Partition 1<br/>offsets 0,1,2,...]
    P2p[Partition 2<br/>offsets 0,1,2,...]
  end
  T --> G1[Consumer group: billing]
  T --> G2[Consumer group: analytics]
  G1 --> C1[Consumer 1]
  G1 --> C2[Consumer 2]
  G2 --> C3[Consumer 3]

Two consumer groups reading the same topic independently — billing processes orders for invoicing; analytics streams the same events into a warehouse. Neither group blocks the other.

Getting a Broker Running in 60 Seconds

You don’t need a production cluster to learn Kafka. One Docker Compose file is enough.

# docker-compose.yml
services:
  kafka:
    image: apache/kafka:3.8.0
    container_name: kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@localhost:9093
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      CLUSTER_ID: "MkU3OEVBNTcwNTJENDM2Qk"
docker compose up -d
docker exec kafka /opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --create --topic orders --partitions 3 --replication-factor 1

You now have a single-broker Kafka on localhost:9092 with a topic called orders. That’s the full setup for everything below.

Producer and Consumer in Four Languages

Each snippet does the same thing: produces three messages to the orders topic (keyed by customer ID), then consumes them from a consumer group called example-consumer. Start the consumer first, then run the producer — you should see the three messages print out in order per key.

Java (Official Kafka Client)

<!-- pom.xml -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>3.8.0</version>
</dependency>

Producer:

import java.util.Properties;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderProducer {
  public static void main(String[] args) throws Exception {
    var props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    props.put(ProducerConfig.ACKS_CONFIG, "all");

    try (var producer = new KafkaProducer<String, String>(props)) {
      for (int i = 1; i <= 3; i++) {
        var record = new ProducerRecord<>(
          "orders",
          "customer-" + i,
          "{\"orderId\":" + i + ",\"amount\":" + (i * 10) + "}"
        );
        var md = producer.send(record).get();
        System.out.printf("sent to %s-%d@%d%n", md.topic(), md.partition(), md.offset());
      }
    }
  }
}

Consumer:

import java.time.Duration;
import java.util.*;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OrderConsumer {
  public static void main(String[] args) {
    var props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-consumer");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

    try (var consumer = new KafkaConsumer<String, String>(props)) {
      consumer.subscribe(List.of("orders"));
      while (true) {
        var records = consumer.poll(Duration.ofMillis(500));
        for (var rec : records) {
          System.out.printf("received key=%s value=%s partition=%d offset=%d%n",
            rec.key(), rec.value(), rec.partition(), rec.offset());
        }
      }
    }
  }
}

Python (confluent-kafka)

pip install confluent-kafka

Producer:

from confluent_kafka import Producer
import json

p = Producer({"bootstrap.servers": "localhost:9092", "acks": "all"})

def delivered(err, msg):
    if err:
        print(f"delivery failed: {err}")
    else:
        print(f"sent to {msg.topic()}-{msg.partition()}@{msg.offset()}")

for i in range(1, 4):
    p.produce(
        "orders",
        key=f"customer-{i}",
        value=json.dumps({"orderId": i, "amount": i * 10}),
        callback=delivered,
    )
p.flush()

Consumer:

from confluent_kafka import Consumer

c = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "example-consumer",
    "auto.offset.reset": "earliest",
})
c.subscribe(["orders"])

try:
    while True:
        msg = c.poll(0.5)
        if msg is None:
            continue
        if msg.error():
            print(f"error: {msg.error()}")
            continue
        print(
            f"received key={msg.key().decode()} "
            f"value={msg.value().decode()} "
            f"partition={msg.partition()} offset={msg.offset()}"
        )
finally:
    c.close()

Node.js (kafkajs)

npm install kafkajs

Producer (producer.mjs):

import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "example-app", brokers: ["localhost:9092"] });
const producer = kafka.producer();

await producer.connect();
const results = await producer.send({
  topic: "orders",
  acks: -1, // all
  messages: Array.from({ length: 3 }, (_, i) => ({
    key: `customer-${i + 1}`,
    value: JSON.stringify({ orderId: i + 1, amount: (i + 1) * 10 }),
  })),
});
console.log("sent:", results);
await producer.disconnect();

Consumer (consumer.mjs):

import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "example-app", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "example-consumer" });

await consumer.connect();
await consumer.subscribe({ topic: "orders", fromBeginning: true });

await consumer.run({
  eachMessage: async ({ topic, partition, message }) => {
    console.log(
      `received key=${message.key?.toString()} ` +
      `value=${message.value?.toString()} ` +
      `partition=${partition} offset=${message.offset}`
    );
  },
});

Go (segmentio/kafka-go)

go get github.com/segmentio/kafka-go

Producer (producer.go):

package main

import (
  "context"
  "fmt"
  "github.com/segmentio/kafka-go"
)

func main() {
  w := &kafka.Writer{
    Addr:         kafka.TCP("localhost:9092"),
    Topic:        "orders",
    Balancer:     &kafka.Hash{},
    RequiredAcks: kafka.RequireAll,
  }
  defer w.Close()

  var msgs []kafka.Message
  for i := 1; i <= 3; i++ {
    msgs = append(msgs, kafka.Message{
      Key:   []byte(fmt.Sprintf("customer-%d", i)),
      Value: []byte(fmt.Sprintf(`{"orderId":%d,"amount":%d}`, i, i*10)),
    })
  }
  if err := w.WriteMessages(context.Background(), msgs...); err != nil {
    panic(err)
  }
  fmt.Println("sent 3 messages")
}

Consumer (consumer.go):

package main

import (
  "context"
  "fmt"
  "github.com/segmentio/kafka-go"
)

func main() {
  r := kafka.NewReader(kafka.ReaderConfig{
    Brokers: []string{"localhost:9092"},
    GroupID: "example-consumer",
    Topic:   "orders",
  })
  defer r.Close()

  for {
    m, err := r.ReadMessage(context.Background())
    if err != nil {
      panic(err)
    }
    fmt.Printf("received key=%s value=%s partition=%d offset=%d\n",
      string(m.Key), string(m.Value), m.Partition, m.Offset)
  }
}

What to Learn Next

You now know enough to be dangerous. In rough order of “earns the most understanding per hour invested”:

  1. Keys and partitioning. Try producing with and without a key. Watch how the partition assignment changes and why it matters for ordering.
  2. Consumer groups. Start two consumers in the same group and watch them split the partitions. Kill one. Watch rebalancing happen.
  3. Offsets. Try auto.offset.reset=earliest vs latest. Commit offsets manually instead of automatically. Replay from the start.
  4. Delivery guarantees. Understand the difference between at-most-once, at-least-once, and exactly-once — and why “exactly-once” needs transactions.
  5. Schema Registry. As soon as two teams share a topic, schemas become a people problem. Avro or Protobuf plus a registry is how it’s solved in practice.
  6. Production concerns. Replication factor, min in-sync replicas, monitoring lag, retention, rack awareness. Plenty to read — but only after you’ve written code against a real topic.
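Item 4 above is mostly a question of when you commit relative to when you process. A stdlib sketch of the two failure modes (hypothetical helpers, not a client API):

```python
# Simulate a consumer crashing partway through a partition, under the
# two possible orderings of "process" and "commit".

log = ["a", "b", "c"]

def at_most_once(start, crash_after_commit_of):
    committed, processed = start, []
    for off in range(start, len(log)):
        committed = off + 1                 # commit FIRST
        if off == crash_after_commit_of:
            return processed, committed     # crash: committed but unprocessed
        processed.append(log[off])
    return processed, committed

def at_least_once(start, crash_after_processing):
    committed, processed = start, []
    for off in range(start, len(log)):
        processed.append(log[off])          # process FIRST
        if off == crash_after_processing:
            return processed, committed     # crash: processed but uncommitted
        committed = off + 1
    return processed, committed

# At-most-once: "b" is committed but never processed -- lost on restart.
print(at_most_once(0, crash_after_commit_of=1))   # (['a'], 2)

# At-least-once: "b" is processed but not committed -- redelivered on restart.
print(at_least_once(0, crash_after_processing=1)) # (['a', 'b'], 1)
```

Exactly-once needs transactions precisely because neither ordering alone can survive a crash without losing or duplicating a record.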

One Last Thing

The thing most beginner tutorials skip: you will understand Kafka ten times faster by running a single broker on your laptop and writing a dumb producer/consumer than by reading another overview of what it is. Do that, then come back.

