If you’ve heard “Kafka” thrown around in architecture discussions and walked away with the impression that it’s some kind of message queue, a log, a database, and a firehose all at once — you weren’t wrong. Kafka is genuinely all of those things, and that’s why the explanations get confusing. This post ignores the buzzwords and starts from a simple question: what problem does Kafka actually solve, and how do you use it?
We’ll end with working producer-and-consumer code in four languages, so you can actually try it — not just read about it.
TL;DR
- Kafka is a distributed append-only log that many producers write to and many consumers read from, independently.
- It is not a queue in the RabbitMQ sense — consumers don’t “pop” messages; they read from an offset they control.
- Core primitives you actually need to know: topic, partition, offset, consumer group.
- Great fit for: event-driven architectures, analytics pipelines, audit logs, decoupling microservices, replayable processing.
- Not a fit for: RPC, tiny low-volume apps, anything where you need a single-machine in-memory queue.
- The quickest way to learn it is to run it locally and write a producer + consumer — we’ll do both below.
What Kafka Actually Is
Strip away the marketing and Kafka is three ideas:
- An append-only log. Producers append records to the end of a log. The log is immutable — nothing overwrites or deletes records mid-stream. Records are kept for a configured retention window (hours, days, weeks, or forever).
- Partitioned, so it scales. Each topic (a named log) is split into partitions that can live on different brokers (servers). You add throughput by adding partitions and brokers.
- Consumer-driven reads. Consumers track their own position (an offset) in the log. They can replay from an old offset, jump forward, or read in parallel as a group — the broker doesn’t decide for them.
That’s the whole trick. Everything else — exactly-once semantics, Kafka Streams, Schema Registry, Connect — is built on top of those three ideas.
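Those three ideas fit in a few lines of code. This is a toy sketch, not how Kafka is implemented — a topic as a list of append-only partition lists, with each consumer tracking its own offset:

```python
from collections import defaultdict

# Toy model of the three ideas: an append-only log, split into partitions,
# read by consumers that track their own position. Not Kafka's
# implementation -- just the shape of it.
class ToyTopic:
    def __init__(self, partitions: int):
        self.partitions = [[] for _ in range(partitions)]

    def append(self, partition: int, record: str) -> int:
        """Append to the end of a partition; return the new record's offset."""
        self.partitions[partition].append(record)
        return len(self.partitions[partition]) - 1

    def read(self, partition: int, offset: int) -> list[str]:
        """Read from `offset` onward. The broker never tracks your position."""
        return self.partitions[partition][offset:]

topic = ToyTopic(partitions=2)
topic.append(0, "order-1")
topic.append(0, "order-2")
topic.append(1, "order-3")

# Each consumer keeps an offset per partition; replay is just resetting it.
offsets = defaultdict(int)           # partition -> next offset to read
print(topic.read(0, offsets[0]))     # ['order-1', 'order-2']
offsets[0] = 2                       # "commit" after processing
print(topic.read(0, offsets[0]))     # [] -- caught up
offsets[0] = 0                       # reset -> replay history
print(topic.read(0, offsets[0]))     # ['order-1', 'order-2'] again
```

Reading never mutates the log, and two consumers with different offsets see the same data independently — which is exactly why Kafka is not a queue.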
The Five Concepts You Actually Need
You can get productive with Kafka knowing only five things.
1. Topic
A named stream. Think of it like a channel or a table. You publish messages to a topic; you consume messages from a topic. Topics are created ahead of time and configured with a replication factor and partition count.
2. Partition
A topic is split into one or more partitions. Each partition is an ordered, immutable sequence of records. Order is guaranteed within a partition, not across partitions.
The rule of thumb:
- 1 partition → all messages are globally ordered, but throughput is capped at what a single consumer can handle.
- N partitions → N consumers can read in parallel; order is only preserved per-partition.
Pick your partition count based on throughput and parallelism, not on how many topics you have.
3. Offset
Every record gets a monotonically-increasing ID within its partition — its offset. A consumer’s position is just an offset per partition. Commit the offset, and a restart resumes where you left off. Reset the offset, and you replay history.
4. Consumer Group
Multiple consumers with the same group.id cooperate: Kafka assigns each partition to exactly one consumer in the group. Add consumers, you scale read throughput. Add more consumers than partitions, the extras sit idle.
This is the single most misunderstood Kafka concept. Consumer groups are the unit of parallelism; partitions are the unit of distribution.
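The assignment rule above can be sketched in a few lines. This is a simplified round-robin assignor, not Kafka's real group protocol (which negotiates through a group coordinator broker), but it shows why extra consumers idle:

```python
def assign(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Spread partitions round-robin across group members.
    A simplification of Kafka's assignors -- the real protocol is
    negotiated with the cluster's group coordinator."""
    assignment = {c: [] for c in consumers}
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# 3 partitions, 2 consumers: both get work.
print(assign(3, ["c1", "c2"]))   # {'c1': [0, 2], 'c2': [1]}

# 3 partitions, 5 consumers: each partition goes to exactly one
# consumer, so two members get nothing and sit idle.
print(assign(3, ["c1", "c2", "c3", "c4", "c5"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [], 'c5': []}
```

Remove a consumer from the list and re-run `assign` — that's a rebalance, in miniature.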
5. Broker
A Kafka server. A production cluster is typically 3+ brokers. Topics are replicated across brokers for durability. Producers and consumers don’t care which broker has which partition — they negotiate with the cluster.
Features Worth Knowing (But Not Reading About First)
These are features you’ll reach for once the basics click — don’t worry about them on day one.
- Replication. Each partition has a leader replica and one or more followers. If the leader dies, a follower takes over.
- Retention. Time-based (e.g., 7 days) or size-based (e.g., 10 GB per partition). Also compacted topics, which keep only the latest value per key.
- Acks. Producer config: acks=0 (fire and forget), acks=1 (leader ack), acks=all (all in-sync replicas ack). Start with acks=all in production.
- Keys. Records can have a key. Kafka hashes the key to pick a partition, so all records with the same key land in the same partition and stay in order.
- Schema Registry. Stores Avro/Protobuf/JSON schemas so producers and consumers can evolve independently without breaking each other.
- Kafka Connect. Pre-built connectors to pipe data in and out of databases, S3, Elasticsearch, etc., without writing code.
- Kafka Streams / ksqlDB. In-process stream processing libraries — think “SQL over a live topic” or a lightweight Flink.
Skip all of these on your first Kafka project. Topic + producer + consumer gets you surprisingly far.
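One of the features above — keys — is worth a quick illustration. Kafka's default partitioner hashes the key bytes with murmur2; the sketch below uses crc32 purely because it's deterministic and in the Python standard library, but the consequence is the same: same key, same partition, per-key ordering.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Deterministic key -> partition mapping. Kafka's default partitioner
    uses murmur2 over the key bytes; crc32 here just illustrates the idea."""
    return zlib.crc32(key.encode()) % num_partitions

# The same key always lands in the same partition, so records for one
# customer stay in order even though the topic as a whole does not.
for key in ["customer-1", "customer-2", "customer-1"]:
    print(key, "-> partition", partition_for(key, 3))
```

The corollary: repartitioning a topic (changing the partition count) changes where keys hash to, which is why partition counts are usually chosen up front and left alone.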
When to Use Kafka — and When Not To
Good fits
- Event-driven microservices. One service emits OrderPlaced; several services react independently. No point-to-point HTTP spaghetti.
- Analytics / CDC pipelines. Stream database changes into a warehouse; feed dashboards; train models on live data.
- Audit logs. Append-only by design; retention is configurable; replay is built in.
- Decoupling producers from consumers. Producers don’t need to know who the consumers are, and consumers can be added or removed freely.
- Backpressure tolerance. The log absorbs bursts; consumers process at their own pace.
Bad fits
- RPC. You want a reply? Use gRPC or HTTP. Kafka is one-way.
- Sub-millisecond latency. Kafka is fast, but the p99 is milliseconds, not microseconds. For ultra-low-latency, use a dedicated in-memory broker.
- Small-volume apps. A cluster and its tooling are operational overhead. If you have one service emitting a few events per second, a lightweight queue (Redis, RabbitMQ, or even a DB table) is often enough.
- Priority queues or complex routing. Kafka is topics and partitions, not exchanges and headers. If you need RabbitMQ’s routing semantics, use RabbitMQ.
A Mental Model: Kafka as a Newspaper
A useful analogy: Kafka is a newspaper.
- The topic is the paper (e.g., “Finance Daily”).
- The partition is a section (Business, Markets, Tech).
- The offset is the page number.
- The producer is a journalist submitting articles.
- The consumer is a reader who keeps a bookmark on the page they’re currently reading.
- The consumer group is a reading club — they divide the sections among themselves so no one re-reads what someone else already read.
- Retention is how long the archive keeps back issues.
Most questions about Kafka dissolve if you hold this picture in your head.
flowchart LR
P1[Producer A] --> T
P2[Producer B] --> T
subgraph T[Topic: orders]
direction TB
P0[Partition 0<br/>offsets 0,1,2,...]
P1p[Partition 1<br/>offsets 0,1,2,...]
P2p[Partition 2<br/>offsets 0,1,2,...]
end
T --> G1[Consumer group: billing]
T --> G2[Consumer group: analytics]
G1 --> C1[Consumer 1]
G1 --> C2[Consumer 2]
G2 --> C3[Consumer 3]
Two consumer groups reading the same topic independently — billing processes orders for invoicing; analytics streams the same events into a warehouse. Neither group blocks the other.
Getting a Broker Running in 60 Seconds
You don’t need a production cluster to learn Kafka. Docker Compose and one file is enough.
# docker-compose.yml
services:
kafka:
image: apache/kafka:3.8.0
container_name: kafka
ports:
- "9092:9092"
environment:
KAFKA_NODE_ID: 1
KAFKA_PROCESS_ROLES: broker,controller
KAFKA_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
KAFKA_CONTROLLER_QUORUM_VOTERS: 1@localhost:9093
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
CLUSTER_ID: "MkU3OEVBNTcwNTJENDM2Qk"
docker compose up -d
docker exec kafka /opt/kafka/bin/kafka-topics.sh \
--bootstrap-server localhost:9092 \
--create --topic orders --partitions 3 --replication-factor 1
You now have a single-broker Kafka on localhost:9092 with a topic called orders. That’s the full setup for everything below.
Producer and Consumer in Four Languages
Each snippet does the same thing: produces three messages to the orders topic (keyed by customer ID), then consumes them from a consumer group called example-consumer. Start the consumer first, then run the producer — you should see the three messages print out in order per key.
Java (Official Kafka Client)
<!-- pom.xml -->
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>3.8.0</version>
</dependency>
Producer:
import java.util.Properties;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;
public class OrderProducer {
public static void main(String[] args) throws Exception {
var props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.ACKS_CONFIG, "all");
try (var producer = new KafkaProducer<String, String>(props)) {
for (int i = 1; i <= 3; i++) {
var record = new ProducerRecord<>(
"orders",
"customer-" + i,
"{\"orderId\":" + i + ",\"amount\":" + (i * 10) + "}"
);
var md = producer.send(record).get();
System.out.printf("sent to %s-%d@%d%n", md.topic(), md.partition(), md.offset());
}
}
}
}
Consumer:
import java.time.Duration;
import java.util.*;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.serialization.StringDeserializer;
public class OrderConsumer {
public static void main(String[] args) {
var props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-consumer");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
try (var consumer = new KafkaConsumer<String, String>(props)) {
consumer.subscribe(List.of("orders"));
while (true) {
var records = consumer.poll(Duration.ofMillis(500));
for (var rec : records) {
System.out.printf("received key=%s value=%s partition=%d offset=%d%n",
rec.key(), rec.value(), rec.partition(), rec.offset());
}
}
}
}
}
Python (confluent-kafka)
pip install confluent-kafka
Producer:
from confluent_kafka import Producer
import json
p = Producer({"bootstrap.servers": "localhost:9092", "acks": "all"})
def delivered(err, msg):
if err:
print(f"delivery failed: {err}")
else:
print(f"sent to {msg.topic()}-{msg.partition()}@{msg.offset()}")
for i in range(1, 4):
p.produce(
"orders",
key=f"customer-{i}",
value=json.dumps({"orderId": i, "amount": i * 10}),
callback=delivered,
)
p.flush()
Consumer:
from confluent_kafka import Consumer
c = Consumer({
"bootstrap.servers": "localhost:9092",
"group.id": "example-consumer",
"auto.offset.reset": "earliest",
})
c.subscribe(["orders"])
try:
while True:
msg = c.poll(0.5)
if msg is None:
continue
if msg.error():
print(f"error: {msg.error()}")
continue
print(
f"received key={msg.key().decode()} "
f"value={msg.value().decode()} "
f"partition={msg.partition()} offset={msg.offset()}"
)
finally:
c.close()
Node.js (kafkajs)
npm install kafkajs
Producer (producer.mjs):
import { Kafka } from "kafkajs";
const kafka = new Kafka({ clientId: "example-app", brokers: ["localhost:9092"] });
const producer = kafka.producer();
await producer.connect();
const results = await producer.send({
topic: "orders",
acks: -1, // all
messages: Array.from({ length: 3 }, (_, i) => ({
key: `customer-${i + 1}`,
value: JSON.stringify({ orderId: i + 1, amount: (i + 1) * 10 }),
})),
});
console.log("sent:", results);
await producer.disconnect();
Consumer (consumer.mjs):
import { Kafka } from "kafkajs";
const kafka = new Kafka({ clientId: "example-app", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "example-consumer" });
await consumer.connect();
await consumer.subscribe({ topic: "orders", fromBeginning: true });
await consumer.run({
eachMessage: async ({ topic, partition, message }) => {
console.log(
`received key=${message.key?.toString()} ` +
`value=${message.value?.toString()} ` +
`partition=${partition} offset=${message.offset}`
);
},
});
Go (segmentio/kafka-go)
go get github.com/segmentio/kafka-go
Producer (producer.go):
package main
import (
"context"
"fmt"
"github.com/segmentio/kafka-go"
)
func main() {
w := &kafka.Writer{
Addr: kafka.TCP("localhost:9092"),
Topic: "orders",
Balancer: &kafka.Hash{},
RequiredAcks: kafka.RequireAll,
}
defer w.Close()
var msgs []kafka.Message
for i := 1; i <= 3; i++ {
msgs = append(msgs, kafka.Message{
Key: []byte(fmt.Sprintf("customer-%d", i)),
Value: []byte(fmt.Sprintf(`{"orderId":%d,"amount":%d}`, i, i*10)),
})
}
if err := w.WriteMessages(context.Background(), msgs...); err != nil {
panic(err)
}
fmt.Println("sent 3 messages")
}
Consumer (consumer.go):
package main
import (
"context"
"fmt"
"github.com/segmentio/kafka-go"
)
func main() {
r := kafka.NewReader(kafka.ReaderConfig{
Brokers: []string{"localhost:9092"},
GroupID: "example-consumer",
Topic: "orders",
})
defer r.Close()
for {
m, err := r.ReadMessage(context.Background())
if err != nil {
panic(err)
}
fmt.Printf("received key=%s value=%s partition=%d offset=%d\n",
string(m.Key), string(m.Value), m.Partition, m.Offset)
}
}
What to Learn Next
You now know enough to be dangerous. In rough order of “earns the most understanding per hour invested”:
- Keys and partitioning. Try producing with and without a key. Watch how the partition assignment changes and why it matters for ordering.
- Consumer groups. Start two consumers in the same group and watch them split the partitions. Kill one. Watch rebalancing happen.
- Offsets. Try auto.offset.reset=earliest vs latest. Commit offsets manually instead of automatically. Replay from the start.
- Delivery guarantees. Understand the difference between at-most-once, at-least-once, and exactly-once, and why “exactly-once” needs transactions.
- Schema Registry. As soon as two teams share a topic, schemas become a people problem. Avro or Protobuf plus a registry is how it’s solved in practice.
- Production concerns. Replication factor, min in-sync replicas, monitoring lag, retention, rack awareness. Plenty to read — but only after you’ve written code against a real topic.
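The delivery-guarantees item above reduces to one question: do you commit the offset before or after processing? A broker-free sketch of what each choice does when the consumer crashes mid-cycle (the crash point and record names are invented for illustration):

```python
def consume_once(records, committed, commit_before, crash):
    """One poll -> process -> commit cycle that may crash mid-cycle.
    Returns (records_processed, committed_offset_after_crash_or_success)."""
    offset = committed
    processed = []
    if commit_before:                 # at-most-once style
        committed = offset + 1        # commit first...
        if crash:                     # ...crash before processing
            return processed, committed
        processed.append(records[offset])
    else:                             # at-least-once style
        processed.append(records[offset])
        if crash:                     # crash after processing, before commit
            return processed, committed
        committed = offset + 1
    return processed, committed

records = ["order-1"]

# At-most-once: commit-then-crash loses the record -- a restart resumes
# past an offset that was never actually processed.
print(consume_once(records, 0, commit_before=True, crash=True))    # ([], 1)

# At-least-once: process-then-crash re-delivers -- a restart resumes at
# the uncommitted offset and processes the record a second time.
print(consume_once(records, 0, commit_before=False, crash=True))   # (['order-1'], 0)
```

At-least-once plus idempotent processing is the usual production answer; exactly-once needs Kafka transactions to make the process-and-commit step atomic.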
Further Reading
- Apache Kafka official documentation — the canonical reference. Dense but authoritative.
- Kafka: The Definitive Guide, 2nd edition (Shapira, Palino, Sivaram, Petty) — the book. Worth owning.
- The Confluent developer courses — free, hands-on, well-paced.
- KRaft mode intro — Kafka’s current consensus layer (no ZooKeeper).
- And the earlier deep-dive on Kafka on this site if you’re ready to go beyond the basics.
The thing most beginner tutorials skip: you will understand Kafka ten times faster by running a single broker on your laptop and writing a dumb producer/consumer than by reading another overview of what it is. Do that, then come back.