Lab 11: Kafka & Change Data Capture

Time: 50 minutes | Level: Architect | DB: Apache Kafka (Confluent)


🎯 Objective

Master Kafka's architecture: brokers, topics, partitions, consumer groups, and offsets. Run Kafka with Docker, create topics, produce and consume messages, measure consumer lag, and implement the CDC (change data capture) pattern.


📚 Background

Kafka Architecture

Producers → [Topic: orders]  → Consumers
              ├── Partition 0: [msg1][msg2][msg5]...  ← Consumer Group A
              ├── Partition 1: [msg3][msg6]...         ← Consumer Group A
              └── Partition 2: [msg4][msg7]...         ← Consumer Group A
                                                        Consumer Group B reads same topic

Key Concepts:

Concept          Description
---------------  ---------------------------------------------------
Broker           Kafka server; stores and serves partitions
Topic            Named stream of records (like a database table)
Partition        Ordered, immutable log; enables parallelism
Offset           Sequential ID for each message in a partition
Consumer Group   Logical subscriber; each partition → one consumer
Retention        Messages kept N days/bytes; not deleted on consume

Kafka as Database Changelog (CDC)

In the CDC (change data capture) pattern, every row-level change in a source database is captured (e.g. from MySQL's binlog) and published as an event to a Kafka topic. The database becomes a replayable stream of changes that downstream consumers can process independently. Step 6 walks through this pattern.


Step 1: Start Kafka with Docker
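
One minimal way to bring up a broker, assuming Docker is installed: the apache/kafka image runs a single-node KRaft cluster (no ZooKeeper) out of the box, and `/opt/kafka/bin` below is where that image keeps the CLI tools.

```shell
# Start a single-node Kafka broker in KRaft mode, exposing port 9092
docker run -d --name kafka -p 9092:9092 apache/kafka:latest

# Sanity check: list topics (empty on a fresh broker)
docker exec kafka /opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 --list
```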

📸 Verified Output:


Step 2: Create Topics
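
A sketch using the container from Step 1; the topic name `orders` is this lab's running example.

```shell
# Create a 3-partition topic; replication factor 1 is fine on a single broker
docker exec kafka /opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --create --topic orders --partitions 3 --replication-factor 1

# Describe it: leader, replicas, and ISR for each partition
docker exec kafka /opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 --describe --topic orders
```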

📸 Verified Output:


Step 3: Produce Messages
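
Keyed production matters because the key is hashed to pick the partition, and ordering is only guaranteed within a partition. A sketch with the console producer, where `parse.key`/`key.separator` make it read `key:value` lines (the order IDs and JSON payloads are illustrative):

```shell
# Produce keyed messages; all events for one order key land in one partition.
# The separator splits on the first ":" only, so colons in the JSON are safe.
docker exec -i kafka /opt/kafka/bin/kafka-console-producer.sh \
  --bootstrap-server localhost:9092 --topic orders \
  --property parse.key=true --property key.separator=: <<'EOF'
order-1:{"id": 1, "status": "created"}
order-2:{"id": 2, "status": "created"}
order-1:{"id": 1, "status": "paid"}
EOF
```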

📸 Verified Output:


Step 4: Consume Messages
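
Replaying from offset 0 demonstrates the retention point above: consuming does not delete messages. A sketch:

```shell
# Read the topic from the beginning, printing key and partition per record,
# then exit after 10s of no new messages
docker exec kafka /opt/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 --topic orders --from-beginning \
  --property print.key=true --property print.partition=true \
  --timeout-ms 10000
```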

📸 Verified Output:


Step 5: Consumer Groups & Lag
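
A sketch reusing the `orders` topic; the group name `order-processors` is this lab's example.

```shell
# Consume as part of a named group (leave running in a second terminal)
docker exec kafka /opt/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 --topic orders --group order-processors

# Per-partition CURRENT-OFFSET, LOG-END-OFFSET, and LAG for the group
docker exec kafka /opt/kafka/bin/kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 --describe --group order-processors
```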

📸 Verified Output:

💡 Consumer lag is the most important Kafka operational metric. Lag = LOG-END-OFFSET - CURRENT-OFFSET. Increasing lag = consumer can't keep up with producers.
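
The arithmetic, with hypothetical offsets for one partition:

```shell
# Hypothetical values as they would appear in a --describe report
log_end_offset=1500   # latest offset written by producers
current_offset=1200   # last offset committed by the consumer group
lag=$((log_end_offset - current_offset))
echo "lag=$lag"       # prints lag=300
```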


Step 6: Kafka as Database Changelog (CDC)
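
One common way to wire this up is Debezium's MySQL connector, registered with a Kafka Connect worker over its REST API. The sketch below is hedged: it assumes a Connect worker on localhost:8083 with the Debezium plugin installed, a MySQL host named `mysql` with a `debezium` user, and a `shop.orders` table; every name here is an example to adapt.

```shell
# Register a Debezium MySQL source connector (Debezium 2.x property names)
curl -s -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "orders-cdc",
    "config": {
      "connector.class": "io.debezium.connector.mysql.MySqlConnector",
      "database.hostname": "mysql",
      "database.port": "3306",
      "database.user": "debezium",
      "database.password": "dbz",
      "database.server.id": "184054",
      "topic.prefix": "shop",
      "table.include.list": "shop.orders",
      "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
      "schema.history.internal.kafka.topic": "schema-changes.shop"
    }
  }'
```

Once registered, every insert/update/delete on the table becomes a change event on a topic named from the prefix, database, and table.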

📸 Verified Output:


Step 7: Kafka Configuration Best Practices
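
The durability settings from the summary can be captured in a producer config file; a minimal sketch (the filename is arbitrary):

```shell
# Durability-focused producer settings; on a replicated cluster, pair
# acks=all with a topic-side min.insync.replicas >= 2
cat > producer.properties <<'EOF'
# Wait for every in-sync replica to acknowledge the write
acks=all
# Broker de-duplicates producer retries, preventing duplicate messages
enable.idempotence=true
# Retry transient failures; bounded overall by delivery.timeout.ms
retries=2147483647
delivery.timeout.ms=120000
EOF
```

Any CLI producer can pick it up via `--producer.config producer.properties`.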

📸 Verified Output:


Step 8: Capstone — Kafka Architecture Review
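
Two checks worth running in any review, assuming the lab broker from the earlier steps:

```shell
# Partitions whose in-sync replica set has shrunk below the replica count
docker exec kafka /opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 --describe --under-replicated-partitions

# Lag for every consumer group on the cluster
docker exec kafka /opt/kafka/bin/kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 --describe --all-groups
```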

📸 Verified Output:


Summary

Concept                    Key Takeaway
-------------------------  --------------------------------------------------------------
Topic                      Named append-only log; messages retained N days
Partition                  Ordered unit of parallelism; same key → same partition
Offset                     Sequential message ID within a partition; consumers track position
Consumer Group             Consumers sharing a topic; 1 partition → 1 consumer in the group
Consumer Lag               LOG-END-OFFSET - CURRENT-OFFSET; key health metric
kafka-topics.sh            Create, list, describe, and alter topics
kafka-consumer-groups.sh   Inspect and reset consumer group offsets
CDC with Debezium          MySQL binlog → Kafka topic; every DB change becomes an event
acks=all                   Wait for all in-sync replicas; required for durability
Idempotent producer        Exactly-once at the producer level; prevents duplicate writes
💡 Architect's insight: Kafka is not just a message queue — it's a durable, replayable event log. The ability to replay from offset 0 makes it the backbone of event sourcing, CDC pipelines, and CQRS architectures.
