A Simple Apache Kafka Cluster With Docker, Kafdrop, and Python

This guide will demonstrate how to deploy a minimal Apache Kafka cluster on Docker and set up producers and consumers using Python. We’ll also deploy an instance of Kafdrop for easy cluster monitoring.

“Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.” — Apache Kafka

Prerequisites

This tutorial expects you to have a Unix system (Mac or Linux) with Docker and Docker Compose installed. You can find installation instructions for both in the official Docker documentation.

The full code for this tutorial can be found on GitHub.

Apache Kafka Architecture

Apache Kafka’s architecture is relatively straightforward compared to other message brokers, such as RabbitMQ or ActiveMQ.

Kafka is essentially a commit log with a very simplistic data structure. It just so happens to be exceptionally fault-tolerant, horizontally scalable, and capable of handling huge throughput. This has made Kafka extremely popular for many large enterprise organisations, where applications range from pub-sub messaging to log aggregation.

We’ll be deploying a simple Kafka setup, consisting of the following components:

  • Kafka cluster: A distributed system of Kafka brokers
  • Kafka broker: The server responsible for mediating data between producers and consumers. Brokers handle the bulk of I/O operations and durable persistence within the cluster.
  • ZooKeeper: Manages controller election and the overall state of the cluster. It acts as a configuration repository, maintaining cluster metadata such as broker membership and topic configuration.
  • Kafka producer: Client applications responsible for appending records to Kafka topics
  • Kafka consumer: Client applications that read from topics

The below diagram depicts the architecture of the minimal Apache Kafka cluster we’ll be deploying. The most basic setup consists of just one broker and one ZooKeeper node (blue); however, to add resilience, we’ll deploy two additional brokers into the cluster (green).

Running ZooKeeper in Docker

Verify you have Docker and Docker Compose successfully installed:

$ docker -v
> Docker version 19.03.12, build 48a66213fe
$ docker-compose -v
> docker-compose version 1.27.2, build 18f557f9

Great, we’re ready to get started! Let’s start by deploying ZooKeeper. First, create a new working directory to store the files and data we’ll be using in the tutorial:

mkdir kafka
cd kafka

Next, create a new file called docker-compose.yml. This file will contain the configuration for deploying with Docker Compose.

touch docker-compose.yml

Now, open this file in your favourite text editor. We’re ready to add configuration to deploy ZooKeeper.

Going through the configuration line by line:

version: '3'
services:
  zookeeper:
    image: zookeeper:3.4.9
    hostname: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOO_MY_ID: 1
      ZOO_PORT: 2181
      ZOO_SERVERS: server.1=zookeeper:2888:3888
    volumes:
      - ./data/zookeeper/data:/data
      - ./data/zookeeper/datalog:/datalog
  • Line 1: The Docker Compose file version number
  • Line 2: Start services definition
  • Line 3: Start ZooKeeper configuration
  • Line 4: The Docker image to use for ZooKeeper, downloaded from Docker Hub
  • Line 5: The hostname the Docker container will use
  • Line 6-7: Expose port 2181 to the host. This is the default ZooKeeper port.
  • Line 9: The unique ID for this ZooKeeper instance
  • Line 10: The port ZooKeeper will run on
  • Line 11: The list of ZooKeeper servers. We’re only deploying one.
  • Line 12–14: Mapping directories on the host to directories in the container in order to persist data

Now we can start ZooKeeper by running Docker Compose:

docker-compose up

This will download the ZooKeeper Docker container and start a ZooKeeper instance. You’ll see log messages ending with binding to port 0.0.0.0/0.0.0.0:2181. This means ZooKeeper is running and bound to port 2181.
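
If you’d like to double-check from the host, ZooKeeper answers simple four-letter-word commands over its client port. Here’s a minimal sketch in Python, assuming the default localhost:2181 mapping from our docker-compose.yml:

import socket

# Send ZooKeeper's built-in "ruok" health-check command over the client port
with socket.create_connection(('localhost', 2181), timeout=5) as sock:
    sock.sendall(b'ruok')
    response = sock.recv(16)

print(response.decode())  # a healthy server replies "imok"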

Running Kafka in Docker

Next, let’s add Apache Kafka into the mix. We’ll add our first kafka service to the configuration file:

Again, going through the configuration line by line:

version: "3"
services:
...
  kafka1:
    image: confluentinc/cp-kafka:5.3.0
    hostname: kafka1
    ports:
      - "9091:9091"
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka1:19091,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9091
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
      KAFKA_BROKER_ID: 1
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    volumes:
      - ./data/kafka1/data:/var/lib/kafka/data
    depends_on:
      - zookeeper
  • Line 5: The image to use for Kafka from Docker Hub
  • Line 6: The hostname of the container
  • Line 7-8: The port to expose, set to 9091
  • Line 10: Kafka’s advertised listeners. Robin Moffatt has a great blog post about this.
  • Line 11: Security protocols to use for each listener
  • Line 12: The interbroker listener name (used for internal communication)
  • Line 13: The list of ZooKeeper nodes Kafka should use
  • Line 14: The broker ID of this Kafka broker
  • Line 15: The replication factor of the consumer offset topic
  • Line 16-17: Mapping volumes on the host to store Kafka data
  • Line 18-19: Ensure ZooKeeper is started before Kafka

To start the Kafka broker, you can start a new terminal window in your working directory and run docker-compose up.

If ZooKeeper is still running from the previous step, you can use Ctrl + C to stop it. Docker Compose will start both ZooKeeper and Kafka together.

Tip: Use docker-compose up -d to start the containers in detached mode, freeing up your terminal window.

After starting up the containers, you should see Kafka and ZooKeeper running. Let’s verify everything has started successfully by creating a new topic:

$ docker exec -it kafka_kafka1_1 kafka-topics --zookeeper zookeeper:2181 --create --topic my-topic --partitions 1 --replication-factor 1
> Created topic "my-topic"

Nice! We now have a minimal Kafka cluster deployed and running on our local machine in Docker.
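
As a quick sanity check from the host, you can also list the cluster’s topics with a few lines of Python. This is a minimal sketch, assuming the kafka-python library is installed (we install it in the producer section below) and that kafka1’s external listener is reachable on localhost:9091:

from kafka import KafkaConsumer

# Connect via the externally advertised listener from KAFKA_ADVERTISED_LISTENERS
consumer = KafkaConsumer(bootstrap_servers=['localhost:9091'])

# topics() fetches cluster metadata and returns the set of topic names;
# it should include the my-topic we just created
print(consumer.topics())
consumer.close()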

Deploying Multiple Kafka Brokers

Next, we’re going to add two more Kafka brokers into our cluster. Having multiple brokers allows us to build out a more resilient cluster, as we can benefit from replication, fault tolerance, and extra resources.

To do this, we need to add more kafka services to our docker-compose.yml. Let’s add two more brokers:

version: "3"
services:
...
  kafka2:
    image: confluentinc/cp-kafka:5.3.0
    hostname: kafka2
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka2:19092,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_BROKER_ID: 2
    volumes:
      - ./data/kafka2/data:/var/lib/kafka/data
    depends_on:
      - zookeeper 
  kafka3:
    image: confluentinc/cp-kafka:5.3.0
    hostname: kafka3
    ports:
      - "9093:9093"
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka3:19093,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9093
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
      KAFKA_BROKER_ID: 3
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    volumes:
      - ./data/kafka3/data:/var/lib/kafka/data
    depends_on:
      - zookeeper

To add additional brokers to the cluster, we just need to update the broker ID, hostname, ports, advertised listeners, and data volume. Here we’re adding broker 2 with ID 2 on port 9092 and broker 3 with ID 3 on port 9093.

We can verify that the three brokers are running successfully by creating another topic, this time with a replication factor of 3.

$ docker exec -it kafka_kafka1_1 kafka-topics --zookeeper zookeeper:2181 --create --topic my-topic-three --partitions 1 --replication-factor 3
> Created topic "my-topic-three".

Success! We now have a Kafka cluster running with three brokers!
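
If you prefer to manage topics from code rather than the CLI, kafka-python also ships an admin client. Here’s a minimal sketch (the topic name my-python-topic is just an example); with three brokers running, a replication factor of 3 should now succeed:

from kafka.admin import KafkaAdminClient, NewTopic

# Connect to any of the three brokers via their external listeners
admin = KafkaAdminClient(
    bootstrap_servers=['localhost:9091', 'localhost:9092', 'localhost:9093']
)

# With three brokers in the cluster, a replication factor of 3 is now possible
admin.create_topics([
    NewTopic(name='my-python-topic', num_partitions=1, replication_factor=3)
])
admin.close()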

Deploying Kafdrop

It’s always nice to be able to visualise key metrics for your deployments; however, Kafka doesn’t provide its own monitoring interface out of the box. Fortunately, there’s a free open-source offering called Kafdrop we can use.

“Kafdrop is a web UI for viewing Kafka topics and browsing consumer groups. The tool displays information such as brokers, topics, partitions, consumers, and lets you view messages.” — Kafdrop on GitHub

Let’s add another container to our docker-compose.yml for our Kafdrop instance:

version: "3"
services:
...
  kafdrop:
    image: obsidiandynamics/kafdrop
    restart: "no"
    ports:
      - "9000:9000"
    environment:
      KAFKA_BROKERCONNECT: "kafka1:19091"
    depends_on:
      - kafka1
      - kafka2
      - kafka3

Then redeploy using docker-compose down followed by docker-compose up -d. Once it finishes warming up, navigate to localhost:9000 in your browser. You should see the Kafdrop landing screen.

Notice we can now see some information about our Kafka cluster, including the two topics we’ve already created as part of this tutorial.
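
If you’d like a quick scripted check that the UI is up, a couple of lines of Python will do (a minimal sketch, assuming the default localhost:9000 port mapping):

import urllib.request

# Request the Kafdrop landing page via the port mapped in docker-compose.yml
with urllib.request.urlopen('http://localhost:9000') as response:
    print(response.status)  # a healthy instance returns 200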

Adding a Python Producer

Time to publish some messages to Kafka. We’re going to build a simple producer using Python. First, install the kafka-python library:

pip install kafka-python

Next, create a new Python file in your working directory called producer.py. Here, we need to define the list of our Kafka servers and a topic name to publish messages to. We’ve already created my-topic-three, so let’s use that.

from kafka import KafkaProducer

bootstrap_servers = ['localhost:9091', 'localhost:9092', 'localhost:9093']
topicName = 'my-topic-three'

# Connect to the cluster via the externally advertised listeners
producer = KafkaProducer(bootstrap_servers=bootstrap_servers)

# Publish a single message, then flush to make sure it is actually sent
producer.send(topicName, b'Hello World!')
producer.flush()

Finally, to send a message to the topic, we call producer.send. Try running this Python code in a new terminal window:

python3 producer.py

If all goes well, you should be able to see the message show up in the Kafdrop UI.
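
If you want to go a little further, kafka-python also supports pluggable serializers, and send() returns a future you can block on to see exactly where a record landed. Here’s a minimal sketch building on the same topic; the JSON payload is just an example:

import json
from kafka import KafkaProducer

bootstrap_servers = ['localhost:9091', 'localhost:9092', 'localhost:9093']
topicName = 'my-topic-three'

# Serialise Python dicts to JSON bytes before sending
producer = KafkaProducer(
    bootstrap_servers=bootstrap_servers,
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
)

# send() returns a future; get() blocks until the broker acknowledges the record
future = producer.send(topicName, {'greeting': 'Hello World!'})
metadata = future.get(timeout=10)
print('partition:', metadata.partition, 'offset:', metadata.offset)

producer.flush()
producer.close()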

Adding a Python Consumer

The last piece of the puzzle is adding a Python consumer to receive the messages from Kafka. Create a file called consumer.py, and add the following code:

from kafka import KafkaConsumer

# Subscribe to the topic via the externally advertised listeners
consumer = KafkaConsumer('my-topic-three',
                         bootstrap_servers=['localhost:9091', 'localhost:9092', 'localhost:9093'])
for message in consumer:
    print(message)

After importing KafkaConsumer, we provide the list of bootstrap servers and the topic name to establish a connection with the Kafka cluster.

In a new terminal window, run python3 consumer.py, and then trigger your producer script again. The consumer will read messages from the topic and print the data to the console. You should see a message like this:

ConsumerRecord(topic='my-topic-three', partition=0, offset=0, timestamp=1602500127577, timestamp_type=0, key=None, value=b'Hello World!', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=12, serialized_header_size=-1)
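
In practice you’ll usually want the consumer to join a consumer group and control where it starts reading. Here’s a minimal sketch of the same consumer with a few common options; the group name tutorial-group is just an example:

from kafka import KafkaConsumer

# Joining a group lets Kafka track this consumer's offsets;
# auto_offset_reset='earliest' replays messages published before it started
consumer = KafkaConsumer(
    'my-topic-three',
    bootstrap_servers=['localhost:9091', 'localhost:9092', 'localhost:9093'],
    group_id='tutorial-group',
    auto_offset_reset='earliest',
    value_deserializer=lambda v: v.decode('utf-8'),
)

for message in consumer:
    print(message.topic, message.partition, message.offset, message.value)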

Conclusion

Nice! You’ve successfully started a local Kafka cluster using Docker and Docker Compose. Data is persisted outside of the container on the local machine, which means you can delete containers and restart them without losing data.

You’ve also added some topics and set up basic producers and consumers for these using the Python Kafka library. Plus, you’ve deployed an instance of Kafdrop to provide an interface for monitoring your cluster.

We’ve just scratched the surface of what’s possible with Apache Kafka. If you’re just starting out, there’s still plenty to learn. You’ve got the foundations and the tools necessary, and there’s no shortage of info on the web. Without a doubt, the best way to learn is to roll up your sleeves and build something.

Thanks for reading, and happy coding!

Credits: https://betterprogramming.pub/a-simple-apache-kafka-cluster-with-docker-kafdrop-and-python-cf45ab99e2b9
