In this lesson we will:
- Download Open Source Apache Kafka;
- Set up our Kafka broker, initially in a standalone, non-clustered mode;
- Learn how to start and stop Zookeeper and Kafka;
- Locate the Kafka state files for administration purposes.
As Kafka is free and open source, we can download it freely from the web at kafka.apache.org. Download the binary and uncompress it using the following commands:
wget https://downloads.apache.org/kafka/3.3.1/kafka_2.13-3.3.1.tgz
gzip -d kafka_2.13-3.3.1.tgz
tar -xf kafka_2.13-3.3.1.tar
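The decompress-then-unpack sequence above can also be done in one step, because tar's -z flag handles gzip itself. The following is a minimal sketch rather than a required change; extract_tgz is a hypothetical helper name, and it assumes the archive has already been downloaded. As an aside, downloads.apache.org generally hosts only current releases, so if the wget URL is no longer available, the same file is likely to be found under archive.apache.org/dist/kafka/3.3.1/ instead.

```shell
# extract_tgz ARCHIVE [DEST_DIR]
# Unpack a gzip-compressed tarball in one step
# (equivalent to running gzip -d and then tar -xf).
extract_tgz() {
    archive=$1
    dest=${2:-.}
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Example (assumes the archive is in the current directory):
#   extract_tgz kafka_2.13-3.3.1.tgz
```

Either approach should leave a kafka_2.13-3.3.1 directory next to the downloaded archive.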
We can explore the directory structure with the following commands:
cd kafka_2.13-3.3.1
ls -la
Which should output:
drwxr-xr-x    9 benjaminwootton  staff    288 13 Oct 12:32 .
drwxr-xr-x    9 benjaminwootton  staff    288 13 Oct 12:32 ..
-rw-r--r--    1 benjaminwootton  staff  14842 29 Sep 20:03 LICENSE
-rw-r--r--    1 benjaminwootton  staff  28184 29 Sep 20:03 NOTICE
drwxr-xr-x   41 benjaminwootton  staff   1312 13 Oct 12:32 bin
drwxr-xr-x   18 benjaminwootton  staff    576 13 Oct 12:32 config
drwxr-xr-x  104 benjaminwootton  staff   3328 13 Oct 12:32 libs
drwxr-xr-x   13 benjaminwootton  staff    416 13 Oct 12:32 licenses
drwxr-xr-x    3 benjaminwootton  staff     96 13 Oct 12:32 site-docs
Kafka has historically depended heavily on a second piece of software named Apache Zookeeper for its state and cluster management. The Kafka project has been actively working for some time on removing this dependency, but for now, Kafka still requires Zookeeper.
Therefore, before bringing up our Kafka broker, we need to start Zookeeper. It is distributed with the Kafka download and started from the bin folder, using the default zookeeper.properties file, which also ships with Kafka:
./bin/zookeeper-server-start.sh ./config/zookeeper.properties
Running this script should start Zookeeper relatively quickly with no obvious errors in the logs.
With Zookeeper up and healthy, we can now start Kafka. This requires a server.properties file and, as with Zookeeper, the default one is fine for our purposes.
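Before relying on the defaults, it can help to see what they actually are. The sketch below prints the active (uncommented) values for a few keys from a properties file; show_settings is a hypothetical helper, and the key names in the examples are the ones I would expect to matter for a single local broker.

```shell
# show_settings FILE KEY...
# Print the active (uncommented) "key=value" line for each requested key.
show_settings() {
    file=$1
    shift
    for key in "$@"; do
        grep "^${key}=" "$file"
    done
}

# Examples, run from the Kafka directory:
#   show_settings config/server.properties broker.id log.dirs zookeeper.connect
#   show_settings config/zookeeper.properties dataDir clientPort
```

With the shipped 3.3.1 configuration, the first example should show the broker pointing at a Zookeeper on localhost:2181 and writing its data under /tmp/kafka-logs.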
Because Zookeeper is running in our first terminal, we need to open a second terminal and issue the following command:
cd kafka_2.13-3.3.1
./bin/kafka-server-start.sh ./config/server.properties
Again, this should start quickly with no obvious errors in the logs.
We now have a running Zookeeper and a running Kafka broker connected to it.
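One quick way to confirm that both processes are up is to check that their ports accept connections. The sketch below assumes the default ports from the shipped configuration (2181 for Zookeeper and 9092 for Kafka); port_open and report_port are hypothetical helper names, and the check relies on bash's /dev/tcp feature, so it should be run with bash rather than a plain sh.

```shell
# port_open HOST PORT
# Succeeds if a TCP connection to HOST:PORT can be opened.
# Uses bash's /dev/tcp pseudo-device; under other shells it simply fails.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# report_port NAME HOST PORT
# Print a one-line status message for the named service.
report_port() {
    if port_open "$2" "$3"; then
        echo "$1 is listening on $2:$3"
    else
        echo "$1 is NOT reachable on $2:$3"
    fi
}

# Examples (assumed default ports):
#   report_port Zookeeper localhost 2181
#   report_port Kafka localhost 9092
```

An open port only shows that something is listening; the server logs remain the better place to confirm a clean startup.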
If we need to shut down, the cleanest approach is to stop Kafka first and then Zookeeper, the reverse of the order in which we started them.
Because we started the Zookeeper and Kafka servers in the foreground of our terminal, the processes will be closed when we press CTRL+C.
However, in more realistic deployments we would start the Kafka broker using some mechanism that keeps it running after we close the terminal window, for instance nohup, tmux, or a system service. In that case, separate scripts are provided to stop the Kafka and Zookeeper servers:
./bin/kafka-server-stop.sh
./bin/zookeeper-server-stop.sh
Feel free to start and stop your processes through various mechanisms, but ensure they are running again before progressing with the lesson.
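As one possible mechanism, the start scripts accept a -daemon flag that detaches the process from the terminal. The sketch below wraps the start and stop sequence in two hypothetical functions, start_stack and stop_stack, and preserves the ordering described above. KAFKA_HOME and STACK_DELAY are assumed variable names, and the fixed pause is a crude stand-in for a proper readiness check.

```shell
# Assumption: KAFKA_HOME points at the unpacked Kafka distribution.
KAFKA_HOME=${KAFKA_HOME:-./kafka_2.13-3.3.1}
# Assumption: a short fixed pause is enough between the two steps.
STACK_DELAY=${STACK_DELAY:-5}

# Start Zookeeper first, then Kafka, both detached from the terminal.
start_stack() {
    "$KAFKA_HOME/bin/zookeeper-server-start.sh" -daemon "$KAFKA_HOME/config/zookeeper.properties" || return 1
    sleep "$STACK_DELAY"
    "$KAFKA_HOME/bin/kafka-server-start.sh" -daemon "$KAFKA_HOME/config/server.properties"
}

# Stop in the reverse order: Kafka first, then Zookeeper.
stop_stack() {
    "$KAFKA_HOME/bin/kafka-server-stop.sh"
    sleep "$STACK_DELAY"
    "$KAFKA_HOME/bin/zookeeper-server-stop.sh"
}
```

In daemon mode the output no longer appears in the terminal, so any startup errors have to be looked for in the logs directory inside the Kafka folder.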
The Kafka distribution that we downloaded comes with a number of scripts which are found in the ./bin folder.
cd kafka_2.13-3.3.1
ls -la bin/
-rwxr-xr-x   1 benjaminwootton  wheel   1423  8 Sep 22:21 connect-distributed.sh
-rwxr-xr-x   1 benjaminwootton  wheel   1396  8 Sep 22:21 connect-mirror-maker.sh
-rwxr-xr-x   1 benjaminwootton  wheel   1420  8 Sep 22:21 connect-standalone.sh
-rwxr-xr-x   1 benjaminwootton  wheel    861  8 Sep 22:21 kafka-acls.sh
-rwxr-xr-x   1 benjaminwootton  wheel    873  8 Sep 22:21 kafka-broker-api-versions.sh
-rwxr-xr-x   1 benjaminwootton  wheel    860  8 Sep 22:21 kafka-cluster.sh
-rwxr-xr-x   1 benjaminwootton  wheel    864  8 Sep 22:21 kafka-configs.sh
-rwxr-xr-x   1 benjaminwootton  wheel    945  8 Sep 22:21 kafka-console-consumer.sh
-rwxr-xr-x   1 benjaminwootton  wheel    944  8 Sep 22:21 kafka-console-producer.sh
-rwxr-xr-x   1 benjaminwootton  wheel    871  8 Sep 22:21 kafka-consumer-groups.sh
-rwxr-xr-x   1 benjaminwootton  wheel    948  8 Sep 22:21 kafka-consumer-perf-test.sh
-rwxr-xr-x   1 benjaminwootton  wheel    871  8 Sep 22:21 kafka-delegation-tokens.sh
-rwxr-xr-x   1 benjaminwootton  wheel    869  8 Sep 22:21 kafka-delete-records.sh
-rwxr-xr-x   1 benjaminwootton  wheel    866  8 Sep 22:21 kafka-dump-log.sh
-rwxr-xr-x   1 benjaminwootton  wheel    863  8 Sep 22:21 kafka-features.sh
-rwxr-xr-x   1 benjaminwootton  wheel    865  8 Sep 22:21 kafka-get-offsets.sh
-rwxr-xr-x   1 benjaminwootton  wheel    870  8 Sep 22:21 kafka-leader-election.sh
-rwxr-xr-x   1 benjaminwootton  wheel    863  8 Sep 22:21 kafka-log-dirs.sh
-rwxr-xr-x   1 benjaminwootton  wheel    873  8 Sep 22:21 kafka-metadata-shell.sh
-rwxr-xr-x   1 benjaminwootton  wheel    862  8 Sep 22:21 kafka-mirror-maker.sh
-rwxr-xr-x   1 benjaminwootton  wheel    959  8 Sep 22:21 kafka-producer-perf-test.sh
-rwxr-xr-x   1 benjaminwootton  wheel    874  8 Sep 22:21 kafka-reassign-partitions.sh
-rwxr-xr-x   1 benjaminwootton  wheel    874  8 Sep 22:21 kafka-replica-verification.sh
-rwxr-xr-x   1 benjaminwootton  wheel  10587  8 Sep 22:21 kafka-run-class.sh
-rwxr-xr-x   1 benjaminwootton  wheel   1376  8 Sep 22:21 kafka-server-start.sh
-rwxr-xr-x   1 benjaminwootton  wheel   1361  8 Sep 22:21 kafka-server-stop.sh
-rwxr-xr-x   1 benjaminwootton  wheel    860  8 Sep 22:21 kafka-storage.sh
-rwxr-xr-x   1 benjaminwootton  wheel    945  8 Sep 22:21 kafka-streams-application-reset.sh
-rwxr-xr-x   1 benjaminwootton  wheel    863  8 Sep 22:21 kafka-topics.sh
-rwxr-xr-x   1 benjaminwootton  wheel    879  8 Sep 22:21 kafka-transactions.sh
-rwxr-xr-x   1 benjaminwootton  wheel    958  8 Sep 22:21 kafka-verifiable-consumer.sh
-rwxr-xr-x   1 benjaminwootton  wheel    958  8 Sep 22:21 kafka-verifiable-producer.sh
-rwxr-xr-x   1 benjaminwootton  wheel   1714  8 Sep 22:21 trogdor.sh
drwxr-xr-x  30 benjaminwootton  wheel    960  8 Sep 22:21 windows
-rwxr-xr-x   1 benjaminwootton  wheel    867  8 Sep 22:21 zookeeper-security-migration.sh
-rwxr-xr-x   1 benjaminwootton  wheel   1393  8 Sep 22:21 zookeeper-server-start.sh
-rwxr-xr-x   1 benjaminwootton  wheel   1366  8 Sep 22:21 zookeeper-server-stop.sh
-rwxr-xr-x   1 benjaminwootton  wheel   1019  8 Sep 22:21 zookeeper-shell.sh
These administration scripts can be used for tasks such as creating topics, removing messages, publishing and consuming messages and repartitioning topics. By default, there is no graphical user interface or administration tool for managing Kafka. The scripts are the primary route.
We will cover these scripts in more detail in the following lessons.
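As a small preview, many of these scripts take a --bootstrap-server argument that tells them which broker to talk to. The sketch below wraps that pattern in a hypothetical kafka_tool helper. KAFKA_HOME and BOOTSTRAP are assumed variable names, localhost:9092 is the default listener address, and not every script in the folder accepts this argument, so treat it as a convenience for the common cases only.

```shell
# Assumption: KAFKA_HOME points at the unpacked Kafka distribution.
KAFKA_HOME=${KAFKA_HOME:-./kafka_2.13-3.3.1}
# Assumption: the broker is listening on its default address.
BOOTSTRAP=${BOOTSTRAP:-localhost:9092}

# kafka_tool NAME [ARGS...]
# Run bin/NAME.sh against the local broker, passing any extra arguments through.
kafka_tool() {
    tool=$1
    shift
    "$KAFKA_HOME/bin/$tool.sh" --bootstrap-server "$BOOTSTRAP" "$@"
}

# Examples (require the broker from this lesson to be running):
#   kafka_tool kafka-topics --list
#   kafka_tool kafka-broker-api-versions
```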
As Kafka exchanges messages between producers and consumers, it needs to store state such as the messages waiting to be consumed and details about the producers and consumers. Though this state is heavily cached in memory for performance, it also needs to be written to disk in case the Kafka process crashes.
By default, this data is stored in the /tmp/kafka-logs directory.
ls -la /tmp/kafka-logs
drwxr-xr-x  6 benjaminwootton  wheel   192 12 Nov 13:34 __consumer_offsets-0
drwxr-xr-x  6 benjaminwootton  wheel   192 12 Nov 13:34 __consumer_offsets-1
-rw-r--r--  1 benjaminwootton  wheel     0 12 Nov 13:32 cleaner-offset-checkpoint
-rw-r--r--  1 benjaminwootton  wheel     4 14 Nov 14:26 log-start-offset-checkpoint
-rw-r--r--  1 benjaminwootton  wheel    88 12 Nov 13:32 meta.properties
drwxr-xr-x  6 benjaminwootton  wheel   192 12 Nov 13:34 new_pizza_orders-0
-rw-r--r--  1 benjaminwootton  wheel  1216 14 Nov 14:26 recovery-point-offset-checkpoint
-rw-r--r--  1 benjaminwootton  wheel  1216 14 Nov 14:26 replication-offset-checkpoint
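The subdirectories in this listing follow a topic-partition naming pattern, so new_pizza_orders-0 holds partition 0 of the new_pizza_orders topic. The sketch below, using a hypothetical list_partitions helper, splits each directory name on its final hyphen to print the topic and partition number. It assumes every subdirectory follows that pattern.

```shell
# list_partitions LOG_DIR
# Print "topic partition" for every partition directory under LOG_DIR.
# Assumes directories are named <topic>-<partition>; plain files are skipped.
list_partitions() {
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        # Everything before the last hyphen is the topic, the rest is the partition.
        echo "${name%-*} ${name##*-}"
    done
}

# Example:
#   list_partitions /tmp/kafka-logs
```

Splitting on the last hyphen means topic names that themselves contain hyphens are still handled.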
This state store is managed internally by Kafka, so you wouldn't typically have to interact with it. However, it's important to know where it is stored for a few reasons:
- Some operating systems automatically clear files from /tmp. You may wish to relocate the Kafka state files in this instance;
- You may wish to add backups and monitoring around this directory to improve stability and the chance of recovery in a critical situation;
- If you would like to start afresh with your Kafka broker, stopping the broker and removing all of the state is the easiest way to do this.
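Two of the points above can be sketched in shell. The location of the state directory is controlled by the log.dirs setting in server.properties, so relocating it amounts to starting the broker with a configuration that has a different value. The helper names with_log_dirs and reset_state are hypothetical, as is the /var/lib/kafka-logs path in the example. /tmp/zookeeper is the data directory in the shipped zookeeper.properties, and I am assuming a fully clean start should remove it as well.

```shell
# with_log_dirs SRC DEST NEW_DIR
# Write a copy of the properties file SRC to DEST with log.dirs set to NEW_DIR.
# The original file is left untouched.
with_log_dirs() {
    sed "s|^log\.dirs=.*|log.dirs=$3|" "$1" > "$2"
}

# reset_state DIR...
# Delete the given state directories. Only run this while the servers are stopped.
reset_state() {
    for dir in "$@"; do
        # Refuse empty arguments and the filesystem root as a basic safety net.
        [ -n "$dir" ] && [ "$dir" != "/" ] || continue
        rm -rf -- "$dir"
    done
}

# Examples (paths are illustrative):
#   with_log_dirs config/server.properties config/server-local.properties /var/lib/kafka-logs
#   reset_state /tmp/kafka-logs /tmp/zookeeper
```

The broker would then be started with the new properties file in place of the default one.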
In the next lesson, we will go through the process of administering topics, which will eventually hold our messages and events.