Lesson Overview

In this lesson we will:

  • Introduce Kafka Topics;
  • Learn how to create, list and delete topics;
  • Learn more about the message retention settings for Kafka topics.

About Topics

Messages that flow through Kafka are organised into topics, which producers publish to and consumers read from. For instance, an eCommerce business might create separate topics for events relating to their orders, their website visits, their price updates and so forth.

Topics have a name but do not have a type. An orders topic could theoretically contain events about website visits or price updates, though this would of course be confusing.

As we discussed in a previous lesson, Kafka imposes no structure on messages, meaning that the messages within a topic can take any format, such as binary objects, Strings, JSON or XML, as required by your producers and consumers.
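To Kafka, keys and values are simply bytes; serialisation is left to the producers and consumers. As a minimal sketch (plain Python, not a Kafka client), a producer that wants to send JSON might encode an order like this:

```python
import json

# Kafka stores keys and values as raw bytes; the format is entirely
# up to the producers and consumers (binary, String, JSON, XML, ...).
order = {"ordernumber": 1, "order_category": "Electronics"}

key_bytes = str(order["ordernumber"]).encode("utf-8")  # key as a String
value_bytes = json.dumps(order).encode("utf-8")        # value as JSON

# A consumer that knows the topic carries JSON decodes it back:
decoded = json.loads(value_bytes.decode("utf-8"))
assert decoded == order
```

The topic itself neither knows nor cares that these bytes happen to be JSON.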

Topic Structure

Topics contain ordered lists of messages, with each message having a key and a value as previously discussed.

In the following sequence, we have sent three messages to a topic. The messages have keys of 1, 2 and 3 respectively, and values in JSON format.

1 : { "ordernumber" : 1, "order_category" : "Electronics" }
2 : { "ordernumber" : 2, "order_category" : "Homeware" }
3 : { "ordernumber" : 3, "order_category" : "Food" }

Each message on the topic has an associated offset, a number describing its position in the topic. Offsets typically start at 0 and increment as new messages are appended to the topic. Consumers can then use offsets to request messages, for instance from the first offset, from the last offset, or from offset 100. If a consumer crashes during processing, it can resume consumption from the last offset it had processed when it comes back online.

Messages published to topics are immutable, meaning that once messages have been written to the topic they cannot be updated or deleted by producers. Kafka administrators do, however, have the ability to delete messages.
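The structure described above can be sketched as a toy, in-memory model (this is illustrative Python, not real Kafka code): a single-partition topic is an append-only log in which each message's offset is simply its position, and a consumer resumes from the offset it last processed.

```python
# A toy model of a single-partition topic: an append-only log where
# each message's offset is its index in the log.
class ToyTopic:
    def __init__(self):
        self._log = []  # messages are immutable once appended

    def append(self, key, value):
        self._log.append((key, value))
        return len(self._log) - 1  # the new message's offset

    def read_from(self, offset):
        return self._log[offset:]

topic = ToyTopic()
for n, category in [(1, "Electronics"), (2, "Homeware"), (3, "Food")]:
    topic.append(n, {"ordernumber": n, "order_category": category})

# A consumer that had processed up to offset 0 before crashing
# resumes from offset 1 and sees the remaining two messages.
resumed = topic.read_from(1)
assert len(resumed) == 2 and resumed[0][0] == 2
```

Real Kafka adds partitioning, replication and durable offset tracking on top of this basic idea.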

Creating Topics

Out of the box, Kafka topics can be created automatically the first time a producer sends a message to them. For instance, if we start a producer and send events to topics X, Y and Z, the three topics would be created automatically, provided the Kafka broker is started with its default settings.

If you prefer to have a Kafka administrator create topics explicitly, this behaviour can be turned off with the parameter auto.create.topics.enable in the server.properties file that is specified when you start the Kafka broker.
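For example, the relevant line in server.properties would look like this (the comment is ours; the property name is standard Kafka broker configuration):

```properties
# config/server.properties
# Disable automatic topic creation so that only an administrator
# can create topics explicitly:
auto.create.topics.enable=false
```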

Manually Creating A Topic

Whichever value you use for auto.create.topics.enable, the kafka-topics.sh script from the command line tools can be used to create a topic explicitly.

When creating a topic, we have to specify the number of partitions and the replication factor. These will be explained in more detail later, so simply specify 1 for now:

./bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic new_pizza_orders --create --partitions 1 --replication-factor 1

Listing Topics

We can now list the topics on the Kafka broker like so:

./bin/kafka-topics.sh --bootstrap-server localhost:9092 --list

There should be only one topic at this stage, the one we created above named new_pizza_orders.

new_pizza_orders

Describing A Topic

It is possible to describe a topic to get more details about it. Let's describe the topic which we have just created.

./bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic new_pizza_orders  --describe

Outputs:

Topic: new_pizza_orders TopicId: eKmjEej3QkiMgAMLKwQ2pA PartitionCount: 1       ReplicationFactor: 1    Configs: segment.bytes=1073741824
        Topic: new_pizza_orders Partition: 0    Leader: 0       Replicas: 0     Isr: 0

This output shows that the topic has 1 partition and a replication factor of 1 as requested, together with other information such as the leader broker. Because we only have one broker, broker ID 0 is shown as the leader.

Topic Deletion

It is also possible to use the kafka-topics.sh script to explicitly remove topics that are no longer needed.

./bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic new_pizza_orders  --delete

Retention

By default, Kafka will retain messages for a period of 7 days, which may or may not be appropriate for your needs. Typically, we would like to keep data around until we can be sure that it is fully processed by consumers.

This default is configured in the server.properties file and applies to all topics where a retention period is not explicitly set:

grep retention ./config/server.properties

Outputs:

log.retention.hours=168

When we create topics, we can override this default by setting either retention.ms, a retention period in milliseconds, or retention.bytes, a maximum amount of data to retain. Here we are creating 3 topics with a retention time of 180000 milliseconds, or 3 minutes.

./bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic new_pizza_orders --create --partitions 1 --replication-factor 1 --config retention.ms=180000
./bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic new_pizza_ingredients --create --partitions 1 --replication-factor 1 --config retention.ms=180000
./bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic new_pizza_events --create --partitions 1 --replication-factor 1 --config retention.ms=180000
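As a quick sanity check on these durations, Python's standard library makes the conversions easy to verify:

```python
from datetime import timedelta

# The broker default of log.retention.hours=168 is one week:
assert timedelta(hours=168) == timedelta(days=7)

# The topic-level override of retention.ms=180000 used above is 3 minutes:
assert timedelta(milliseconds=180000) == timedelta(minutes=3)
print(timedelta(milliseconds=180000))  # prints 0:03:00
```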

Next Lesson

In the Next Lesson, we will look more deeply into the process of publishing data into Kafka.


© 2022 Timeflow Academy.