Lesson Overview

In this lesson we will:

  • Learn about Kafka partitions;
  • Learn how to create and administer partitions;
  • Learn about event ordering and how it relates to partitions;
  • Learn about partitions keys
  • Use the kafka-console-producer script.

What Are Kafka Partitions?

Partitions are one of the main features of Kafka for achieving parrelism and therefore performance.

A partition is a sub-division of a topic. For instance, our new_pizza_orders topic could be divided into 20 partitions numbered 0-19.

Partitions can also be spread across your broker cluster. For instance, on a 5 server broker cluster, each node could process 4 of the 20 partitions.

They key thing is that wherever they reside, each of these partions can be written to by producers and read from by consumers in parallel allowing us to significantly increase throughput and latency of our Kafka broker and cluster.

Creating Partitions

As already seen, when we explicitly create a topic with kafka-topics.sh, we need to specify a number of partitions. This time, we can pass a higher number to the partitions flag such as 5:

./bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic new_pizza_orders --create --partitions 5 --replication-factor 1

If we now describe the topic:

./bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic new_pizza_orders  --describe

We can see that we now have 5 partitions numbered 0-4 configured on the topic:

Topic: new_pizza_orders TopicId: 8pj25fZsSFqDip7VvE3EHg PartitionCount: 5       ReplicationFactor: 1    Configs: segment.bytes=1073741824
Topic: new_pizza_orders Partition: 0    Leader: 0       Replicas: 0     Isr: 0
Topic: new_pizza_orders Partition: 1    Leader: 0       Replicas: 0     Isr: 0
Topic: new_pizza_orders Partition: 2    Leader: 0       Replicas: 0     Isr: 0
Topic: new_pizza_orders Partition: 3    Leader: 0       Replicas: 0     Isr: 0
Topic: new_pizza_orders Partition: 4    Leader: 0       Replicas: 0     Isr: 0

The leader refers to which broker is managing the partition. In this instance, because we only have a single broker setup on the training virtual machine, all partitions for the topic are managed on broker 0.

The number of partitions can also be adjusted after creation. Let's adjust the number of partitions to 10:

./bin/kafka-topics.sh --bootstrap-server localhost:9092 --alter --topic new_pizza_orders --partitions 10

Kafka does not support reducing the number of partitions at this time. We can only increase the number with the alter command.

Event Ordering & Partitions

As previously mentioned, the order in which we process events will be important. For instance, it makes no sense to update a customer before the customer has been created in our system.

Using partitions has implication for event ordering.

  • The ordering of events is only guaranteed at a partition level
  • We could therefore read and process events outs of order even if they are on the same topic

The key role is that data which has to be processed in order therefore has to go onto the same partition

Keys And Partition Order

Kafka producers help with this ordering challenge by putting all events with the same key onto the same partition.

For instance, all events for Customer ID 1234 could be routed to the same partition to avoid the situation of updating before creation.

Sometimes we need more complex behaviour than this though:

  • Partitioning by a non keyed variable
  • Perhaps our keys are not evenly distributed so a simple hash function over the key is not suitable

This is adjusted using a feature called the partitioning strategy.

Scenario Setup

We will use command line scripts to demonstrate the behaviour of partitions.

Create A New Topic**

./bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic new_orders --partitions 5

Create A Console Producer**

./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092  --topic new_orders

Create A Console Consumer**

./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092  --topic new_orders

Describe Consumer Groups

./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --all-groups

Adjust Number Of Partitions

./bin/kafka-topics.sh --bootstrap-server localhost:9092 --alter --topic new_orders --partitions 10

Summary

In this lesson, we took a deeper look at Kafka partitions, and some of the implications of using multiple partitions with regards to performance and ordering guarantees.

Hands-On Training For The Modern Data Stack

Timeflow Academy is an online, hands-on platform for learning about Data Engineering and Modern Cloud-Native Database management using tools such as DBT, Snowflake, Kafka, Spark and Airflow...

Sign Up

Already A Member? Log In

Next Lesson:

Kafka Performance Test Scripts

Prev Lesson:

Kafka Consumer Groups

© 2022 Timeflow Academy.