Kafka For Data Engineers

Kafka Consumer Groups

Lesson #7

In this lesson we will:

  • Learn about Kafka consumer groups;
  • Understand how consumer groups work together to co-ordinate reliable message delivery;
  • Use the kafka-consumer-groups script to analyse consumer groups.

About Consumer Groups

Kafka messages are produced by producers and consumed by consumers.

In many cases, it makes sense to organise our consumers into logical groups, depending on how we wish to divide the work.

For instance, we might have a group of consumers which are together responsible for consuming New Order messages. We may want one and only one member of the group to consume each New Order message.

This also gives us finer control over load through the system. We could, for instance, have one consumer group dedicated to consuming New Orders and another dedicated to consuming Price Updates, so that each type of message is processed by its own set of consumers in a predictable way.
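The idea of "one and only one member" can be illustrated with a toy model in plain Python. This is only a sketch and does not use the Kafka client API: the group, member and topic names are hypothetical, and messages are handed out by simple rotation, whereas real Kafka divides work by assigning partitions to group members.

```python
from collections import defaultdict
from itertools import cycle


class ConsumerGroup:
    """Toy model of a consumer group: every message goes to exactly one member."""

    def __init__(self, name, members):
        self.name = name
        self._rotation = cycle(members)      # simple rotation, not Kafka's real mechanism
        self.received = defaultdict(list)    # member name -> messages it handled

    def deliver(self, message):
        # Pick the next member in the rotation and record the message against it.
        member = next(self._rotation)
        self.received[member].append(message)
        return member


# Hypothetical group, member and topic names, chosen only for this illustration.
groups_by_topic = {
    "new_orders": ConsumerGroup("order_processors", ["orders-1", "orders-2"]),
    "price_updates": ConsumerGroup("price_processors", ["prices-1"]),
}

messages = [
    ("new_orders", "order 1001"),
    ("price_updates", "price of item A changed"),
    ("new_orders", "order 1002"),
    ("new_orders", "order 1003"),
]

for topic, message in messages:
    group = groups_by_topic[topic]
    member = group.deliver(message)
    print(f"{message!r} -> group {group.name}, member {member}")
```

Each New Order message is handled by exactly one member of the order group, and the Price Update message never reaches that group at all.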

Consumer Groups And Partitions

The number of consumers in a group is closely tied to the number of partitions in the topic, and this relationship affects both correctness and performance.

Imagine we have a topic with 10 partitions:

  • If we have 10 consumers in a group we are balanced, with each consumer servicing a different partition.
  • If we have more than 10 consumers in a group, some will sit idle, because each partition is assigned to only one consumer within a group.
  • If we have fewer than 10 consumers in a group, some consumers will read from more than one partition.

We don't necessarily need to be "balanced". The right ratio depends on the nature of the data and the requirements for failover and performance.
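The three cases in the list above can be sketched in a few lines of Python. This is a simplified round-robin illustration under stated assumptions, not the assignment strategy Kafka itself uses, and the function and consumer names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over consumers round-robin; each partition gets exactly one owner."""
    if not consumers:
        raise ValueError("a group needs at least one consumer")
    assignment: Dict[str, List[int]] = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def make_group(size: int) -> List[str]:
    """Build hypothetical consumer names for a group of the given size."""
    return [f"consumer-{i}" for i in range(size)]


# A topic with 10 partitions, read by groups of 10, 12 and 4 consumers.
for size in (10, 12, 4):
    assignment = assign_partitions(10, make_group(size))
    idle = [c for c, parts in assignment.items() if not parts]
    busiest = max(len(parts) for parts in assignment.values())
    print(f"{size} consumers: idle={len(idle)}, max partitions per consumer={busiest}")
```

With 12 consumers, two of them receive no partitions. With 4 consumers, every partition is still covered, but some consumers carry three partitions while others carry two.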

The kafka-consumer-groups script lets us view information about the consumer groups that are currently interacting with the broker.

./bin/ --bootstrap-server localhost:9092
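If we want to drive the script from code, a small Python helper can build the command for us. This is a sketch under assumptions: it assumes the script lives at bin/kafka-consumer-groups.sh inside the Kafka directory, as in a standard Apache Kafka download, and the helper names are hypothetical. It uses the script's --list option to show all groups, and --describe with --group to show the details of one group.

```python
import subprocess
from typing import List, Optional

# Assumption: script location in a standard Apache Kafka download,
# run from the Kafka home directory. Adjust for your installation.
SCRIPT = "./bin/kafka-consumer-groups.sh"


def consumer_groups_command(bootstrap_server: str, group: Optional[str] = None) -> List[str]:
    """Build the argument list: list all groups, or describe one group if a name is given."""
    command = [SCRIPT, "--bootstrap-server", bootstrap_server]
    if group is None:
        command.append("--list")
    else:
        command += ["--describe", "--group", group]
    return command


def run(command: List[str]) -> str:
    """Run the command and return its output; needs a Kafka install and a reachable broker."""
    return subprocess.run(command, capture_output=True, text=True, check=True).stdout


# Only print the command here, so the example works without a running broker.
print(" ".join(consumer_groups_command("localhost:9092")))
```

Passing a group name to consumer_groups_command produces the describe form of the command instead, and run can then be called on the result once a broker is available.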


Let's use the script to subscribe, this time as a member of a consumer group.

./bin/ --group-name pizzq_prorcessor
