Lesson Overview

In this lesson, we will introduce some of the core concepts and terminology associated with Kafka.

Key Concepts

Though we will cover many of these concepts in more detail throughout the course, it is worth learning the key terminology at this stage to begin understanding how Kafka works.

Broker

A single Kafka server process is referred to as a Broker. It is the responsibility of the broker to accept messages from producers and distribute them to interested consumers in a performant and reliable manner.

Broker Cluster

Though it is possible to have a single Kafka broker doing all of the work, this is risky in a production environment: if the process or the server crashes, the whole system stops. A single broker can also become a scalability or performance bottleneck in high-volume or low-latency situations.

To combat this, brokers are usually arranged into a cluster that works together in a co-ordinated way. As well as giving us resilience against the loss of a single broker, a cluster also adds throughput and capacity.
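
As an illustration, clients typically list several brokers in their bootstrap configuration so that they can still connect if any one broker is down. This is a minimal sketch using the kafka-python client; the broker host names and ports are placeholder assumptions, not values from this lesson.

from kafka import KafkaProducer

# Listing several brokers means the client can still bootstrap
# even if a single broker in the cluster is unavailable.
producer = KafkaProducer(
    bootstrap_servers=["broker-1:9092", "broker-2:9092", "broker-3:9092"]
)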

Producers and Consumers

Producers are the processes that send messages to the Kafka broker, and Consumers are the processes that receive messages from it. If necessary, many thousands of producers and consumers can interact with the broker cluster at any one time.
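
As a minimal sketch of this relationship, the following snippet uses the kafka-python client to send one message and then read it back. It assumes a broker running at localhost:9092 and a topic named Orders; both are illustrative placeholders.

from kafka import KafkaProducer, KafkaConsumer

# Producer: sends a key/value message to the broker.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("Orders", key=b"1", value=b'{"order_number": 1}')
producer.flush()

# Consumer: reads messages for the same topic back from the broker.
consumer = KafkaConsumer(
    "Orders",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating if no new messages arrive
)
for message in consumer:
    print(message.key, message.value)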

Messages

A broker or broker cluster is responsible for accepting messages from the producers and delivering them to the interested consumers.

Kafka messages consist of a key and a value. Aside from this, Kafka places very few requirements on the format of either: the broker treats both as opaque arrays of bytes, so they could be strings, JSON, XML or some binary format. The examples below, for instance, are all valid messages.

1 : { "order_number" : 1, "order_category" : "Electronics" }
1 : 1/Electronics
!@££$ : !£EADADAR£!£RADDASDASDASDASDASD
<my_key/> : </my_value>
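
Because the broker only ever sees bytes, it is up to the producing application to serialise its keys and values. As a rough sketch, again assuming the kafka-python client and a local broker, a Python dictionary could be serialised to JSON before sending:

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=str.encode,                          # string key -> bytes
    value_serializer=lambda v: json.dumps(v).encode(),  # dict value -> JSON bytes
)

producer.send(
    "Orders",
    key="1",
    value={"order_number": 1, "order_category": "Electronics"},
)
producer.flush()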

Topics

Every message sent to a Kafka broker is sent to a specific topic. A topic has a name, such as Orders, Website_Visits, or Prices, which describes the data within it. Topics can be created statically by the Kafka administrator, or created dynamically by producers and consumers as they send and receive messages, if the cluster is configured to allow automatic topic creation.
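
For example, an administrator could create a topic up front using the admin API. This sketch uses kafka-python's KafkaAdminClient and assumes a single local broker; the topic name, partition count and replication factor are illustrative values only.

from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# Statically create the Orders topic (partitions are covered in the next section).
admin.create_topics([
    NewTopic(name="Orders", num_partitions=1, replication_factor=1)
])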

Partitions

To provide improved throughput and performance, topics are further sub-divided into partitions, which can be written to and read from in parallel. Your Website_Visits topic, for instance, could be split into 8 partitions. By default, a message's key determines which partition it is written to, so messages with the same key always land on the same partition. Spreading a topic's workload across partitions in this way improves the scalability and throughput of your Kafka cluster.
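
The sketch below, again assuming a local broker and the kafka-python client, creates a Website_Visits topic with 8 partitions and then sends two messages with the same key. The partition reported back for each send shows that both messages land on the same partition.

from kafka import KafkaProducer
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([
    NewTopic(name="Website_Visits", num_partitions=8, replication_factor=1)
])

producer = KafkaProducer(bootstrap_servers="localhost:9092")

# Messages that share a key are hashed to the same partition.
for page in [b"/home", b"/checkout"]:
    metadata = producer.send("Website_Visits", key=b"visitor-42", value=page).get(timeout=10)
    print(metadata.partition)  # prints the same partition number both times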

Summary

With Kafka, it's relatively easy for a developer to create a broker and start streaming messages between producers and consumers.

For data engineers, it is important to understand these key concepts in order to build and administer Kafka-based systems effectively.

In the next section we will move forward to actually setting up a broker on our training virtual machine.
