In this lesson we will:

  • Introduce the key concepts and terminology associated with Kafka.

Core Concepts Of Kafka

Though we will cover many of these concepts in more detail throughout the course, it is worth learning some of the key ideas behind Kafka at this stage. Kafka introduces some new terminology and works in a slightly different way to previous generations of messaging technology.


Broker

A single Kafka server process is referred to as a Broker. It is the responsibility of the broker to accept messages from producers and distribute them to interested consumers in a performant and reliable manner.

Broker Cluster

Though it is possible to run with a single Kafka broker, this is risky in a production environment: if the process or the server crashes, messaging stops entirely. A single broker may also lead to scalability or performance issues in a big data or low latency environment.

To combat this, brokers are often deployed as a cluster of multiple brokers which work together in a co-ordinated way. This adds resilience, for instance if one of the individual brokers crashes, as well as higher throughput and lower latency due to the increased capacity.

Producers and Consumers

Producers are the processes sending messages to the Kafka broker, and Consumers are the processes receiving messages from the broker.

It is possible to have many thousands of consumers and producers interacting with the broker cluster at any one time if necessary.

Kafka allows for scenarios such as a consumer that temporarily goes offline and needs to continue where it left off, or consumers working in a group to process a single stream of messages. The aim is to offer exactly-once processing, where no message is lost or processed twice.
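The "continue where it left off" behaviour works because Kafka tracks a committed offset per consumer group. The sketch below simulates that bookkeeping in plain Python (no real broker involved; the `log` list and group name are illustrative only):

```python
# A minimal simulation of consumer offsets. Kafka stores a committed
# offset per consumer group per partition; here a list stands in for
# the partition log and a dict for the committed offsets.

log = [f"message-{i}" for i in range(10)]  # an append-only partition log
committed = {"my-group": 0}                # last committed offset per group

def poll(group, max_records=3):
    """Read the next batch for `group`, then commit the new offset."""
    start = committed[group]
    batch = log[start:start + max_records]
    committed[group] = start + len(batch)  # next poll resumes from here
    return batch

first = poll("my-group")   # reads offsets 0-2
# ...the consumer goes offline and restarts; the committed offset survives...
second = poll("my-group")  # resumes at offset 3, not at the beginning
```

Because the offset is stored on the broker side rather than in the consumer process, a restarted consumer picks up exactly where the group left off.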

Messages or Events

A broker or broker cluster is responsible for accepting messages from the producers and delivering them to the subscribed consumers.

Kafka messages consist of a key and a value. Beyond this, Kafka places very few requirements on the format of either the key or the value. They could be Strings, JSON, XML or some binary format. The examples below, for instance, are all valid messages from a Kafka perspective.

1 : { "order_number" : 1, "order_category" : "Electronics" }
1 : 1/Electronics
[email protected]££$ : !£EADADAR£!£RADDASDASDASDASDASD
<my_key/> : </my_value>
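Because the broker treats keys and values as opaque byte arrays, the producer and consumer simply need to agree on an encoding. A small sketch of serialising the first example message above (JSON value, string-encoded key) using only the standard library:

```python
import json

# Kafka sees keys and values as raw bytes; the choice of format is
# entirely between the producer and the consumer.
key = str(1).encode("utf-8")
value = json.dumps(
    {"order_number": 1, "order_category": "Electronics"}
).encode("utf-8")

# The broker stores and forwards these bytes untouched; the consumer
# decides how to decode them on the other side.
decoded = json.loads(value.decode("utf-8"))
```

A real producer client would hand these bytes (or a serializer that produces them) to its send call; the encoding shown is one common choice, not a Kafka requirement.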

Messages are sometimes referred to as Events, with the two terms being used interchangeably.


Topics

All of the messages sent to a Kafka broker are sent to a specific topic. A topic has a name, which could be something such as Orders, WebsiteVisits, or Prices, describing the data within the topic.

Topics can be created statically by the Kafka administrator, or created dynamically by producers and consumers as they send and receive messages.
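Static creation is typically done with the kafka-topics.sh tool shipped with Kafka. A sketch, assuming a broker listening on localhost:9092 (the topic name and counts are illustrative):

```shell
# Create the Orders topic with 3 partitions and no replication,
# assuming a single local broker at localhost:9092.
kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic Orders --partitions 3 --replication-factor 1
```

Dynamic creation happens when `auto.create.topics.enable` is set on the broker, in which case producing to or consuming from an unknown topic creates it with the broker's default settings.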

Retention Period

Kafka topics are configured with a retention period, which is the amount of time that messages are kept in the topic before being deleted.

By default, messages are retained for 7 days, though there may be instances where we want to dramatically shorten this, for instance where the data quickly ceases to be useful, or lengthen it, for instance where we need to retain message history for audit and compliance purposes.
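Retention is configured per topic via the `retention.ms` setting, expressed in milliseconds. The arithmetic below shows the 7-day default and two alternative values (the short and long examples are hypothetical, chosen to match the scenarios above):

```python
# retention.ms values are plain millisecond counts.
DAY_MS = 24 * 60 * 60 * 1000

default_retention = 7 * DAY_MS        # the 7-day default: 604800000 ms

# Hypothetical alternatives for illustration:
clickstream_retention = 60 * 60 * 1000  # 1 hour: data quickly goes stale
audit_retention = 365 * DAY_MS          # 1 year: compliance history
```

Setting `retention.ms` to -1 disables time-based deletion entirely, which is one way to keep an audit topic indefinitely.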


Partitions

In order to provide improved throughput and performance, topics are further sub-divided into partitions which can be written to and read from in parallel.

A WebsiteVisits topic could, for instance, be sub-divided into 8 partitions, and Kafka will allow us to read and write to these in parallel, making more efficient use of the server's CPU and storage to optimise throughput.

Partitioning is therefore a key tool in improving the scalability and throughput of your Kafka cluster.

Event Streaming vs Batch

Kafka is sometimes referred to as an Event Streaming platform. This is because events are sent continuously from source to destination, often immediately as the data is created.

This is in contrast to the infrequent batch processing that has historically been used for data exchange. Please see our course on Streaming Data or this blog post for further details.

Next Lesson:

Setting Up Your Kafka Broker

In this lesson we will set up a Kafka broker in standalone mode.
