Lesson Overview

In this lesson we will:

  • Introduce the key concepts and terminology associated with Kafka.

Core Concepts Of Kafka

Though we will cover many of these concepts in more detail throughout the course, it is worth learning some of the key ideas behind Kafka at this stage. Kafka introduces some new terminology and works in a slightly different way to previous generations of messaging technology.

Broker

A single Kafka server process is referred to as a Broker. It is the responsibility of the broker to accept messages from producers and distribute them to interested consumers in a performant and reliable manner.

Broker Cluster

Though it is possible to have a single Kafka broker doing all of the work, this is risky in a production environment because the process or the server could crash. It may also lead to scalability or performance issues in a big data or low latency situation.

To combat this, brokers are often arranged into a cluster, in which they work together in a co-ordinated way to distribute messages from producers to consumers efficiently and reliably.

Using a cluster adds resilience, for instance if one of the individual brokers crashes, as well as higher throughput and lower latency due to the increased capacity.

Producers and Consumers

Producers are the processes sending messages to the Kafka broker, and Consumers are the processes receiving messages from the broker.

It is possible to have many thousands of consumers and producers interacting with the broker cluster at any one time if necessary.

Kafka allows for scenarios such as a consumer that temporarily goes offline and needs to continue where it left off, or consumers working in a group to process a single stream of messages. The aim is to offer exactly-once processing, where no messages are lost or processed twice.
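
To make the idea of a consumer continuing where it left off more concrete, here is a minimal in-memory sketch in Python. It does not use a real Kafka client or broker. The names ToyLog and ToyConsumer are hypothetical and exist only to illustrate that the messages are retained in a log while the read position (the offset) is tracked separately for each consumer group.

```python
# Toy model only: no real Kafka client or broker is involved.
# ToyLog and ToyConsumer are hypothetical names used for illustration.

class ToyLog:
    """An append-only list of messages plus one committed offset per group."""

    def __init__(self):
        self.messages = []
        self.committed = {}  # group name -> offset of the next message to read

    def append(self, message):
        self.messages.append(message)


class ToyConsumer:
    def __init__(self, log, group):
        self.log = log
        self.group = group

    def poll(self, max_messages=10):
        """Read from the last committed offset, then commit the new position."""
        start = self.log.committed.get(self.group, 0)
        batch = self.log.messages[start:start + max_messages]
        self.log.committed[self.group] = start + len(batch)
        return batch


log = ToyLog()
for n in range(1, 4):
    log.append(f"order-{n}")

first = ToyConsumer(log, group="billing")
seen_before_restart = first.poll(max_messages=2)

# The consumer goes offline; another message arrives in the meantime.
log.append("order-4")

# A new consumer instance in the same group picks up from the committed offset.
restarted = ToyConsumer(log, group="billing")
seen_after_restart = restarted.poll()

print(seen_before_restart)
print(seen_after_restart)
```

In this sketch the offset is committed as soon as a batch is read. In a real system, whether the offset is committed before or after a message has been processed determines whether a crash could cause a message to be skipped or processed twice.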

Messages or Events

A broker or broker cluster is responsible for accepting messages from the producers and delivering them to the subscribed consumers.

Kafka messages consist of a key and a value. Aside from this, Kafka places very few requirements on the format of either the key or the value. They could be strings, JSON, XML or some binary format. The examples below, for instance, are all valid messages from a Kafka perspective.

1 : { "order_number" : 1, "order_category" : "Electronics" }
1 : 1/Electronics
!@££$ : !£EADADAR£!£RADDASDASDASDASDASD
<my_key/> : <my_value/>
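
One reason all of these are acceptable is that the broker treats keys and values as raw bytes, and it is the client that decides how to turn application data into those bytes. The sketch below illustrates this in plain Python without a Kafka client. The serialize helper is hypothetical, and the use of UTF-8 and JSON is an assumption made for this example rather than a Kafka requirement.

```python
import json

# Hypothetical helper: turns a Python key and value into the raw bytes
# that a producer would hand to the broker. The choice of UTF-8 and JSON
# is an assumption for this example, not a Kafka requirement.
def serialize(key, value):
    key_bytes = str(key).encode("utf-8")
    if isinstance(value, (dict, list)):
        value_bytes = json.dumps(value).encode("utf-8")
    else:
        value_bytes = str(value).encode("utf-8")
    return key_bytes, value_bytes


json_message = serialize(1, {"order_number": 1, "order_category": "Electronics"})
text_message = serialize(1, "1/Electronics")

print(json_message)
print(text_message)
```

Because the broker does not interpret the bytes, producers and consumers of a topic need to agree on the format between themselves.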

Messages are sometimes referred to as Events, with the two terms used interchangeably.

Topics

All messages sent to a Kafka broker are sent to a specific topic. A topic has a name, such as Orders, WebsiteVisits, or Prices, which describes the data within the topic.

Topics can be created statically by the Kafka administrator, or dynamically by producers and consumers as they send and receive their messages.
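
The difference between the two approaches can be sketched with a toy in-memory model. ToyBroker and its auto_create_topics flag are hypothetical names for this illustration and are not part of any Kafka client. On a real broker, the equivalent behaviour is controlled by the auto.create.topics.enable setting.

```python
# Toy model only: ToyBroker is a hypothetical class, not part of any Kafka client.

class ToyBroker:
    def __init__(self, auto_create_topics=True):
        self.auto_create_topics = auto_create_topics
        self.topics = {}  # topic name -> list of (key, value) messages

    def create_topic(self, name):
        """Explicit creation, as an administrator would do up front."""
        self.topics.setdefault(name, [])

    def send(self, topic, key, value):
        """Append a message to a topic, creating the topic if allowed."""
        if topic not in self.topics:
            if not self.auto_create_topics:
                raise KeyError(f"unknown topic: {topic}")
            self.topics[topic] = []
        self.topics[topic].append((key, value))


# Dynamic creation: the topic appears the first time it is used.
open_broker = ToyBroker(auto_create_topics=True)
open_broker.send("WebsiteVisits", "user-1", "/home")

# Static creation: only topics created in advance are accepted.
strict_broker = ToyBroker(auto_create_topics=False)
strict_broker.create_topic("Orders")
strict_broker.send("Orders", 1, "Electronics")

try:
    strict_broker.send("Prices", "ABC", 10.5)
    rejected = False
except KeyError:
    rejected = True

print(sorted(open_broker.topics), sorted(strict_broker.topics), rejected)
```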

Partitions

To improve throughput and performance, topics are further sub-divided into partitions, which can be written to and read from in parallel. The WebsiteVisits topic could, for instance, be sub-divided into 8 partitions. Spreading a topic across partitions in this way improves the scalability of your Kafka cluster.
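
A common way to decide which partition a message belongs to is to hash its key, so that messages with the same key always land in the same partition and keep their relative order. The following is a minimal sketch of that idea in plain Python. The choose_partition function is hypothetical, and CRC32 is used only as a convenient stable hash for this example. It is not the hash function that Kafka's own clients use.

```python
import zlib

NUM_PARTITIONS = 8  # same count as the WebsiteVisits example

def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash.

    CRC32 is a stand-in chosen for this sketch; it is not the hash
    function used by Kafka's own clients.
    """
    return zlib.crc32(key) % num_partitions


partitions = {p: [] for p in range(NUM_PARTITIONS)}
visits = [(b"user-1", "/home"), (b"user-2", "/pricing"), (b"user-1", "/checkout")]

for key, page in visits:
    partitions[choose_partition(key)].append((key, page))

# Show only the partitions that received at least one message.
for p, messages in partitions.items():
    if messages:
        print(p, messages)
```

Because each partition is an independent sequence of messages, different partitions can be written to and read from at the same time, which is where the parallelism comes from.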

Event Streaming

Kafka is sometimes referred to as an Event Streaming platform. This is because events are sent continuously from source to destination, often immediately as the data is created. This contrasts with batch processing, which has historically been the more common approach to data exchange. Please see our lesson on Streaming or this blog post for further details.

Next Lesson

In the next lesson we will move on to setting up a Kafka broker, initially in standalone mode.

© 2022 Timeflow Academy.