Lesson Overview

In this lesson we will:

  • Learn more about the process of publishing data into the Kafka broker;
  • Learn about considerations such as batching and compression when publishing into Kafka;
  • Use the kafka-console-producer script.

Kafka Producers

Kafka Brokers have two types of clients - producers and consumers.

Kafka Producers are the processes which send data into Kafka topics, generating the source data streams that consumers will later process.

An example producer might be a process at a stock exchange publishing a price update each time a stock's value changes, or an eCommerce service sending a notification to other microservices each time a new order is placed.

A Kafka producer will typically be embedded in an application written in a language such as Java or JavaScript. This application will make use of a Kafka client library, which handles the connection to and interaction with Kafka.

Though these client libraries make the process much easier, as data engineers we still need to understand how producers work in order to build and optimise end-to-end data pipelines.

Making A Connection

The first thing any producer will need to do is to make a connection to the Kafka server. To do this, we will need the following pieces of information:

  • Bootstrap Server - This is the IP address or hostname, together with the port, of the Kafka server that we wish to connect to, for example localhost:9092.

Topic

As discussed, all Kafka messages are published to and consumed from named topics. Each time a producer sends a message, it needs to indicate which topic the message should be published to.
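
To make these two pieces of information concrete, here is a minimal Python sketch. It is not a real Kafka client: `parse_bootstrap_servers` and `Record` are hypothetical helpers invented for illustration, and the topic name is borrowed from the console example later in this lesson. It only shows the shape of what a producer needs before it can send anything: one or more host:port pairs, and a topic attached to every message.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def parse_bootstrap_servers(servers: str) -> List[Tuple[str, int]]:
    """Split a 'host:port,host:port' string into (host, port) pairs."""
    pairs = []
    for entry in servers.split(","):
        host, _, port = entry.strip().rpartition(":")
        pairs.append((host, int(port)))
    return pairs


@dataclass
class Record:
    """Hypothetical stand-in for the message a producer hands to its client library."""
    topic: str                   # every message names the topic it is published to
    value: bytes                 # the message payload
    key: Optional[bytes] = None  # optional message key


servers = parse_bootstrap_servers("localhost:9092")
record = Record(topic="new_pizza_orders", value=b'{"pizza": "margherita"}')
print(servers, record.topic)
```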

Keys and Partitions
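
A message can carry an optional key, and a topic is split into partitions. When a key is present, the client library typically hashes it to choose a partition, so messages with the same key land on the same partition. The Python sketch below illustrates that idea only. The `choose_partition` function and the partition count are assumptions made for the example, and `crc32` is a stand-in rather than the hash function Kafka's own partitioner uses.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition: stable hash of the key, modulo the partition count.

    crc32 is an illustrative stand-in, not the hash Kafka's own partitioner uses.
    """
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 3  # assumed partition count for the example topic

for key in [b"customer-1", b"customer-2", b"customer-1"]:
    # The two "customer-1" messages always map to the same partition.
    print(key.decode(), "->", choose_partition(key, NUM_PARTITIONS))
```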

Batching

The simplest model is for the Kafka producer process to send messages to the broker immediately in batches of one.
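
The alternative to batches of one is to hold messages briefly and send several together, trading a little latency for fewer requests to the broker. The toy Python sketch below compares the two approaches. `BatchingSender` is a hypothetical class written for this illustration, and it counts messages per batch, whereas a real client library usually expresses the limit in bytes alongside a maximum wait time.

```python
from typing import List


class BatchingSender:
    """Toy producer buffer: collects messages and 'sends' them once a batch is full."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.buffer: List[bytes] = []
        self.sent_batches: List[List[bytes]] = []  # stands in for requests to the broker

    def send(self, message: bytes) -> None:
        self.buffer.append(message)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered, even if the batch is not full."""
        if self.buffer:
            self.sent_batches.append(self.buffer)
            self.buffer = []


messages = [f"order-{i}".encode() for i in range(10)]

immediate = BatchingSender(batch_size=1)
batched = BatchingSender(batch_size=4)
for m in messages:
    immediate.send(m)
    batched.send(m)
immediate.flush()
batched.flush()

print(len(immediate.sent_batches))  # 10 requests, one per message
print(len(batched.sent_batches))    # 3 requests: 4 + 4 + 2 messages
```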

Compression

Kafka producers also give us the option to compress data before it is sent. This is a tradeoff: the compression step takes processing time at the producer, but the amount of data sent is smaller. Smaller data takes less storage space and can be transported and consumed more efficiently.
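
The size side of this tradeoff can be seen with a short Python sketch that uses only the standard library. It does not talk to Kafka. It compresses a made-up batch of repetitive JSON messages with gzip, one of the codecs Kafka producers can be configured to use, and compares the byte counts. The message fields are invented for the example, and the actual savings will depend on the data.

```python
import gzip
import json

# A made-up batch of similar messages; repetitive data like this compresses well.
batch = [{"order_id": i, "pizza": "margherita", "size": "large"} for i in range(100)]
raw = "\n".join(json.dumps(m) for m in batch).encode("utf-8")

compressed = gzip.compress(raw)

print(len(raw), len(compressed))  # bytes before and after compression

# The consumer side must be able to recover the original bytes exactly.
assert gzip.decompress(compressed) == raw
```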

Acknowledgements

When the Kafka producer sends a message to the broker, it has a choice about the level of reliability.

We can simply send the message and assume that it was received. Most of the time this is fine, but there is a risk that the broker could crash before the message is accepted, in which case the message would be lost.

For more reliability, we can ask the producer to wait for an acknowledgement that the message has been fully accepted and committed.
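
The difference between the two approaches can be sketched with a toy simulation in Python. Nothing here is Kafka code: `FlakyBroker`, `send_fire_and_forget` and `send_with_ack` are hypothetical names, and the broker is rigged to drop exactly one message so that the outcome is predictable. In a real client this choice is normally made through a producer setting (`acks` in the standard Kafka producer configuration) rather than hand-written retry logic.

```python
class FlakyBroker:
    """Toy broker that silently loses the first message it is sent."""

    def __init__(self) -> None:
        self.log = []
        self._crashed = True  # pretend the broker is down for the first request

    def receive(self, message: str) -> bool:
        """Return True (an acknowledgement) only if the message was stored."""
        if self._crashed:
            self._crashed = False  # the broker recovers after dropping one message
            return False
        self.log.append(message)
        return True


def send_fire_and_forget(broker: FlakyBroker, message: str) -> None:
    broker.receive(message)  # result ignored: we never learn about a failure


def send_with_ack(broker: FlakyBroker, message: str, max_attempts: int = 3) -> bool:
    """Resend until the broker acknowledges the message or attempts run out."""
    for _ in range(max_attempts):
        if broker.receive(message):
            return True
    return False


broker_a = FlakyBroker()
send_fire_and_forget(broker_a, "order-1")
print(broker_a.log)  # [] : the message was lost and the producer did not notice

broker_b = FlakyBroker()
print(send_with_ack(broker_b, "order-1"))  # True, after one retry
print(broker_b.log)  # ['order-1']
```

Waiting for the acknowledgement costs extra time per message, which is the price paid for knowing the message was stored.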

The kafka-console-producer script

The Kafka distribution includes a script for producing ad-hoc messages from the terminal. This is useful for debugging and testing purposes.

The script can be executed like so, specifying a topic name to publish on:

cd ./bin/
./kafka-console-producer.sh --bootstrap-server localhost:9092 --topic new_pizza_orders

We can then simply enter our messages directly into the console to publish them onto the topic:

ABC
123
{ "key" : "value" }

Summary

In this lesson we took a deeper look into the process and considerations when publishing data into Kafka.

Next Lesson

In the next lesson, we will look at performance testing Kafka by sending and consuming high volumes of data.


© 2022 Timeflow Academy.