Lesson Overview

In this lesson we will:

  • Learn about the Kafka performance test scripts, which can be used to measure, test and optimise the performance of your Kafka cluster.

Kafka Performance Test Scripts

The Kafka performance test scripts allow us to generate and consume high volumes of data through your Kafka cluster in order to measure its performance characteristics, such as throughput and latency.

The tests we run using these scripts can be configured to match your real workloads, including setting the number and frequency of messages, message sizes and the level of reliability you need for each message.

For each test, we can capture outputs such as the number of messages and the amount of data transferred, along with the minimum, maximum, average and 99th percentile latencies, in order to understand the real world performance characteristics of your cluster.

As well as focussing on latency, the Kafka performance test scripts can also be used for load testing by simulating high volumes of messages to ensure the broker and other processes remain available and within acceptable bounds for performance.

Locating The Scripts

The performance test scripts ship as part of the Kafka distribution, and can be found in the ./bin folder of your deployment:

cd ./bin
ls -la kafka*perf*

Outputs:

kafka-producer-perf-test.sh
kafka-consumer-perf-test.sh

These two scripts are used for producing and consuming data respectively. They can be used independently, or run in parallel in different terminal windows.

Create A Console Consumer

We will begin the lesson by starting the console consumer script which we will use to visualise our test messages as they are generated. We will specify the server to connect to and a topic for our test, in this case new_orders:

cd ./bin
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic new_orders

This will begin a console consumer listening to the new_orders topic which we can use to visualise our test.

Performance Test The Producer

Next, we will simulate a series of records at high volume, pushing them into the same new_orders topic using the kafka-producer-perf-test.sh script.

Various command line parameters can be specified to configure the test and control the nature of the messages produced:

  • num-records - The total number of records to generate during the test;
  • throughput - The number of records to send per second;
  • record-size - The size of each record in bytes.
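Together, these parameters also determine how long a run takes: the total record count divided by the target throughput gives the expected duration. A quick shell sketch, using the hypothetical values of 100 records at 10 per second:

```shell
# Expected test duration in seconds = num-records / throughput
NUM_RECORDS=100
THROUGHPUT=10
echo "Expected duration: $((NUM_RECORDS / THROUGHPUT))s"
```

This is worth estimating up front, since a high record count at a low throughput can make a test run for a long time.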

As an example, we can issue the following command to send 100 records, each of 100 bytes, at a rate of 10 per second. Try this in a new terminal window so we can monitor the arrival of messages in the console consumer that was set up above.

./kafka-producer-perf-test.sh --producer.config config/producer.properties --topic new_orders --num-records 100 --throughput 10 --record-size 100

If everything is working well, you should begin to see messages arriving at the consumer:

LCTLNPENKLVJFLYLGNNWXWHZZWXLWBCRHZVHISZOFRZPBULSFXKFQNJIGJYBCGFPYTFAERUHAZASRTDEWAUIGWIQPKZPCI
AJOQITXRTLRATEHMSMAQKLZOUIUNVHYKASMUPWDUKASWNGEBWCAYDASZJZLWYBMEXFNYXNHOVYCEXDPNPWYZLNBPJJWYNH
DFMTMBAZWZUSVJQCTSFYUXDQICTIEWDOYTGTZOHVKBFIBXJPBIXDMQYNNGUIVAWCVHCPGTPHKTDQNVKTMTEWERWOBFJVIW
QQEYKLOOGHBBNVHHDNQZTSILOTWIOOOBVYHXFGEAPSVHXENJVCRWIMWKKHAWTOFMCQKKVOJJEHRIZEQSDNFMTJIDKAMQUA
JEJNMKPRSOYASPIJZQEYNPNOJXRAJNSHVPGGYWLQFGKLOEMZDXKFURIMFCNSQMEGEVQJSWEEDAHAMDEOJDCPJKZMVFRAVT
GJTOXOLGCIVVVYLLUUFNSCGYHKPXCFNJGIMSXWAQMUSXMDGQFPJVFSHKMJVSFQCMGMKXCLAUMLQZUBJISKLOTUGZQUTSKZ
GHIASALHWQOGCGJLXULUFZCUAUMYKIDKWFNACQRGSAAYCZGLWHZVPLJYKZKLTPYFGEDPCRILJRWREXFITOIOEVYTDWQZAE
UENSFPCRLYMJFLKJGRIACAFZVLPWMJNAEIXKTRMSQHMPPHGCDLPDULWQDROVIWATICTIEBABTHWPYSTQJOFPLLPOONOYRH
RXNEJEYQTBJHZGTFWGAXZIIKFWFELLHHGBAFCKQYGHUOKGYHPSMJWXSXBFLIHPTQZYTYIUJNWLDTKLBUDEVUOFOCABLMPD
EQCAVVAEUXSJAKFFRPJUJQBZTPFBDZWZUNRSKETJOPJDFKBLZNEVDNWNARUNDXFBVSFITKGIWWAUXMIYKYLWQIZSUJIQXW
LDLDPXDCPGKPULAOSFGUQUEUQBMPUJLDNZQQRRITNFFNLZOLWSVZTQVMLRYIMJJVYADMBRQSZHYHJXOGQSPUVEVVDINQVI
NSLZWPXJGNUQHPENCNBFOHXCADCWBWBEZSWUTQXZWULQJSSUQGTZNHMRBVMHTPIUZWQUXRMDJNSSPDEWJFPZXOMGFVTCYK

These are random 100 byte records created by the kafka-producer-perf-test.sh script. Your specific messages will of course look different to these random strings.

As the kafka-producer-perf-test.sh script runs, we will periodically see a status line describing its performance:

52 records sent, 10.2 records/sec (0.00 MB/sec), 28.1 ms avg latency, 805.0 ms max latency.

Each status line shows the number of records sent so far, the send rate, and the average and maximum latency observed for the message sends.

Interpreting Results

When the test completes, the performance test script will output a summary of the run. In this instance, we can see that we sent messages at the requested rate of roughly 10 per second, with each message taking 16.76 ms on average to publish. The slowest message took 805 ms, and the 50th percentile (median) latency was 5 ms.

100 records sent, 10.085729 records/sec (0.00 MB/sec), 16.76 ms avg latency, 805.00 ms max latency, 5 ms 50th, 55 ms 95th, 805 ms 99th, 805 ms 99.9th.
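The summary line is plain comma separated text, so it can be picked apart with standard tools. A minimal sketch using awk to pull the average and 99th percentile latency fields out of the sample line above (field positions assume the exact format shown, which may vary between Kafka versions):

```shell
# Extract the avg latency (3rd field) and 99th percentile (7th field)
# from a kafka-producer-perf-test.sh summary line
SUMMARY='100 records sent, 10.085729 records/sec (0.00 MB/sec), 16.76 ms avg latency, 805.00 ms max latency, 5 ms 50th, 55 ms 95th, 805 ms 99th, 805 ms 99.9th.'

echo "$SUMMARY" | awk -F', ' '{print "avg:", $3; print "p99:", $7}'
```

Capturing these fields across repeated runs makes it easy to compare configurations side by side.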

Performance Test The Consumer

In the previous example, we created a console listener to visualise the test, and tested the producer performance. The next step is to performance test the consuming process.

As our new_orders topic already has 100 messages from the previous run, we can use these messages for our first consumer test.

When running the consumer performance test, at a minimum we will need to specify the topic name and the number of messages to consume.

./bin/kafka-consumer-perf-test.sh --bootstrap-server localhost:9092 --topic new_orders --messages 100

If successful, this will consume the 100 messages, then output some statistics about the consumer process:

start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2021-12-02 14:15:16:298, 2021-12-02 14:15:17:437, 0.0095, 0.0084, 100, 87.7963, 980, 159, 0.0600, 628.9308

Here we can see the start and end time of the test, how much data was processed, and how many messages per second were retrieved. In this case, our consumer, reading from a single node broker on the same machine, fetched around 628 messages per second once the initial rebalance had completed (roughly 88 per second if the rebalance time is included).
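Because the consumer test emits its results as a CSV header row followed by a data row, pairing each column name with its value makes the output easier to read. A small sketch, assuming the exact output format shown above:

```shell
# Pair each column name from the header row with its value in the data row
HEADER='start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec'
ROW='2021-12-02 14:15:16:298, 2021-12-02 14:15:17:437, 0.0095, 0.0084, 100, 87.7963, 980, 159, 0.0600, 628.9308'

printf '%s\n%s\n' "$HEADER" "$ROW" |
  awk -F', ' 'NR==1 {for (i=1; i<=NF; i++) h[i]=$i}
              NR==2 {for (i=1; i<=NF; i++) print h[i] "=" $i}'
```

This prints one `name=value` pair per line, such as `fetch.nMsg.sec=628.9308`, which is also a convenient format for collecting results from repeated runs.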

Next Steps

In a real world setting, we could then experiment with various settings for our producer, such as changing the batch size, message compression settings, acknowledgement levels and other settings to see how this impacts the tests. We could also make changes to the configuration of the broker cluster in order to optimise the end to end performance and determine our production configuration.
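As an illustration, a producer.properties file for such an experiment might vary settings like these (the property names come from the standard Kafka producer configuration; the values are illustrative starting points, not recommendations):

```properties
# config/producer.properties - illustrative values for experimentation
bootstrap.servers=localhost:9092
# bytes to accumulate per partition before sending a batch
batch.size=32768
# milliseconds to wait for a batch to fill before sending anyway
linger.ms=10
# one of: none, gzip, snappy, lz4, zstd
compression.type=lz4
# 0, 1 or all - trades durability against latency
acks=all
```

Between runs, we would change one setting at a time, rerun the same kafka-producer-perf-test.sh command, and compare the resulting latency percentiles.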

Using A Payload File

In the example above, the producer performance test script was simply generating random strings of 100 bytes long. To get more realistic tests, we may wish to send real messages, perhaps JSON messages if that is what you will ultimately be using.

These messages can be specified using the --payload-file flag passed to the kafka-producer-perf-test.sh script, in place of the --record-size flag:

./bin/kafka-producer-perf-test.sh --producer.config config/producer.properties --topic new_orders --num-records 100 --throughput 10 --payload-file /tmp/kafka-messages.json
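The payload file holds one message per line, and the script selects a random line for each record it sends. A minimal sketch that creates such a file at the path used above (the JSON field names are invented for illustration):

```shell
# Write one JSON message per line - the perf test script picks a random line per record
cat > /tmp/kafka-messages.json <<'EOF'
{"order_id": 1, "item": "widget", "quantity": 3}
{"order_id": 2, "item": "gadget", "quantity": 1}
{"order_id": 3, "item": "sprocket", "quantity": 7}
EOF

wc -l < /tmp/kafka-messages.json
```

Because each record's size now comes from the payload file, the --record-size flag is omitted.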

Real World Kafka Benchmarking

Running a performance test on a single laptop doesn't tell us much, as we soon begin to hit limits on the machine such as the number of CPU cores.

In a more realistic situation, we would have the client and server running on different hosts, which would give us more compute capacity but introduce network latency. We would also likely have Kafka running as part of a cluster. Kafka performance test scripts become much more useful in these real world deployments.

Summary

In this lesson we looked into performance and load testing a Kafka cluster and Kafka based applications using the kafka-producer-perf-test.sh and kafka-consumer-perf-test.sh scripts.

We demonstrated how data can be generated and consumed in a controlled fashion, and the type of metrics that can be captured to understand its real world performance.
