Course Overview
Introduction To Streaming Data

Challenges Associated With Streaming Data

Lesson #4

In this lesson we will:

  • Discuss the challenges involved in building streaming solutions.

Why Is Working With Streaming Data Difficult?

As we discussed in the previous lesson, moving from the traditional batch approach towards a real-time data streaming architecture is a challenging undertaking.

In this lesson, we will explain in more detail what these challenges are. The streaming technologies that we discuss in this course can be complex, and it is important to understand the problems that we are trying to solve with them.


Scalability

Streaming platforms need to process and analyse high volumes of event data. A single stream can carry a high volume of events, and there are likely to be multiple streams all generating data in parallel. An enterprise stream processing platform is therefore likely to need a very high degree of scalability to handle the volumes of data in flight and at rest.


Elasticity

The volume of events in a stream can vary over time, and may spike during peak hours. Streaming platforms therefore need the capability to scale up and down dynamically to accommodate these changing workloads.


Low Latency

In streaming scenarios, businesses often benefit from responding to their event streams in real time. We therefore need to ingest, process and respond to streams of events with low latency in order to extract maximum value from the data.

Exactly Once Processing

When working with event streams it is important to never lose a message, and never double-send or double-process a message. We therefore need to build solutions which have a high degree of reliability in how messages are processed, even if some component in the stack were to fail.
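One common building block for this guarantee is idempotent processing: if the same message is delivered twice (for example, after a broker retry), the second delivery has no effect. A minimal sketch, using an in-memory set of processed event IDs (a real system would persist this state transactionally so it survives restarts; the event shape here is hypothetical):

```python
# Deduplicating consumer: each event carries a unique ID, and we apply
# an event only if we have not seen its ID before.
processed_ids = set()
results = []

def process_event(event):
    """Apply the event at most once; return True if it was applied."""
    if event["id"] in processed_ids:
        return False  # duplicate delivery: skip reprocessing
    processed_ids.add(event["id"])
    results.append(event["value"])
    return True

# A retry delivers event 1 twice; the duplicate is detected and ignored.
events = [{"id": 1, "value": 10}, {"id": 2, "value": 20}, {"id": 1, "value": 10}]
applied = [process_event(e) for e in events]
# applied -> [True, True, False]; results -> [10, 20]
```

In production this pattern is usually combined with transactional writes, so that recording the event ID and applying its effect either both happen or neither does.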

Stateful Processing

It is relatively simple to develop stateless processors which do things such as filter out, route, or add detail to events. However, the complexity grows when we want to look for historical patterns such as “3 failed credit card transactions in the last hour.” To do this, we need to process each event in the context of past events, which adds significant complexity to the stack.
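The failed-transactions example above can be sketched as a stateful processor that keeps a sliding window of timestamps per card. This is a simplified illustration (the function name and thresholds are hypothetical, timestamps are in seconds, and real stream processors would keep this state in a fault-tolerant store rather than in process memory):

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # look back one hour
THRESHOLD = 3          # alert on 3 failures within the window

# State: for each card, the timestamps of recent failed transactions.
failures = defaultdict(deque)

def on_failed_transaction(card_id, ts):
    """Record a failure; return True if the card has reached
    THRESHOLD failures within the last WINDOW_SECONDS."""
    window = failures[card_id]
    window.append(ts)
    # Evict failures that have fallen outside the sliding window.
    while window and ts - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window) >= THRESHOLD

# Three failures for the same card within an hour trigger an alert.
alerts = [on_failed_transaction("card-42", t) for t in (0, 1200, 2400)]
# alerts -> [False, False, True]
```

Note that the state itself is what makes this hard to operate: it must survive process crashes and be partitioned correctly when the workload is spread across many machines.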

Time Semantics

The notion of time becomes complex in event processing. Do we care about the time the event happened, the time it was received by the processor, or the time it was stored in the database? In most scenarios, event time is the natural choice, but then we need correct semantics to ensure that we are using the state of the world at the time in question when we come to process the event.
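The distinction matters because events frequently arrive out of order: an event that happened first may be received last. The toy example below (field names are illustrative) shows how ordering the same events by processing time and by event time produces two different histories, and therefore different windowed results:

```python
from dataclasses import dataclass

@dataclass
class Event:
    event_time: int       # when the event actually happened (source clock)
    processing_time: int  # when our pipeline received it

# The event that happened first (event_time=100) arrives last.
events = [
    Event(event_time=105, processing_time=200),
    Event(event_time=110, processing_time=201),
    Event(event_time=100, processing_time=202),
]

# The same stream yields two different orderings depending on which
# clock we choose, so any time-windowed calculation differs too.
by_processing = [e.event_time for e in sorted(events, key=lambda e: e.processing_time)]
by_event = [e.event_time for e in sorted(events, key=lambda e: e.event_time)]
# by_processing -> [105, 110, 100]
# by_event      -> [100, 105, 110]
```

Stream processing frameworks typically handle this with mechanisms such as watermarks, which track how far event time has progressed and decide when a window can safely be closed despite late arrivals.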


Security

It is important to maintain complete security around personally identifiable and commercially sensitive data. We need to encrypt all data both in flight and at rest as it moves through the various message queues and processors. This repeated encryption and decryption has an impact on latency and on the operational management of the system.


If we needed to implement stream processing from scratch, this would be a very complex undertaking. Fortunately, many tools and platforms that are suitable for stream processing are being released and adopted by data teams. These will be discussed in greater detail in the next lesson.

Next Lesson:

Building A Stream Processing Platform

In this lesson we will discuss the considerations when building a stream processing capability.

Continuous Delivery For Data Engineers

This site has been developed by the team behind Timeflow, an Open Source CI/CD platform designed for Data Engineers who use dbt as part of the Modern Data Stack. Our platform helps Data Engineers improve the quality, reliability and speed of their data transformation pipelines.

© 2023 Timeflow Academy. All rights reserved