Introduction To Streaming Data

Lesson #1

In this lesson we will:

  • Introduce the concept of streaming data;
  • Compare the traditional batch data approach with streaming data;
  • Introduce some of the foundational concepts associated with streaming data.

About Streaming Data

Streaming data is data that is generated continuously and in high volumes. Common examples of this class of data include stock price updates, clickstream events, logs, and data from IoT devices.

Streaming data is often, but not always, machine generated. It usually consists of a high volume of relatively small events, each published immediately after it has been generated at the source.
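
To make the idea of small, immediately published events concrete, here is a minimal Python sketch. The `ClickEvent` schema, the `click_stream` generator and the `publish` function are all hypothetical names chosen for illustration. A real source would be unbounded and would publish to a message broker rather than serialising to a string.

```python
import json
import time
from dataclasses import dataclass, asdict
from typing import Iterator


@dataclass
class ClickEvent:
    """A small, self-contained event (hypothetical schema for illustration)."""
    event_id: int
    user_id: str
    page: str
    timestamp: float  # seconds since the epoch, set when the event is generated


def click_stream(limit: int) -> Iterator[ClickEvent]:
    """Yield events one at a time, as soon as each is generated.

    A real source is unbounded; `limit` exists only so the sketch terminates.
    """
    pages = ["/home", "/products", "/checkout"]
    for i in range(limit):
        yield ClickEvent(
            event_id=i,
            user_id=f"user-{i % 2}",
            page=pages[i % len(pages)],
            timestamp=time.time(),
        )


def publish(event: ClickEvent) -> str:
    """Stand-in for publishing to a message broker: serialise the event to JSON."""
    return json.dumps(asdict(event))


# Each event is published individually rather than being held back for a batch.
published = [publish(event) for event in click_stream(limit=5)]
for message in published:
    print(message)
```

Each message is only a few fields in size. The challenge in practice comes from the rate at which such events arrive, not from the size of any single one.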

Though streaming data can be very valuable to businesses, processing and analysing it is challenging. Primarily, this is because the volume and speed at which this data is generated exceed the scalability limits of the tools that most data teams use today.

Beyond handling the volume of data, businesses often want to use their streaming data in sophisticated ways and in real time, as there is often some commercial or operational benefit to doing so. For instance, fraud detection, algorithmic trading and preventive maintenance are all examples where processing streaming data in real time has a business benefit.

Given the challenges and high demands of working with streaming data, new approaches, tools and platforms are required. Though these are emerging today, the field is still in its relative infancy.

Evolving From Batch To Streaming

Most data platforms deployed within businesses today are batch-based. This means that data is ingested and processed in batches of multiple records, typically on a schedule such as an hourly or daily cycle.

Though batch-based data exchange is simple, its major downside is the delay it introduces before data is processed and reaches business users.

Though this is acceptable in many situations, businesses increasingly want to process their data in real time, either for operational use cases or to improve their customer experience.

Streaming data is the answer to this problem: we move from periodically processing data in batches towards continuously processing data as it is generated.
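
The difference between the two approaches can be sketched in a few lines of Python. This is an illustrative toy rather than a real pipeline: the `ORDERS` list and both function names are hypothetical, and a plain list stands in for data that would actually arrive over time.

```python
from typing import Iterable, Iterator, List

# Hypothetical order amounts arriving over time.
ORDERS = [20.0, 35.5, 10.0, 4.5]


def process_batch(records: List[float]) -> float:
    """Batch style: wait until the whole batch is collected, then process it once."""
    return sum(records)


def process_stream(events: Iterable[float]) -> Iterator[float]:
    """Streaming style: update the result as each event arrives."""
    running_total = 0.0
    for amount in events:
        running_total += amount
        yield running_total  # an up-to-date answer after every event


batch_result = process_batch(ORDERS)
stream_results = list(process_stream(ORDERS))

print("batch:", batch_result)
print("stream:", stream_results)
```

Both approaches arrive at the same final total. The batch version only produces it once the full batch is available, while the streaming version has a current answer after every event.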

The challenge, however, is that modernising from a batch-based architecture to a streaming platform is not simple. Most large businesses have a significant dependency on legacy data systems that were designed around batch processing. They will likely have to implement a new generation of tools and infrastructure to successfully work with streaming data. Unfortunately, they will not necessarily have in-house experience of working with streaming data.

This journey from batch data processing to streaming architectures is likely to be a key theme for data teams in the coming years, and Data Engineers with experience in this field will likely be in high demand.

© 2023 Timeflow Academy. All rights reserved