Introduction To Stream Processing

Lesson #2

In this lesson we will:

  • Introduce the concept of stream processing;
  • Demonstrate Apache Flink.

Stream Processing

Imagine a business that generates a stream of events, one each time an order is placed.

These events could be published onto a streaming data platform such as Apache Kafka.

{ "order_id": 1, "order_value": 150, "order_category": "food" }
{ "order_id": 2, "order_value": 250, "order_category": "drink" }
{ "order_id": 3, "order_value": 350, "order_category": "homeware" }

There are a number of ways in which we might need to process an event stream such as this.

  • Filtering - Removing events which we do not wish to store or analyse.
  • Transformations - Changing the events as they pass through the system, e.g. capitalising the order category to clean it up for subsequent reporting.
  • Analytics - Analysing the data in flight, e.g. calculating the average value of orders placed in the last hour.
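
The three kinds of processing listed above can be sketched in plain Python, without any streaming framework. This is only an illustration under stated assumptions: the `ts` field (a timestamp in seconds), the `min_value` threshold and all function names are hypothetical and are not part of the lesson's events or of any library.

```python
from collections import deque

# Sample events from the lesson. The "ts" field (seconds) is an assumption
# added here so that a one-hour window can be computed.
EVENTS = [
    {"order_id": 1, "order_value": 150, "order_category": "food", "ts": 0},
    {"order_id": 2, "order_value": 250, "order_category": "drink", "ts": 1800},
    {"order_id": 3, "order_value": 350, "order_category": "homeware", "ts": 4000},
]


def filter_events(events, min_value):
    """Filtering: drop events below a (hypothetical) minimum order value."""
    for event in events:
        if event["order_value"] >= min_value:
            yield event


def capitalise_category(events):
    """Transformation: capitalise the order category of each event."""
    for event in events:
        yield {**event, "order_category": event["order_category"].capitalize()}


def hourly_average(events, window_seconds=3600):
    """Analytics: average order value over the trailing window, per event."""
    window = deque()
    total = 0
    for event in events:
        window.append(event)
        total += event["order_value"]
        # Evict events that have fallen out of the trailing window.
        while window[0]["ts"] <= event["ts"] - window_seconds:
            total -= window.popleft()["order_value"]
        yield event["order_id"], total / len(window)


# Chain the three stages; each event flows through them one at a time.
pipeline = hourly_average(capitalise_category(filter_events(EVENTS, min_value=100)))
results = list(pipeline)
```

With the sample data, `results` is `[(1, 150.0), (2, 200.0), (3, 300.0)]`: by the time the third order arrives, the first is more than an hour old and has dropped out of the window. Generators are used so that each stage handles one event at a time, which is the essential difference from batch processing.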

Frameworks

There are a number of development frameworks which make it easier to build stream processors.

Apache Flink is one of the most widely used frameworks in this space. It allows us to process unbounded streams of events, that is, streams with no defined end.
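
To make the idea of an unbounded stream concrete, here is a minimal Python sketch. It does not use Flink or its API; `order_stream` and `running_average` are hypothetical names, and the generated order values are invented to match the sample events earlier in the lesson.

```python
import itertools


def order_stream():
    """A hypothetical unbounded source: yields order events forever."""
    categories = ["food", "drink", "homeware"]
    for order_id in itertools.count(start=1):
        yield {
            "order_id": order_id,
            "order_value": 50 + 100 * order_id,
            "order_category": categories[(order_id - 1) % len(categories)],
        }


def running_average(events):
    """Emit the average order value so far, keeping only two numbers as state."""
    count = 0
    total = 0
    for event in events:
        count += 1
        total += event["order_value"]
        yield total / count


# The stream never ends, so we take only the first three results here.
first_three = list(itertools.islice(running_average(order_stream()), 3))
```

Here `first_three` is `[150.0, 200.0, 250.0]`. Because the source never finishes, the processor cannot wait for all of the data before producing a result. It has to emit results incrementally and keep its state small, and a stream processing framework manages this kind of work at scale.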

Kafka Streams is a similar proposition to Flink, but it is more tightly integrated with Kafka and is deployed as a library embedded in your application rather than requiring a separate processing cluster. This makes it more lightweight to deploy.

© 2023 Timeflow Academy. All rights reserved