In this lesson we will:
- Introduce the concept of stream processing;
- Demonstrate Apache Flink.
Stream Processing
Imagine a business has a stream of events, generated each time an order is placed.
These events could be published onto a streaming data platform such as Apache Kafka.
{ order_id: 1, order_value : 150, order_category : "food" }
{ order_id: 2, order_value : 250, order_category : "drink" }
{ order_id: 3, order_value : 350, order_category : "homeware" }
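Assuming the events arrive as JSON payloads like those above (with keys quoted, so that they are valid JSON), a consumer would typically parse each message into a native data structure before processing. A minimal Python sketch, independent of any particular messaging platform:

```python
import json

# Raw events as they might arrive from a topic, one JSON document per message
raw_events = [
    '{ "order_id": 1, "order_value": 150, "order_category": "food" }',
    '{ "order_id": 2, "order_value": 250, "order_category": "drink" }',
    '{ "order_id": 3, "order_value": 350, "order_category": "homeware" }',
]

# Parse each message into a dictionary for downstream processing
orders = [json.loads(e) for e in raw_events]

print(orders[0]["order_category"])  # food
```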
There are a number of ways in which we might need to process an event stream such as this.
- Filtering - Removing events that we do not wish to store and analyse.
- Transformations - Transforming the events as they pass through the system, e.g. capitalising the order category to clean it up for subsequent reporting.
- Analytics - Analysing the data in flight, e.g. calculating the average order value over the last hour.
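The three operations above can be sketched in plain Python, with an in-memory list standing in for the stream. The event fields match the examples earlier in the lesson; the pipeline itself is illustrative and not tied to any framework:

```python
orders = [
    {"order_id": 1, "order_value": 150, "order_category": "food"},
    {"order_id": 2, "order_value": 250, "order_category": "drink"},
    {"order_id": 3, "order_value": 350, "order_category": "homeware"},
]

# Filtering: drop events we do not wish to store and analyse
kept = [o for o in orders if o["order_category"] != "drink"]

# Transformation: capitalise the category to clean it up for reporting
cleaned = [{**o, "order_category": o["order_category"].upper()} for o in kept]

# Analytics: compute the average order value of the events seen so far
average = sum(o["order_value"] for o in cleaned) / len(cleaned)

print(cleaned[0]["order_category"])  # FOOD
print(average)                       # 250.0
```

In a real stream processor the same three steps run continuously over an unbounded stream, and analytics are usually scoped to a time window (such as the last hour) rather than to all events seen so far.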
Frameworks
There are a number of development frameworks which make it easier to build stream processors.
Apache Flink is the leading framework in this space. It allows us to process unbounded streams of events.
Kafka Streams is a similar proposition to Flink, but it is more tightly integrated with Kafka and is deployed as a library rather than requiring a dedicated cluster. This makes it more lightweight to deploy.