Lesson Overview

In this lesson we will:

  • Introduce stateless and stateful transformations;
  • Explain why stateful transformations are much harder to implement.

Analytics Of Event Streams

Imagine a customer lifecycle made up of a series of events such as the following:

  • A customer visits your website and browses some products;
  • They download some information about a product;
  • A few days later, they come back and place an order;
  • The order is packed and dispatched;
  • A few days later, the customer logs onto the site and leaves a negative review.

There are various things we might wish to do with this event stream: filter it, modify it, analyse it and respond to it. For instance, there may be a business requirement to manually review all orders over a certain value that are being dispatched outside of the UK.
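As a concrete illustration of the manual-review requirement above, the check can be expressed as a simple predicate applied to each event. This is a minimal sketch; the event shape, field names and threshold are all hypothetical.

```python
# Hypothetical sketch: route dispatched orders for manual review when the
# order value exceeds a threshold and the destination is outside the UK.

REVIEW_THRESHOLD = 500  # assumed threshold, e.g. in GBP

def needs_manual_review(event: dict) -> bool:
    """Return True if a dispatch event should be queued for manual review."""
    return (
        event.get("type") == "order_dispatched"
        and event.get("value", 0) > REVIEW_THRESHOLD
        and event.get("country") != "UK"
    )

events = [
    {"type": "order_dispatched", "value": 750, "country": "FR"},
    {"type": "order_dispatched", "value": 120, "country": "FR"},
    {"type": "order_dispatched", "value": 900, "country": "UK"},
]

for_review = [e for e in events if needs_manual_review(e)]
# Only the first event matches: high value and dispatched outside the UK.
```

Note that this decision depends only on the single event being examined, which is exactly what makes it a stateless operation, as discussed next.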

Stateless Transformations

The first class of transformations we would like to do are referred to as stateless, because they can be performed without any history or memory of previous events.

For instance, filtering out orders whose value exceeds a certain threshold, or reformatting an Order ID, are stateless operations because they happen on a message-by-message basis, with no reference to past history.

Stateless transformations can be executed quickly, and because each message is processed independently, they can be scaled across many servers for inherent parallelism.
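The two stateless operations mentioned above, filtering on order value and reformatting an Order ID, can be sketched as plain per-message functions. The field names and ID format here are hypothetical assumptions, not part of any real API.

```python
# Hypothetical sketch of two stateless operations on an order stream.

def reformat_order_id(event: dict) -> dict:
    """Normalise the order ID to an upper-case, zero-padded form (assumed format)."""
    out = dict(event)
    out["order_id"] = f"ORD-{int(event['order_id']):08d}"
    return out

def high_value(event: dict, threshold: float = 100.0) -> bool:
    """Keep only orders above the (assumed) threshold."""
    return event["value"] > threshold

stream = [
    {"order_id": "42", "value": 250.0},
    {"order_id": "43", "value": 15.0},
]

# Each event is handled in isolation: no state is carried between messages,
# so the stream could be partitioned across many workers with no coordination.
result = [reformat_order_id(e) for e in stream if high_value(e)]
```

Because neither function reads or writes anything outside the event it is given, any worker can process any message, which is the source of the "inherent parallelism" described above.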

Stateful Transformations

The second class of changes are stateful. An example of a stateful transformation might be the requirement to see if the same customer has placed three high-value orders in the last 24 hours, or to aggregate the total for all of the orders dispatched today.

Stateful transformations are much harder to implement, because they require memory of the event stream, may require access across different streams, and we need to ensure that stream processors have access to the right data at the right time. In the above example, the stream processor needs access to at least 24 hours of order and customer data in order to maintain the running totals.

When this data is no longer needed, it should be discarded so that the stream processor does not run out of memory.

Complexity Of Stateful Transformations

Parallelising stateful operations is also more complex, because we may be dealing with timing issues, such as some processors seeing more up-to-date data than others. This makes coordination an order of magnitude more complex.

Stateful computations are powerful, and are where the untapped opportunities lie for companies to differentiate their businesses. Sadly, stateful stream processing is also where stream processing becomes complicated.

Next Lesson

In the next lesson, we will look at stream processing vs the data warehouse.



© 2022 Timeflow Academy.