Lesson Overview

In this lesson we will:

  • Learn about streaming data;
  • Introduce event-driven architectures;
  • Compare the traditional batch approach with streaming approaches to data management.

About Streaming

Many companies are looking to use their data more effectively in order to improve their customer experience and the efficiency of their business.

Speed is an important part of this. The earlier you can respond to incoming data, the more opportunities you have to improve the customer experience.

Recognising this, businesses are implementing streaming data architectures and platforms. These allow them to process and integrate data in real time, analyse it whilst in flight, and trigger actions and workflows automatically as situations arise.

Moving to streaming and event-driven architectures is a fairly fundamental change for businesses and their technology landscape, but one which is increasingly valuable in meeting customer expectations in today's digital world.

From Batch To Streaming

Currently, most business intelligence and data processing within businesses is batch-based: periodically (e.g. daily), large volumes of data are exchanged and processed in batches of multiple records.

Though batch-based data exchange is simpler to implement, it is also slower: insights and responses are delayed until the next batch run.
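To make the contrast concrete, here is a minimal Python sketch (illustrative names only, not taken from any particular framework): the batch job processes staged records together on a schedule, whilst the streaming handler processes each record the moment it arrives.

```python
from datetime import datetime, timezone

def process_record(record):
    """The business logic applied to a single record."""
    return {**record, "processed_at": datetime.now(timezone.utc).isoformat()}

# Batch: records accumulate in a staging area, then are processed
# together on a schedule (e.g. a nightly job).
def run_batch(staged_records):
    return [process_record(r) for r in staged_records]

# Streaming: each record is handled immediately, as an event.
def on_event(record, sink):
    sink.append(process_record(record))

# Usage sketch: two records wait for the batch run, while the
# streamed record is enriched and delivered straight away.
staged = [{"order_id": 1}, {"order_id": 2}]
batch_results = run_batch(staged)

sink = []
on_event({"order_id": 3}, sink)
```

The business logic is identical in both cases; what changes is when it runs relative to the data arriving.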

Event Driven Architectures

Fundamentally, businesses are looking to speed this up in order to use data for operational purposes, to reduce time to insight, or to improve the customer experience.

A key building block for doing this is event-driven architecture and event-based data handling, where we process each event immediately after it happens in order to derive a response.

The challenge is that these businesses are generally still tied to legacy systems, or to systems built on more traditional technology such as relational databases and data lakes. Our task, therefore, is to generate real-time events from technologies which were not designed with events in mind.

From Batch To Events

Fortunately, there are two key open source technologies which we have found to be highly successful in this regard, and which any team looking to evolve towards an event-based architecture should investigate.

The first is Debezium. Debezium allows you to stream changes from databases such as MySQL or Postgres using a technique called Change Data Capture (CDC). As records are inserted, updated and deleted, events are created and pushed to a destination such as Kafka, where they can be processed in a decoupled manner. This completely avoids changes to the legacy application and should not affect its performance or stability. It is therefore a very quick win when moving towards event orientation.
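As a sketch of what this looks like in practice, a Debezium MySQL connector is typically registered with Kafka Connect via a small JSON configuration. The hostname, credentials, server id and table list below are placeholders, and exact property names vary between Debezium versions (for example, older releases use "database.server.name" rather than "topic.prefix"):

```json
{
  "name": "orders-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "********",
    "database.server.id": "184054",
    "topic.prefix": "shop",
    "table.include.list": "shop.orders"
  }
}
```

Once registered, each insert, update or delete on the included tables produces a change event on a Kafka topic, with no modification to the application writing to the database.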

The second is Fluentd. Fluentd enables us to collect data from log sources such as log files, syslog or application runtimes, and push events to a destination such as Kafka as they are created. Again, we can evolve towards events with minimal, if any, application changes.
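As an illustrative sketch (the paths, tag and broker address are placeholders), a Fluentd pipeline tailing a JSON log file and forwarding each line to Kafka via the kafka2 output plugin (provided by fluent-plugin-kafka) might look like:

```
<source>
  @type tail
  path /var/log/app/app.log
  pos_file /var/log/fluentd/app.log.pos
  tag app.events
  <parse>
    @type json
  </parse>
</source>

<match app.events>
  @type kafka2
  brokers kafka:9092
  default_topic app-events
  <format>
    @type json
  </format>
</match>
```

Each new log line becomes an event on the app-events topic as soon as it is written, without touching the application that produces the log.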

Though this problem can be solved in various ways, these open source technologies have proven to be performant and extremely stable under high transaction volumes. We believe they will be a key part of the event-driven journey within the enterprise.

Hands-On Training For The Modern Data Stack

Timeflow Academy is an online, hands-on platform for learning about Data Engineering and Modern Cloud-Native Database management using tools such as DBT, Snowflake, Kafka, Spark and Airflow.



© 2022 Timeflow Academy.