In this lesson we will:
- Introduce dbt Cloud.
Problems With Traditional ETL
Traditional ETL has long been clunky and inefficient, and a modernisation of the approach is overdue. Some of the most common problems we find with it are:
- The schema within data warehouses is often strongly defined and controlled. The emphasis of ETL was therefore on getting data into the warehouse in the correct “one true” format. This put the burden on the people loading the data and made the process of getting data into the warehouse slow and fragile.
- The warehouse and the ETL processes would usually be managed by centralised data teams. These teams would become a siloed bottleneck, always lagging behind the needs of the business for integrating and transforming the data.
- The ETL stacks and scripts would often be fragile, error-prone, and difficult and slow to change.
- The tools providing ETL would often be GUI-based and proprietary. Not only would they be expensive to license, they would also require specialist skills. This meant that neither the producers nor the consumers of the data would have access to the ETL scripts or the ability to make changes to them.
- Fitting ETL into anything resembling a software development lifecycle was tricky. For instance, ETL processes were widely regarded as difficult to source control, version and test. Implementing separate development, test and production environments with accurate data management also lagged well behind the state of the art in the software development world.