Data Pipelines in Production - Bad vs Best Practices

Bhavani Ravi (~bhavaniravi) | 22 May, 2023

Description:

Pipelines are an inevitable part of the data ecosystem, and there are plenty of tools on the market for implementing them. However, as with any technology, there are good and bad ways to use them.

This talk will explore the bad and best practices when deploying data pipelines to a production environment. From common pitfalls, such as misconfigured tasks and lack of scalability, to best practices, such as robust monitoring and proper security measures, this talk will provide practical advice for anyone looking to run data pipelines in production, with Airflow as an example.

Prerequisites

To follow the nuances of the talk, we assume you work with data tools and are familiar with the demands of a production environment.

What's covered?

We will start with a basic data pipelining example, where a developer starts looking for a tool to automate things. The complexity grows from there into something so large that the team ends up firefighting. There is a better way.

We then move on to the best practices: what checkpoints and assumptions must you have in place before making things live? Move fast and break things, but with caution.
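
One such checkpoint could be sketched as a pre-deployment check on task settings. This is a minimal illustration in plain Python and not Airflow's API; the `TaskConfig` fields and the `find_misconfigurations` helper are hypothetical names chosen for this example, and the specific rules (retries, timeout, alert contact) are assumptions about what a team might want to enforce.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskConfig:
    """Hypothetical description of one pipeline task (not an Airflow class)."""
    name: str
    retries: int = 0
    timeout_seconds: Optional[int] = None
    alert_email: Optional[str] = None


def find_misconfigurations(tasks: List[TaskConfig]) -> List[str]:
    """Return human-readable problems to fix before going live."""
    problems = []
    for task in tasks:
        if task.retries < 1:
            problems.append(f"{task.name}: no retries configured")
        if task.timeout_seconds is None:
            problems.append(f"{task.name}: no timeout, a stuck task may run forever")
        if not task.alert_email:
            problems.append(f"{task.name}: nobody is alerted on failure")
    return problems


if __name__ == "__main__":
    # Example pipeline: the first task passes, the second has only defaults.
    pipeline = [
        TaskConfig("extract", retries=3, timeout_seconds=600,
                   alert_email="data-team@example.com"),
        TaskConfig("load"),
    ]
    for problem in find_misconfigurations(pipeline):
        print(problem)
```

A check like this could run in CI so that a task without retries, a timeout, or an alert contact is flagged before it reaches production.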

Prerequisites:

Data engineering and data pipelining

Content URLs:

The talk is based on my blog post https://dataanddevops.com/apache-airflow-bad-vs-best-practices-in-production-2023

Slides https://docs.google.com/presentation/d/1zbqIs5zDIxfLY7_Rre2TQE9XwL23kdkFZHz5Tz7MtQw/edit?usp=sharing

Speaker Info:

Bhavani Ravi is an independent software engineer who has been in the Python ecosystem for 7 years. She has contributed to open-source libraries like Pandas and Airflow.

Speaker Links:

https://www.youtube.com/watch?v=gjUzPGKSuDM
https://www.bhavaniravi.com/about-me/talks
https://www.bhavaniravi.com/about-me/opensource-contributions

Section:	Data Science, AI & ML
Type:	Talks
Target Audience:	Advanced
Last Updated:	12 Sep, 2023
