Data Pipelines in Production - Bad vs Best Practices
Bhavani Ravi (~bhavaniravi)
Description:
Pipelines are an inevitable part of the data ecosystem, and there are plenty of tools on the market for implementing them. However, as with any technology, there are good and bad ways to use it.
This talk will explore the bad and best practices for deploying data pipelines to a production environment. From common pitfalls, such as misconfigured tasks and lack of scalability, to best practices, such as robust monitoring and proper security measures, this talk will provide practical advice for anyone looking to run data pipelines in production, with Airflow as an example.
Pre-Requisites
To follow the nuances of the talk, we assume you work with data tools and are familiar with production requirements.
What's covered?
We will start with a basic data pipelining example, where a developer starts looking for a tool to automate things. The complexity grows from there until the pipeline is so large that the team spends its time firefighting. There is a better way.
We then turn to the best practices: what checkpoints and assumptions must you have in place before making things live? Move fast and break things, but with caution.
Prerequisites:
Data engineering and data pipelining
Content URLs:
The talk is based on my blog post https://dataanddevops.com/apache-airflow-bad-vs-best-practices-in-production-2023
Slides https://docs.google.com/presentation/d/1zbqIs5zDIxfLY7_Rre2TQE9XwL23kdkFZHz5Tz7MtQw/edit?usp=sharing
Speaker Info:
Bhavani Ravi is an independent software engineer who has been in the Python ecosystem for 7 years. She has contributed to open-source libraries like Pandas and Airflow.
Speaker Links:
- https://www.youtube.com/watch?v=gjUzPGKSuDM
- https://www.bhavaniravi.com/about-me/talks
- https://www.bhavaniravi.com/about-me/opensource-contributions