Testing Data Pipelines

Amitosh Swain Mahapatra (~agathver)


Description:

Hello 👋!

In this talk, I'll walk through an approach to testing Airflow data pipelines with Python and pytest. The primary goal is to keep data flowing smoothly through the pipeline by catching and fixing problems quickly.

Testing pipelines is similar to testing software applications: unit tests for each pipeline component, integration tests for the entire pipeline, and end-to-end tests to verify that the data output is accurate. It also involves pipeline-specific techniques, such as data snapshot testing and online and offline data quality checks.
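To make these test types concrete, here is a minimal sketch in pytest. It is not taken from the talk itself: the transform `normalize_orders`, the check `check_quality`, the helper `assert_matches_snapshot`, and the sample rows are all hypothetical stand-ins for a real pipeline component. It shows a unit test, a snapshot test, and a simple offline data quality check.

```python
import json
from pathlib import Path


def normalize_orders(rows):
    """Hypothetical transform step: drop rows without an id, cast amounts to float."""
    return [
        {"order_id": r["order_id"], "amount": float(r["amount"])}
        for r in rows
        if r.get("order_id") is not None
    ]


def check_quality(rows):
    """Hypothetical quality check: return a list of violations (empty if clean)."""
    problems = []
    ids = [r["order_id"] for r in rows]
    if len(ids) != len(set(ids)):
        problems.append("duplicate order_id")
    if any(r["amount"] < 0 for r in rows):
        problems.append("negative amount")
    return problems


def assert_matches_snapshot(actual, snapshot_path):
    """Compare output with a stored snapshot; record the snapshot on first run."""
    path = Path(snapshot_path)
    if not path.exists():
        path.write_text(json.dumps(actual, indent=2, sort_keys=True))
        return
    expected = json.loads(path.read_text())
    assert actual == expected, f"output differs from snapshot {path}"


# Made-up input rows used by all three tests.
RAW = [
    {"order_id": 1, "amount": "10.50"},
    {"order_id": None, "amount": "3.00"},
    {"order_id": 2, "amount": "7"},
]


def test_normalize_orders_unit():
    # Unit test: one component, known input, exact expected output.
    assert normalize_orders(RAW) == [
        {"order_id": 1, "amount": 10.5},
        {"order_id": 2, "amount": 7.0},
    ]


def test_normalize_orders_snapshot(tmp_path):
    # tmp_path is pytest's built-in temporary directory fixture. A real
    # project would usually keep the snapshot file in version control instead.
    snapshot = tmp_path / "orders.json"
    assert_matches_snapshot(normalize_orders(RAW), snapshot)  # first call records
    assert_matches_snapshot(normalize_orders(RAW), snapshot)  # second call compares


def test_quality_checks():
    # Offline data quality check on the transformed output.
    assert check_quality(normalize_orders(RAW)) == []
    assert check_quality([{"order_id": 1, "amount": -1.0}]) == ["negative amount"]
```

Keeping the transform as a plain function, separate from the Airflow operator that calls it, is one way to make it testable without a running scheduler. The same quality check could also run as a task inside the pipeline to act as an online check.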

Prerequisites:

An understanding of SQL and Airflow.

Speaker Info:

Computer whisperer from Bangalore, India. Been conjuring code since 2010.

I’ve created and contributed to a number of open-source projects using Java, Go, Node.js, and Python.

Currently working as a Platform Engineer at Toplyne; previously at Gojek. I worked with the FOSSi Foundation as part of Google Summer of Code 2017 and with the Fedora Project as part of Google Summer of Code 2018. I am an active contributor to the Fedora Project, the FOSSi Foundation, Elastic, and numerous other open-source projects.

Speaker Links:

Section: Data Science, AI & ML
Type: Talks
Target Audience: Intermediate
Last Updated: