Testing Data Pipelines
Amitosh Swain Mahapatra (~agathver)
Description:
Hello 👋!
In this talk, I'll walk through an approach to testing Airflow data pipelines using Python and pytest. Its primary goal is to keep data flowing smoothly through the pipeline by catching problems early so they can be fixed quickly.
Testing pipelines is similar to testing software applications: unit tests for each pipeline component, integration tests for the entire pipeline, and end-to-end tests to verify that the output data is accurate. However, pipeline-specific techniques, such as data snapshot testing and online and offline data quality checks, are also involved.
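To give a flavour of these ideas, here is a minimal sketch of a unit test, a snapshot test, and an offline data quality check written as pytest-style functions. Everything in it is an assumption made for illustration: `normalize_orders`, `check_quality`, the sample rows, and the inline snapshot are hypothetical and not taken from the talk. The transform is kept free of Airflow imports so it can be tested on its own; in a real project the snapshot would more likely live in a file next to the tests.

```python
import json
from decimal import Decimal


def normalize_orders(rows):
    """Hypothetical transform step: drop rows without an order id
    and convert the amount string to integer cents."""
    return [
        {"order_id": r["order_id"], "amount_cents": int(Decimal(r["amount"]) * 100)}
        for r in rows
        if r.get("order_id") is not None
    ]


def check_quality(rows):
    """Hypothetical offline quality check: return a list of violations."""
    problems = []
    ids = [r["order_id"] for r in rows]
    if len(ids) != len(set(ids)):
        problems.append("duplicate order_id")
    if any(r["amount_cents"] < 0 for r in rows):
        problems.append("negative amount_cents")
    return problems


# Small, hand-written input fixture (illustrative data only).
SAMPLE = [
    {"order_id": 1, "amount": "10.50"},
    {"order_id": None, "amount": "3.00"},
    {"order_id": 2, "amount": "0.99"},
]

# Previously approved output, stored as canonical JSON (keys sorted).
SNAPSHOT = '[{"amount_cents": 1050, "order_id": 1}, {"amount_cents": 99, "order_id": 2}]'


def test_transform_unit():
    # Unit test: assert on specific behaviour of one component.
    result = normalize_orders(SAMPLE)
    assert len(result) == 2
    assert result[0]["amount_cents"] == 1050


def test_transform_snapshot():
    # Snapshot test: the whole output must match the approved snapshot.
    assert json.dumps(normalize_orders(SAMPLE), sort_keys=True) == SNAPSHOT


def test_output_quality():
    # Data quality check: the transformed data violates no rule.
    assert check_quality(normalize_orders(SAMPLE)) == []
```

Saved as a test module, these functions would be collected by running `pytest`. The snapshot test is the one that differs most from ordinary application testing: instead of asserting on individual fields, it flags any change in the output, which then has to be reviewed and either fixed or approved as the new snapshot.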
Prerequisites:
Have an understanding of SQL and Airflow
Speaker Info:
Computer whisperer from Bangalore, India. Been conjuring code since 2010.
I’ve created and contributed to a number of open-source projects using Java, Go, Node.js, and Python.
Currently working as a Platform Engineer at Toplyne, previously at Gojek. I worked with the FOSSi Foundation as part of Google Summer of Code 2017 and with the Fedora Project as part of Google Summer of Code 2018. I am an active contributor to the Fedora Project, the FOSSi Foundation, Elastic, and numerous other open-source projects.