Enhancing Data Pipeline Efficiency with Airlyne: A Declarative Approach to Building Reactive Data Pipelines in Airflow

Amitosh Swain Mahapatra (~agathver)



Description:

Airflow is a cornerstone tool for orchestrating data pipelines, providing robust scheduling capabilities. However, the conventional time-based scheduling paradigm presents challenges when real-time data freshness is critical, leading to inefficient recomputations. In this talk, I introduce "Airlyne," a novel framework designed to enhance the efficiency of data pipelines orchestrated by Airflow.

Traditionally, data engineering teams using Airflow have confined it to time-based schedules, which often forces costly recomputations just to keep data fresh. Airlyne addresses this limitation with a declarative approach that lets users define data pipeline workflows in a more granular and adaptive manner.

Leveraging Airflow sensors and datasets, Airlyne allows pipelines to compute segments conditionally based on desired data freshness thresholds.

With Airlyne, pipeline authors specify the conditions under which each segment should be computed, based on how fresh its data needs to be. Segments whose data is still fresh enough are skipped, which avoids unnecessary recalculations and reduces compute cost.
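
To make the idea of freshness-based conditional computation concrete, here is a minimal sketch in plain Python. It is not Airlyne's API, which this abstract does not show; the names `FreshnessRule`, `needs_recompute`, and `plan_recompute` are hypothetical and exist only for illustration. The sketch assumes each dataset has a known last-updated timestamp and a declared maximum acceptable age.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FreshnessRule:
    """Hypothetical declaration: a dataset name and how stale it may become."""
    dataset: str
    max_age: timedelta


def needs_recompute(rule: FreshnessRule,
                    last_updated: Optional[datetime],
                    now: datetime) -> bool:
    """Return True when the dataset is missing or older than its threshold."""
    if last_updated is None:
        # Never computed before, so it must be computed now.
        return True
    return now - last_updated > rule.max_age


def plan_recompute(rules: List[FreshnessRule],
                   last_updated: Dict[str, datetime],
                   now: datetime) -> List[str]:
    """List the datasets (in declaration order) that are due for recomputation."""
    return [
        rule.dataset
        for rule in rules
        if needs_recompute(rule, last_updated.get(rule.dataset), now)
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    rules = [
        FreshnessRule("orders", max_age=timedelta(hours=1)),
        FreshnessRule("users", max_age=timedelta(hours=6)),
    ]
    seen = {
        "orders": now - timedelta(hours=2),  # older than its 1-hour threshold
        "users": now - timedelta(hours=3),   # still within its 6-hour threshold
    }
    print(plan_recompute(rules, seen, now))
```

In Airflow, a predicate like `needs_recompute` could plausibly back a `ShortCircuitOperator` or a custom sensor so that downstream tasks are skipped while the data is still fresh. How Airlyne actually wires this up is an assumption here, not something the abstract states.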

This presentation will explore Airlyne's architecture, demonstrating its capabilities through use cases and practical examples.

Prerequisites:

This talk is oriented towards data engineers. Since it is based on Airflow, I'll explain Airflow-specific terms during the presentation. Prior knowledge of Airflow and big data processing is helpful but not mandatory.

Developers can apply the concepts in the talk to any data pipeline orchestration tool that allows conditional execution or execution of arbitrary Python code.

Speaker Info:

Computer whisperer from Bangalore, India. I've been conjuring code since 2010.

I’ve created and contributed to several open-source projects using Java, Go, Node.js, and Python.

I currently work as a Platform Engineer at Toplyne and was previously at Gojek. I worked with the FOSSi Foundation as part of Google Summer of Code 2017 and with the Fedora Project as part of Google Summer of Code 2018. I am an active contributor to the Fedora Project, FOSSi Foundation, Elastic, and numerous other open-source projects.

Speaker Links:

Section: Python in Web and Applications
Type: Talk
Target Audience: Advanced
Last Updated: