How I monitored Airflow without using its REST API, socket data transmission, or DB querying

Bowrna



Description:

Overview

In this talk, I will share my experience of monitoring Apache Airflow using its Listener Plugin feature. We will discuss the Pluggy framework that powers this feature and how it lets you plug your custom code into Airflow (or any host system that supports Pluggy), from where you can tap into, extend, or modify the existing behavior.

Problem Statement

Airflow is a tool for programmatically orchestrating data pipelines, with workflows defined as DAGs. Suppose I want to monitor a DAG and know when it has started to run. Airflow's listener plugin feature lets you watch these events very close to the DB level. Coming from conventional monitoring mechanisms, my first thought was that it must be some kind of callback fired on a DB state change. On digging deeper, I found that the listener plugin feature in Airflow is built on top of the Pluggy framework.

What is Pluggy?

It's a plugin management framework that enables customization through function hooks. Say your system has a function store_json() that stores JSON in the filesystem. If the host system uses Pluggy and exposes store_json() as a hook, you can plug in custom code that extends store_json() to store the JSON in an S3 bucket. That way, whenever store_json() is executed, the custom code gets executed too.
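The store_json() scenario can be sketched with Pluggy roughly as follows. This is a minimal illustration, not code from any real host system: the project name "storage", the classes StorageSpec, FilesystemStore and S3Store, and the in-memory dicts standing in for the filesystem and the S3 bucket are all assumptions made for the example.

```python
import json

import pluggy

# Markers tie specs and implementations to one (hypothetical) project name.
hookspec = pluggy.HookspecMarker("storage")
hookimpl = pluggy.HookimplMarker("storage")


class StorageSpec:
    """Hook specification published by the host system."""

    @hookspec
    def store_json(self, name, payload):
        """Store `payload` under `name`."""


class FilesystemStore:
    """Original behaviour; a dict stands in for the filesystem."""

    def __init__(self):
        self.saved = {}

    @hookimpl
    def store_json(self, name, payload):
        self.saved[name] = json.dumps(payload)
        return "filesystem"


class S3Store:
    """Custom plugin; a dict stands in for an S3 bucket."""

    def __init__(self):
        self.uploaded = {}

    @hookimpl
    def store_json(self, name, payload):
        self.uploaded[name] = json.dumps(payload)
        return "s3"


pm = pluggy.PluginManager("storage")
pm.add_hookspecs(StorageSpec)

fs_store = FilesystemStore()
s3_store = S3Store()
pm.register(fs_store)
pm.register(s3_store)

# One hook call runs every registered implementation;
# `results` holds one return value per implementation.
results = pm.hook.store_json(name="report", payload={"ok": True})
```

The host code only ever calls pm.hook.store_json(...); adding the S3 behaviour needs no change to the original function, only one more registered plugin.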

Airflow exposes listener hooks such as on_dag_run_running, on_dag_run_success, and on_dag_run_failed, which you can implement according to your needs. In my case, I added a webhook call that sends state information about the DAG run to my monitoring app. This framework opens the path to extending or modifying Airflow to support your needs.
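A listener along these lines might look like the sketch below. It is not the exact code from my setup: WEBHOOK_URL, build_payload, post_state, DagRunListener and MonitoringPlugin are hypothetical names, the payload fields are an assumption, and the import fallback exists only so the sketch can be loaded outside an Airflow installation.

```python
import json
import logging
import urllib.request

try:
    from airflow.listeners import hookimpl
    from airflow.plugins_manager import AirflowPlugin
except ImportError:
    # Fallback for reading or testing the sketch without Airflow installed.
    import pluggy

    hookimpl = pluggy.HookimplMarker("airflow")
    AirflowPlugin = object

log = logging.getLogger(__name__)

# Hypothetical endpoint of the monitoring app.
WEBHOOK_URL = "http://localhost:8000/airflow-events"


def build_payload(dag_run, state):
    """Collect the fields the monitoring app is assumed to need."""
    return {"dag_id": dag_run.dag_id, "run_id": dag_run.run_id, "state": state}


def post_state(payload, url=WEBHOOK_URL, timeout=2.0):
    """POST the payload as JSON; log and continue if the webhook is down."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            pass
    except OSError as exc:  # URLError and timeouts are OSError subclasses
        log.warning("Monitoring webhook failed: %s", exc)


class DagRunListener:
    """Implements Airflow's DAG run listener hooks."""

    def __init__(self, send=post_state):
        self.send = send

    @hookimpl
    def on_dag_run_running(self, dag_run, msg):
        self.send(build_payload(dag_run, "running"))

    @hookimpl
    def on_dag_run_success(self, dag_run, msg):
        self.send(build_payload(dag_run, "success"))

    @hookimpl
    def on_dag_run_failed(self, dag_run, msg):
        self.send(build_payload(dag_run, "failed"))


class MonitoringPlugin(AirflowPlugin):
    """Registers the listener when placed in Airflow's plugins folder."""

    name = "monitoring_plugin"
    listeners = [DagRunListener()]
```

Listener hooks run inside Airflow's own processes, so the sketch keeps the webhook call short (a small timeout) and swallows network errors rather than letting a monitoring outage affect the DAG run.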

Pytest is another library that heavily uses the Pluggy framework. I will add examples from Pytest and Airflow to my slides to show how the Pluggy framework is a game-changer.

Prerequisites:

Beginner to Intermediate level experience in Python

Speaker Info:

Bowrna is an open-source enthusiast and contributes to Apache Airflow. With about 11 years of experience in the tech industry, she thinks she has learned only a drop in the ocean and wants to explore more. She loves to tinker with backend and distributed systems and has Python and Java experience.

Speaker Links:

https://bowrna.github.io/

Section: Python in Web and Applications
Type: Talk
Target Audience: Intermediate
Last Updated: