Automating Data Pipeline using Apache Airflow

Mridu Bhatnagar (~mridubhatnagar)


Today, more and more work is moving towards machine learning: making predictions and drawing insights from data. The first step towards that goal is an efficient process for collecting data from a variety of sources. Traditional ways of collecting data are tedious and cumbersome, and manually running scripts to extract, transform and load data costs time.

To make the process efficient, the data pipeline can be automated. Scripts that extract data can be scheduled with crontab. However, crontab has its own drawbacks, and one major challenge is monitoring. This is where Apache Airflow, an open-source tool built by the Airbnb engineering team, helps. Airflow is a platform to programmatically author, schedule and monitor workflows.


Prerequisites: Basic knowledge of Python

Content URLs:

Outline of the Talk

  1. Background [Extract, Transform, Load] - 2 mins
  2. Walk through the traditional approach to automation using a cron job - 3 mins
  3. Explain the shortcomings of using a cron job [logging, monitoring], along with use cases where a cron job is the better choice for automation - 4 mins
  4. Break the title down into distinct words and explain each from scratch: Automation + Data + Pipeline + Apache Airflow - 4-5 mins
  5. Introduction to Apache Airflow. Explain the terminology: Workflow, Operators, Acyclic Graph, Directed Acyclic Graphs - 10 mins
  6. Screenshots with an explanation of the UI, and shortcomings of Apache Airflow - 5 mins
  7. Airflow Architecture
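
To make the "Directed Acyclic Graph" terminology from item 5 concrete, here is a minimal sketch in plain Python. It does not use Airflow's API: the task functions, the `TASKS` and `DEPENDENCIES` names, and the sample data are all hypothetical, and the ordering comes from the standard library's `graphlib` module (Python 3.9+). It only illustrates the idea that a workflow is a set of tasks whose dependencies determine the order in which they run.

```python
from graphlib import TopologicalSorter


# Hypothetical stand-ins for real extract/transform/load scripts.
def extract(store):
    # Pretend these raw strings came from an external data source.
    store["raw"] = [" 3", "1 ", "2"]


def transform(store):
    # Clean up the raw values and sort them.
    store["clean"] = sorted(int(value.strip()) for value in store["raw"])


def load(store):
    # Pretend this writes the cleaned values to a destination.
    store["loaded"] = list(store["clean"])


TASKS = {"extract": extract, "transform": transform, "load": load}

# Each task maps to the set of tasks that must finish before it starts.
# The edges have a direction and there is no cycle, so this is a DAG.
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def run_workflow(tasks, dependencies):
    """Run every task once, in an order that respects the dependencies."""
    store = {}
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name](store)
    return order, store


if __name__ == "__main__":
    order, store = run_workflow(TASKS, DEPENDENCIES)
    print(order)
    print(store["loaded"])
```

A scheduler such as Airflow adds scheduling, retries, logging and a monitoring UI on top of this basic idea; none of that is modelled in the sketch.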

Speaker Info:

Mridu Bhatnagar is a software development engineer at Goibibo and an organizer of DjangoGirls Indore and Pyladies Delhi. She currently works with Python and Django. When not coding, she loves to experience the outdoors, volunteer as a speaker to share her learnings, and learn from other enthusiasts.

Github Link:

Twitter Link:

Speaker Links:

Past Experience [December 2018 - Present]

PyData Delhi meetup

a. Introduction to APIs - Talk Video -

b. Virtual Environment in Python -

LinuxChix India

a. Tech Journey so far -

Linux User Group Delhi [ILUGD]

a. Playing around with APIs -

b.

Pyladies Delhi

a. Virtual Environment in Python -

a. Python for All - Video -

LetsPy Delhi

a. Small Video -

b.

DjangoGirls, Bangalore

a. Coach [February, 2019] -

DjangoGirls, Pune

a. Coach [22-06-2019] -

Women who Go, Delhi + Pyladies Delhi + LinuxChix India combined meetup

a. Understanding HTTP from ground up -

Drupal Camp 2019, Delhi

a. Automating data pipelines using Apache Airflow


  1. Pybites Blog -
  2. Medium personal blog Twitter Data Retrieval - Word Notifier -

Id: 1274
Section: Developer tools and automation
Type: Talks
Target Audience: Intermediate
Last Updated: