Automating Data Pipeline using Apache Airflow

Mridu Bhatnagar (~mridubhatnagar)


Today, more and more work is moving towards machine learning: making predictions and finding insights based on data. The first step towards that goal is having efficient processes in place for collecting data from a variety of sources. Traditional ways of collecting data are tedious and cumbersome, and manually running scripts to extract, transform and load data costs time.

To make the process efficient, the data pipeline can be automated. Scripts to extract data can be scheduled using crontab. However, crontab has its own drawbacks, and one major challenge is monitoring. This is where Apache Airflow, an open-source tool built by the Airbnb engineering team, helps. Airflow is a platform to programmatically author, schedule and monitor workflows.
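To make the idea of a monitored workflow concrete, here is a minimal plain-Python sketch. It does not use Airflow's API. It only illustrates the underlying idea of running extract, transform and load steps in dependency order and recording a status for each step, which a bare cron entry does not provide. The names `run_pipeline`, `TASKS`, `DEPENDENCIES`, the status labels and the sample data are all assumptions made up for this illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def extract(ctx):
    # Stand-in for pulling raw records from a data source (hypothetical data).
    ctx["raw"] = ["3", " 4 ", "5"]


def transform(ctx):
    # Clean the raw strings and convert them to integers.
    ctx["clean"] = [int(value.strip()) for value in ctx["raw"]]


def load(ctx):
    # Stand-in for writing to a data store; here we only total the values.
    ctx["loaded_total"] = sum(ctx["clean"])


# Task name -> callable, and task name -> set of tasks it depends on.
TASKS = {"extract": extract, "transform": transform, "load": load}
DEPENDENCIES = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}


def run_pipeline(tasks, dependencies):
    """Run tasks in dependency order and record a status for each one."""
    order = list(TopologicalSorter(dependencies).static_order())
    context = {}  # data handed from one task to the next
    status = {}   # task name -> "success", "failed" or "skipped"
    for name in order:
        # Skip a task when anything it depends on did not succeed.
        if any(status[up] != "success" for up in dependencies.get(name, ())):
            status[name] = "skipped"
            continue
        try:
            tasks[name](context)
            status[name] = "success"
        except Exception:
            status[name] = "failed"
    return order, status, context


order, status, context = run_pipeline(TASKS, DEPENDENCIES)
print(order)
print(status)
```

The `status` dictionary is the part that stands in for monitoring: after a run, it shows which step succeeded, which one failed, and which ones were never attempted because an earlier step failed.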


Prerequisites:

Basic knowledge of Python

Content URLs:

Outline of the Talk

  1. Background [Extract, Transform, Load] - 2 mins
  2. Walk through the traditional approach of automation using a cron job - 3 mins
  3. Explain the shortcomings of using a cron job [logging, monitoring], along with use cases where a cron job is the better choice for automation - 4 mins
  4. Break down the title into distinct words and explain each from scratch: Automation + Data + Pipeline + Apache Airflow - 4-5 mins
  5. Introduction to Apache Airflow. Explain the terminology: Workflow, Operators, Acyclic Graph, Directed Acyclic Graph - 10 mins
  6. Screenshots along with an explanation of the UI, and shortcomings of Apache Airflow - 5 mins

Speaker Info:

I am Mridu Bhatnagar, a computer science and engineering graduate from NIIT University, batch of 2013-2017. I work as a software engineer at Goibibo as part of the Marketing Technology team. On weekends I love to volunteer, attend meetups and share what I have learned. The tech stack I primarily work on is Python and its related web frameworks.

Github Link:

Twitter Link:

Speaker Links:

Past Experience [December 2018 - Present]

PyData Delhi meetup

a. Introduction to APIs - Talk Video -

b. Virtual Environment in Python -

LinuxChix India

a. Tech Journey so far -

Linux User Group Delhi [ILUGD]

a. Playing around with APIs -

b.

PyLadies Delhi

a. Virtual Environment in Python -

b. Python for All - Video -

LetsPy Delhi

a. Small Video -

b.

DjangoGirls, Bangalore

a. Coach [February, 2019] -

DjangoGirls, Pune

a. Coach [22-06-2019] -

Women Who Go, Delhi + PyLadies Delhi + LinuxChix India combined meetup

a. Understanding HTTP from ground up -

Drupal Camp 2019, Delhi

a. Automating data pipelines using Apache Airflow


  1. Pybites Blog -
  2. Medium personal blog Twitter Data Retrieval - Word Notifier -

Id: 1274
Section: Developer tools and automation
Type: Talks
Target Audience: Intermediate
Last Updated: