Building a data pipeline for processing and storing Twitter data on AWS with Python





Twitter data is a valuable resource for academic researchers who wish to study the public conversation. Twitter's filtered stream API lets developers receive real-time Tweets that match their filtering rules as they happen. However, to process, store and study this data effectively, you need an efficient data pipeline. In this workshop, we will build a data pipeline on Amazon Web Services (AWS) using Python to:

  • Stream data from the Twitter Filtered Stream Endpoint
  • Process this data with Amazon Simple Queue Service (SQS)
  • Store this Tweet data on Amazon S3
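
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the workshop's actual code. It assumes the stream delivers one JSON object per line with the Tweet under a "data" key, separated by blank keep-alive lines, and that `sqs_client` and `s3_client` are boto3 clients created elsewhere (for example with `boto3.client("sqs")` and `boto3.client("s3")`). The helper names, the queue URL, the bucket name and the `tweets/YYYY/MM/DD/` key layout are all hypothetical.

```python
import json
from datetime import datetime, timezone

# SQS send_message_batch accepts at most 10 entries per call.
SQS_MAX_BATCH = 10


def parse_stream_lines(lines):
    """Yield Tweet dicts from raw stream lines, skipping blank keep-alives.

    Assumes each non-empty line is a JSON object with the Tweet under "data".
    """
    for line in lines:
        if not line.strip():
            continue  # keep-alive line, nothing to parse
        payload = json.loads(line)
        if "data" in payload:
            yield payload["data"]


def chunked(items, size=SQS_MAX_BATCH):
    """Split a list into consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def enqueue_tweets(sqs_client, queue_url, tweets):
    """Send Tweets to SQS in batches; return how many SQS reported as sent."""
    sent = 0
    for batch in chunked(tweets):
        # Entry Ids only need to be unique within a single batch.
        entries = [
            {"Id": str(i), "MessageBody": json.dumps(tweet)}
            for i, tweet in enumerate(batch)
        ]
        response = sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        sent += len(response.get("Successful", []))
    return sent


def s3_key_for(tweet_id, now=None):
    """Build a date-partitioned object key (hypothetical layout)."""
    now = now or datetime.now(timezone.utc)
    return f"tweets/{now:%Y/%m/%d}/{tweet_id}.json"


def store_tweet(s3_client, bucket, tweet, now=None):
    """Write one Tweet to S3 as a JSON object and return its key."""
    key = s3_key_for(tweet["id"], now)
    s3_client.put_object(Bucket=bucket, Key=key, Body=json.dumps(tweet).encode("utf-8"))
    return key
```

Passing the clients in as arguments keeps the helpers easy to try out with stand-in objects before pointing them at a real queue and bucket. In a full pipeline, a separate consumer would read messages from the queue and call something like `store_tweet` for each one.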


Ideally, participants should apply for a Twitter Developer Account and have an AWS account prior to attending this session.

Speaker Info:

Suhem Parack is a Sr. Developer Advocate at Twitter. He focuses on helping the academic research community succeed on Twitter's Developer Platform. Prior to joining Twitter, he worked as a Solutions Architect at Amazon in the Alexa org. Outside of writing code and helping developers, he enjoys reading and running.


Section: Data Science, Machine Learning and AI
Type: Workshop
Target Audience: Intermediate