Building a data pipeline for processing and storing Twitter data on AWS with Python
Twitter data is a valuable source for academic researchers who wish to study the public conversation. Twitter's filtered stream API lets developers receive real-time Tweets that match a set of filter rules as they are posted. However, to process, store, and study this data effectively, you need an efficient data pipeline. In this workshop, we will build a data pipeline on Amazon Web Services (AWS) using Python to:
- Stream data from the Twitter Filtered Stream Endpoint
- Process this data with Amazon Simple Queue Service (SQS)
- Store this Tweet data on Amazon S3
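The three stages above can be sketched roughly as follows. This is a minimal illustration, not the workshop's actual code: the function names, the `tweets/<id>.json` key layout, and the assumption that each streamed line is a JSON object with a `data.id` field are all hypothetical. The `sqs_client` and `s3_client` arguments are assumed to behave like boto3 clients, and only the `send_message_batch` and `put_object` calls are used.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# SQS accepts at most 10 entries per send_message_batch call.
SQS_MAX_BATCH = 10


def parse_stream_lines(lines: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
    """Yield one Tweet dict per non-empty line of the stream.

    Blank lines are treated as keep-alive signals and skipped.
    """
    for raw in lines:
        if raw and raw.strip():
            yield json.loads(raw)


def enqueue_tweets(sqs_client: Any, queue_url: str,
                   tweets: Iterable[Dict[str, Any]]) -> int:
    """Send Tweets to an SQS queue in batches; return how many were submitted."""
    batch: List[Dict[str, str]] = []
    sent = 0
    for tweet in tweets:
        # Entry Ids only need to be unique within a single batch.
        batch.append({"Id": str(len(batch)), "MessageBody": json.dumps(tweet)})
        if len(batch) == SQS_MAX_BATCH:
            sqs_client.send_message_batch(QueueUrl=queue_url, Entries=batch)
            sent += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        sqs_client.send_message_batch(QueueUrl=queue_url, Entries=batch)
        sent += len(batch)
    return sent


def store_tweet(s3_client: Any, bucket: str, tweet: Dict[str, Any]) -> str:
    """Write one Tweet to S3 as a JSON object and return its key.

    The key layout is an assumption made for this sketch.
    """
    key = f"tweets/{tweet['data']['id']}.json"
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(tweet).encode("utf-8"),
    )
    return key
```

In a real deployment the clients would typically come from `boto3.client("sqs")` and `boto3.client("s3")`, and the lines from an HTTP streaming response. Passing the clients in as arguments keeps the sketch easy to try with stand-in objects before any AWS resources exist.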
Ideally, participants should apply for a Twitter Developer Account and have an AWS account prior to attending this session.
Suhem Parack is a Sr. Developer Advocate at Twitter. He focuses on helping the academic research community succeed on Twitter's Developer Platform. Prior to joining Twitter, he worked as a Solutions Architect at Amazon in the Alexa org. Outside of writing code and helping developers, he enjoys reading and running.
https://developer.amazon.com/blogs/home/author/Suhem+Parack, https://blog.twitter.com/developer/en_us/authors.suhemparack.html, https://dev.to/suhemparack