Building your own cloud native realtime event data collection platform

Ramjee Ganti (~ramjee)


Description:

Why attend this session? Do you spend too much time:

  • Deciding which tool to use to collect event data?
  • Working out how to marshal the data into your environment of choice?
  • Ensuring that the data is processed, validated, enriched and available for further analysis?
  • Adapting to new data sources and evolving data with minimal effort?

Snowplow Analytics is a popular open-source event data platform. We are huge fans of it and have used it in multiple products of our own. However, Snowplow takes some understanding and effort to set up and maintain. We have built on top of this platform in Python to make it trivial for developers to stand up their own event data platform. In this workshop I will walk you through, step by step, the different components of an event data collection platform and show how you can set up your own event data platform for your applications.

Session Outline:

  • What is an event data collection platform?
  • Components of an event data collection platform
  • Measurement protocol and its importance
  • Self Describing Data
  • Schema Versioning
  • Stream Processing with Apache Beam
  • Setting up your own event pipelines
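
To give a flavour of the "Self Describing Data" and "Schema Versioning" topics, here is a minimal sketch, not the workshop's actual code. It assumes the Snowplow/Iglu convention in which an event carries a `schema` URI next to its `data`, and the version is written as MODEL-REVISION-ADDITION (SchemaVer). The vendor `com.example`, the `button_click` event and the helper names `parse_schema_key` and `same_model` are hypothetical, chosen only for illustration.

```python
import re
from typing import NamedTuple

# Iglu-style schema URI: iglu:<vendor>/<name>/<format>/<MODEL>-<REVISION>-<ADDITION>
SCHEMA_URI = re.compile(
    r"^iglu:(?P<vendor>[\w.-]+)/(?P<name>[\w.-]+)/(?P<format>[\w-]+)/"
    r"(?P<model>\d+)-(?P<revision>\d+)-(?P<addition>\d+)$"
)


class SchemaKey(NamedTuple):
    vendor: str
    name: str
    format: str
    model: int
    revision: int
    addition: int


def parse_schema_key(uri: str) -> SchemaKey:
    """Split a schema URI into its parts, raising ValueError if it is malformed."""
    match = SCHEMA_URI.match(uri)
    if match is None:
        raise ValueError(f"not a valid schema URI: {uri!r}")
    parts = match.groupdict()
    return SchemaKey(
        parts["vendor"],
        parts["name"],
        parts["format"],
        int(parts["model"]),
        int(parts["revision"]),
        int(parts["addition"]),
    )


def same_model(a: SchemaKey, b: SchemaKey) -> bool:
    """True when both keys name the same schema and share the MODEL number."""
    return (a.vendor, a.name, a.format, a.model) == (b.vendor, b.name, b.format, b.model)


# A self-describing event: the payload states which schema it conforms to.
event = {
    "schema": "iglu:com.example/button_click/jsonschema/1-0-2",
    "data": {"button_id": "signup", "page": "/pricing"},
}

key = parse_schema_key(event["schema"])
print(key.name, key.model, key.revision, key.addition)
```

Because the event names its own schema and version, a downstream consumer can decide how to handle it without out-of-band knowledge. Under SchemaVer a change in MODEL signals a breaking change, so a check like `same_model` is one simple way to route events.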

By the end of the session, you will know how to think about data collection and data processing, both at scale and across multiple sources. You will also be able to set up your own event data pipeline in any application, covering data collection, validation, pre-processing, enrichment and storage for further analysis.
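
The stages named above (validation, enrichment, storage) can be sketched in plain Python as chained generators. This is only an illustration of the shape of such a pipeline, not the Apache Beam pipeline built in the workshop. The required fields, the `processed_at` enrichment and all function names are assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Iterator, List, Tuple

Event = Dict[str, object]

# Hypothetical minimum set of fields an event must carry to be accepted.
REQUIRED_FIELDS = ("event_name", "user_id")


def validate(events: Iterable[Event], rejected: List[Event]) -> Iterator[Event]:
    """Pass through well-formed events; collect the rest in `rejected`."""
    for event in events:
        if all(field in event for field in REQUIRED_FIELDS):
            yield event
        else:
            rejected.append(event)


def enrich(events: Iterable[Event]) -> Iterator[Event]:
    """Attach a UTC processing timestamp to a copy of each event."""
    for event in events:
        enriched = dict(event)
        enriched["processed_at"] = datetime.now(timezone.utc).isoformat()
        yield enriched


def run_pipeline(raw_events: Iterable[Event]) -> Tuple[List[Event], List[Event]]:
    """Run validate -> enrich and return (stored, rejected) events."""
    rejected: List[Event] = []
    # An in-memory list stands in for a real sink such as a warehouse table.
    stored = list(enrich(validate(raw_events, rejected)))
    return stored, rejected


stored, rejected = run_pipeline(
    [
        {"event_name": "page_view", "user_id": "u1"},
        {"event_name": "click"},  # missing user_id, so it is rejected
    ]
)
print(len(stored), "stored;", len(rejected), "rejected")
```

Keeping rejected events instead of dropping them means bad data can be inspected and replayed later. Each stage is an independent function, so the same steps could later be mapped onto the transforms of a stream-processing framework.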

Prerequisites:

  • Laptop with Python 3.7
  • Basic experience with ETL or other data processing
  • Clone of the workshop repository
  • Basic understanding of cloud computing
  • Google Cloud account with either billing enabled or free credits
  • Knowledge of Python

  • If this workshop is approved, we can get Google Qwiklabs support for all participants attending the workshop

Content URLs:

Workshop Repo

Speaker Info:

Ramjee Ganti has led technology teams at multiple startups covering e-commerce/food tech (JustEat), fintech (BigDecisions) and analytics (Datalicious). He was a co-organiser of Open Coffee Club Bangalore and is passionate about the challenges young startups face in balancing business and technical constraints.

Speaker Links:

LinkedIn

Twitter

Website

Section: Data Science, Machine Learning and AI
Type: Workshop
Target Audience: Intermediate
Last Updated: