Getting Kafka-esque with Python

Dipen Chawla (~dipen)


Description:

The increasingly abundant real-time data produced by modern ML and software applications calls for a distributed architecture that stays resilient when pushed to its limits. Apache Kafka is a distributed stream-processing platform built for exactly this problem, and is a primary driver of data pipelines at large-scale organisations such as Netflix.

With this workshop, we plan to provide the building blocks for the PyCon India community to build resilient stream-processing applications on their own data using Apache Kafka. Participants can expect to walk away from the workshop with a working knowledge of:

  • Deploying Python applications to Kafka
  • Integrating Python apps with data sources and sinks
  • Understanding data paradigms in modern streaming architectures
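As a taste of the publish/subscribe model these takeaways build on, here is a minimal sketch in plain Python. It is an in-memory stand-in for a broker, not Kafka itself; the `Broker` class and the `clicks` topic are illustrative names, not part of any Kafka API:

```python
from collections import defaultdict, deque

class Broker:
    """A toy in-memory analogue of a Kafka broker: each topic is an ordered log."""

    def __init__(self):
        self.topics = defaultdict(deque)

    def publish(self, topic, message):
        # Producers append records to a named topic.
        self.topics[topic].append(message)

    def subscribe(self, topic):
        # Consumers read records from the topic in publish order.
        while self.topics[topic]:
            yield self.topics[topic].popleft()

broker = Broker()
broker.publish("clicks", {"user": "alice", "page": "/home"})
broker.publish("clicks", {"user": "bob", "page": "/docs"})

for record in broker.subscribe("clicks"):
    print(record)
```

In real Kafka the broker runs as a separate service, topics are partitioned and replicated, and consumers track offsets instead of draining the log; the workshop covers those differences in the architecture session.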

Agenda

  1. A quick primer on streaming applications (~10 mins)
  2. Writing publishers & subscribers in Python (~20 mins)
  3. Introduction to the Kafka Architecture (~15 mins)
  4. Introduction to a few key Kafka peripherals (~15 mins)
  5. Setting up local Kafka brokers & peripherals (~45 mins)
  6. Deploying the Python end-clients to the cluster (~15 mins)
  7. Integrating with databases via Connect & KSQL (~30 mins)
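For step 5, a common way to bring up a single local broker is Docker Compose. The fragment below is illustrative only (the image tags, ports, and environment variables are assumptions; the setup instructions shared for the workshop take precedence):

```yaml
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.0.1
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.0.1
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
```

With this running, Python clients can reach the broker at `localhost:9092`.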

Who is this talk for?

  • Developers and cloud architects looking to migrate their Python applications to event-driven architectures.
  • Anyone generally curious to learn about Kafka.

Prerequisites:

The workshop will be conducted in Python 3. All other topics will be covered from scratch, so no prior knowledge is required. We will provide the sample data as well as the setup instructions for the workshop.

Please ensure that your system has 4-8 GB of memory, ample disk space, and Docker configured.

Video URL:

https://youtu.be/GZODerVEgko

Speaker Info:

Dipen Chawla, Data Engineer, Episource

Dipen is a member of the MLOps and Engineering team at Episource, where he works on deploying scalable and secure data architectures to the cloud. His primary areas of interest include container tech & ML in production. Outside of work, he can be found tinkering with the Raspberry Pi k8s cluster running in a corner of his balcony (don't ask him about it) and reading historical fiction.

Reach out on Twitter.

Devesh Bajaj, Data Engineer, Episource

Devesh is a Data Engineer at Episource with two years of experience designing microservices for machine-learning inference, along with developing and deploying end-to-end full-stack applications. He has a keen passion for IoT-based application development and won 2nd prize in the CanSat Satellite Competition, 2017, organised by the American Astronautical Society and recognised by NASA.

You can find Devesh at

  • LinkedIn - https://www.linkedin.com/in/devesh-bajaj/
  • Twitter - https://twitter.com/deveshbajaj59
  • Medium - https://medium.com/@deveshbajaj59

Section: Decentralised and Distributed Technology
Type: Workshop
Target Audience: Beginner
Last Updated: