1 Billion Rows vs Python: Navigating the 1BRC in Pure Python

Kartheek S (~kartheek)



Description:

In January 2024, Gunnar Morling launched the "One Billion Row Challenge" (1BRC), aimed primarily at Java. The internet took up the challenge anyway and started producing implementations in many other languages. Being a Python fanatic, I wanted to take a stab at it in Python 🐍.

This talk examines how Python compares with other programming languages when implementing a solution for the 1BRC. We will explore the inherent performance bottlenecks that keep Python from top-tier efficiency, working in pure Python without libraries like NumPy or projects like DuckDB. Along the way, you'll discover a collection of hacks and tricks to supercharge your Python code and push its performance further.
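For readers new to the challenge, here is a minimal single-threaded baseline in pure Python. It is only a sketch of the starting point, not the optimized solution from the talk. It assumes the input format described in the 1BRC repository (one `station;temperature` pair per line); the function name `aggregate` is made up for illustration.

```python
def aggregate(path):
    """Return {station: (min, mean, max)} for a 1BRC-style measurements file."""
    # Each entry holds [min, max, sum, count] for one station.
    stats = {}
    # Read bytes rather than str to avoid decoding every line as UTF-8.
    with open(path, "rb") as f:
        for line in f:
            name, _, value = line.rstrip(b"\n").partition(b";")
            if not value:
                continue  # skip blank or malformed lines
            temp = float(value)  # float() accepts ASCII bytes directly
            entry = stats.get(name)
            if entry is None:
                stats[name] = [temp, temp, temp, 1]
            else:
                if temp < entry[0]:
                    entry[0] = temp
                if temp > entry[1]:
                    entry[1] = temp
                entry[2] += temp
                entry[3] += 1
    # Decode station names only once per station, sorted alphabetically.
    return {
        name.decode("utf-8"): (lo, total / count, hi)
        for name, (lo, hi, total, count) in sorted(stats.items())
    }
```

Even this baseline avoids some per-line overhead by staying in bytes and keeping one small list per station, but the per-line `float()` call, dictionary lookup, and interpreter loop all remain as costs at the scale of a billion rows.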

Filled with fascinating Python details, this session is designed for enthusiasts at all stages of their Python journey. Whether you're a seasoned developer or just starting out, you'll gain valuable insights and techniques to enhance your data processing skills. Join me to explore the intricacies of our beloved programming language through the lens of large-scale data challenges 🚀

Prerequisites:

https://github.com/gunnarmorling/1brc

Content URLs:

Draft slides: https://docs.google.com/presentation/d/1waMTB1rGbYdj0Lc8b0TCYUfzqWgzo5pC1Yafr55tAgI/edit?usp=sharing

Speaker Info:

Meet Kartheek, Head of Engineering at ScaleGenAI, with extensive experience in Python, Deep Learning, and Cloud/System Architecture. Kartheek has led various projects, including MLOps and highly available systems. His Python expertise lies in concurrency, and he is currently focused on helping companies build their own AI infrastructure at ScaleGenAI.

Speaker Links:

LinkedIn GitHub Twitter

Section: Core Python
Type: Talk
Target Audience: Intermediate