Boosting Data Processing: Performance Tune Pandas

Asha Holla (~asha2)



Description:

This talk is an in-depth exploration of techniques for improving the performance of Pandas, the widely used Python data analysis library. It covers strategies, tips, and best practices for optimizing data processing workflows, leading to faster and more efficient analysis.

Pandas can struggle with large datasets, and the problem grows with data volume. This talk covers practical strategies for accelerating Pandas operations, from simple data transformations to importing data from and exporting it to databases. I will also cover supporting libraries and small modifications to existing code that speed up execution.
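As a rough illustration of the database import/export case, the sketch below writes a DataFrame in batches and reads it back in chunks so that the full table never has to sit in memory at once. It is only a sketch under stated assumptions: the table name `scores`, the column names, and the chunk sizes are hypothetical, and an in-memory SQLite connection from the standard library stands in for a real database. `DataFrame.to_sql` and `pandas.read_sql` also accept a SQLAlchemy engine in place of that connection.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "row_id": np.arange(10_000),
    "value": rng.random(10_000),
})

# In-memory SQLite is a stand-in; a SQLAlchemy engine can be passed as `con` too.
con = sqlite3.connect(":memory:")

# Export: write the rows in batches of 2,000 instead of one large insert.
df.to_sql("scores", con, index=False, if_exists="replace", chunksize=2_000)

# Import: stream the table back in chunks and aggregate incrementally.
total_rows = 0
running_sum = 0.0
for chunk in pd.read_sql("SELECT row_id, value FROM scores", con, chunksize=2_500):
    total_rows += len(chunk)
    running_sum += chunk["value"].sum()

con.close()
print(f"rows read: {total_rows}")
```

Whether chunking actually helps depends on the database driver and the batch size, so the chunk sizes here should be treated as starting points to measure, not recommendations.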

A live demo with code snippets will show these techniques in use and compare their performance against the traditional methods that are in widespread use.

This session aims to address four key points:

  1. Why Pandas is slow when handling big data
  2. Slight modifications to existing Pandas code
  3. Using other libraries to speed up execution, such as SQLAlchemy and NVIDIA’s RAPIDS cuDF, among others
  4. Performance comparison between proposed and existing methods
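To make points 2 and 4 concrete, here is a minimal sketch of the kind of comparison the session describes: the same column calculation written as a row-by-row loop and as a vectorized expression, each timed with `time.perf_counter`. The data, the column names (`price`, `quantity`), and the helper function names are hypothetical, chosen only for illustration, and this is not necessarily the example used in the talk.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "price": rng.random(20_000) * 100,
    "quantity": rng.integers(1, 10, size=20_000),
})


def total_with_loop(frame):
    """Row-by-row version: builds a Series object for every row."""
    totals = []
    for _, row in frame.iterrows():
        totals.append(row["price"] * row["quantity"])
    return pd.Series(totals, index=frame.index)


def total_vectorized(frame):
    """Vectorized version: one operation over whole columns."""
    return frame["price"] * frame["quantity"]


start = time.perf_counter()
loop_result = total_with_loop(df)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vector_result = total_vectorized(df)
vector_seconds = time.perf_counter() - start

print(f"iterrows loop: {loop_seconds:.4f} s")
print(f"vectorized:    {vector_seconds:.4f} s")
```

Both functions produce the same values, and only the execution strategy differs. Actual timings depend on the machine and the Pandas version, so no figures are quoted here.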

Drawing from personal experience, I'll share tried-and-tested methods to optimize Python scripts using Pandas and reduce pipeline execution time, ultimately enhancing resource efficiency.

Prerequisites:

Attendees should have a basic understanding of Python programming and familiarity with the Pandas library, which provides the foundation needed to follow the concepts and techniques discussed.

Related work experience would help, as it provides context for the performance optimization strategies presented in the talk.

Video URL:

https://drive.google.com/file/d/19ib96O6it_oivTWBPOuisGjZnQ19LYsH/view?usp=sharing

Content URLs:

Link 1

Link 2

Link 3

Link 4

Speaker Info:

Asha is a seasoned data professional with over two years of experience in big data and AI. She is currently part of Bloom Value, a US-based AI startup focused on delivering innovative solutions to the US healthcare market. She is a Microsoft-certified data professional and holds a bachelor's degree in computer science and engineering from Visvesvaraya Technological University.

Asha's journey into this field began as she found the title of data scientist intriguing, but it was working with data that truly captured her imagination. She sees data not only as a science but also as a creative outlet with endless possibilities waiting to be explored.

Within Bloom Value, she often wears multiple hats, including that of Chief Vibes Officer. She takes pride in fostering a positive and inclusive workplace environment, heading initiatives and events that bring joy and unity to the team.

Beyond her professional role, Asha is deeply passionate about advocating for women in tech. She actively participates in community events and initiatives and finds immense satisfaction in giving back to the community. She also enjoys sharing her knowledge, experiences, and day-to-day learnings through tech blogs and posts, mixed in with a bit of humor. Her goal is to bridge the gap in spaces where information is sparse, drawing on her firsthand experience in the field.

For Asha, being able to make a difference, both within her company and in the broader tech community, is incredibly fulfilling.

Speaker Links:

Links to my first tech talk at hackerspace club of PES college, Bengaluru

  1. Link 1 - LinkedIn Post
  2. Link 2 - Instagram Post
  3. Link 3 - Instagram Reel
  4. Link 4 - PPT slides of the talk. P.S. Do not open in Google Slides, as it skews the content. Download or open in PowerPoint if possible.

I post daily content on LinkedIn: Link

Link to GitHub repo: Link

Published research paper on a deep learning project: Link

Email: asha.arvholla@gmail.com

Section: Artificial Intelligence and Machine Learning
Type: Talk
Target Audience: Intermediate
Last Updated: