Managing custom, reproducible Python virtual environments for PySpark and Jupyter Notebooks @ Uber

Sayan Pal (~sayan0)




Python environment management is easy! Well, okay, not really. Now imagine maintaining multiple Python environments in a single container while giving users easy access to each of them. That is where it gets tricky!

The Uber Data Platform team builds and maintains a product for exploratory analysis in Python. It provides compute environments with Jupyter and a plethora of built-in tooling to access Uber's Data Lake. Applied Scientists, Data Scientists, Analysts, and Operations folks use the platform heavily every day. To give users a hassle-free experience, different flavours of Python snapshots are preloaded in the Jupyter container. Users can also create a bespoke snapshot with a chosen set of Python packages, which can then be replicated in the Jupyter containers they spawn. In this talk, I will speak about the evolution of the snapshotting mechanism and the overall state of package management within the product.

Specifically, I will be covering:

  • What is a Python environment, and how do we maintain multiple such environments on one machine?
  • How to create replicable Python environments: requirements.txt vs conda-pack
  • What was the state of Python environment snapshotting @ Uber a year ago?
  • How does the Python module search path work, and how do we leverage it to create hierarchical Python environments?
  • The benefits, the challenges, and the learnings of maintaining hierarchical Python environments using .pth files
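
To make the last two points concrete, here is a minimal sketch of how a .pth file can layer one environment on top of another. It is an illustration only, not the mechanism used at Uber: the directory layout, the `layer_environment` helper, and the module names `shared_pkg_demo` and `user_pkg_demo` are all hypothetical. It relies only on the standard-library behaviour that `site.addsitedir()` adds a directory to `sys.path` and then processes any .pth files found in it.

```python
import site
import sys
import tempfile
from pathlib import Path


def layer_environment(child_site: Path, parent_site: Path) -> None:
    """Hypothetical helper: expose parent_site's packages to child_site.

    Each line of a .pth file that names an existing directory is
    appended to sys.path when the containing site directory is processed.
    """
    (child_site / "parent_env.pth").write_text(f"{parent_site}\n")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    # Two stand-in "environments": a preloaded base and a user's custom one.
    base = root / "base" / "site-packages"
    custom = root / "custom" / "site-packages"
    base.mkdir(parents=True)
    custom.mkdir(parents=True)

    # One toy module per environment (names are assumptions for the demo).
    (base / "shared_pkg_demo.py").write_text("ORIGIN = 'base'\n")
    (custom / "user_pkg_demo.py").write_text("ORIGIN = 'custom'\n")

    # Link the custom environment to the base one, then activate it.
    layer_environment(custom, base)
    site.addsitedir(str(custom))

    import shared_pkg_demo
    import user_pkg_demo

    origins = (shared_pkg_demo.ORIGIN, user_pkg_demo.ORIGIN)
    # The child directory is added before the paths listed in its .pth file,
    # so packages in the custom layer shadow same-named ones in the base.
    custom_first = sys.path.index(str(custom)) < sys.path.index(str(base))

print(origins, custom_first)
```

Only the custom directory is activated explicitly, and the base directory comes along through the .pth file. The child layer can therefore stay small and hold just the packages that differ from the parent.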


Prerequisites:

  • Basic understanding of Python
  • Python packages and how they are installed

Video URL:

Speaker Info:

Sayan is a Software Engineer at Uber who has built backend systems for nearly a decade. He started his career at Infosys and went on to build innovative products at PhonePe, Yelp, and others. Although he likes building end-user solutions, lately he has been enjoying the platform side of things. He has a deep interest in capital markets and tech. When not working, you can find him travelling, reading, or playing online chess.

Speaker Links:

LinkedIn -

Section: Python in Platform Engineering and Developer Operations
Type: Talk
Target Audience: Intermediate
Last Updated: