Managing custom, reproducible Python virtual environments for PySpark and Jupyter Notebooks @ Uber
Sayan Pal (~sayan0)
Description:
Python environment management is easy! Well okay, not so much. Now, think of maintaining multiple versions of Python environments in a single container and providing easy access for the users. This is getting tricky!
The Uber Data Platform team builds and maintains a product for exploratory analysis in Python. It provides compute environments with Jupyter and a plethora of built-in tooling to access Uber's Data Lake. Applied Scientists, Data Scientists, Analysts, and Operations folks use this platform heavily every day. To give users an easy, hassle-free experience, different flavours of Python snapshots are preloaded in the Jupyter container. Users are also given a mechanism to create a bespoke snapshot with a chosen set of Python packages, which can be replicated in the Jupyter containers that they spawn. In this talk, I will speak about the evolution of the snapshotting mechanism and the overall state of package management within the product.
Specifically, I will be covering:
- What is a Python environment and how do we maintain multiple such environments on a machine?
- How to create replicable Python environments - requirements.txt vs conda-pack
- What was the state of Python environment snapshotting @ Uber a year ago?
- How does the Python module search path work and how do we leverage the same to create hierarchical Python environments?
- The benefits, the challenges, and the learnings of maintaining hierarchical Python environments using .pth files - Reference
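To make the last two points concrete, here is a minimal sketch of how a .pth file can layer one environment on top of another through the module search path. It is an illustration only, not the mechanism used at Uber: the directory names (base_snapshot, user_snapshot) and module names (demo_base_only_mod, demo_shared_mod) are hypothetical. It relies on the standard library's site.addsitedir, which appends a directory to sys.path and processes any .pth files found in it.

```python
import importlib
import site
import sys
import tempfile
from pathlib import Path


def build_layered_env(root: Path) -> tuple[Path, Path]:
    """Create two hypothetical site directories: a shared base and a user layer."""
    base = root / "base_snapshot" / "site-packages"
    user = root / "user_snapshot" / "site-packages"
    base.mkdir(parents=True)
    user.mkdir(parents=True)

    # A module that exists only in the base layer.
    (base / "demo_base_only_mod.py").write_text("SOURCE = 'base'\n")

    # A module that exists in both layers, to show which one wins.
    (base / "demo_shared_mod.py").write_text("SOURCE = 'base'\n")
    (user / "demo_shared_mod.py").write_text("SOURCE = 'user'\n")

    # The .pth file: each line naming an existing directory is appended to sys.path.
    (user / "base_layer.pth").write_text(f"{base}\n")
    return base, user


def demo() -> tuple[str, str]:
    """Activate the user layer and report where each module was loaded from."""
    with tempfile.TemporaryDirectory() as tmp:
        _, user = build_layered_env(Path(tmp))
        original_path = list(sys.path)
        try:
            # Adds the user directory first, then the base directory via the .pth file.
            site.addsitedir(str(user))
            importlib.invalidate_caches()
            base_only = importlib.import_module("demo_base_only_mod")
            shared = importlib.import_module("demo_shared_mod")
            return base_only.SOURCE, shared.SOURCE
        finally:
            # Undo the changes so the sketch leaves the interpreter as it found it.
            sys.path[:] = original_path
            sys.modules.pop("demo_base_only_mod", None)
            sys.modules.pop("demo_shared_mod", None)


if __name__ == "__main__":
    print(demo())  # ('base', 'user')
```

Because the user directory lands on sys.path before the directory named in the .pth file, packages in the user layer shadow those in the base layer, while anything missing from the user layer falls through to the base.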
Prerequisites:
- Basic understanding of Python
- Python packages and how they are installed
Video URL:
https://youtu.be/m_98l6HFraU
Speaker Info:
Sayan is a Software Engineer at Uber who has built backend systems for nearly a decade. He started his career at Infosys and got the opportunity to build innovative products at PhonePe, Yelp, and others. Even though he likes building end-user solutions, he is enjoying the platform side of things lately. He has a deep interest in capital markets and tech. You can find him travelling, reading, and playing online chess when not working.