Building Great Python Data Science Libraries

hmansell


Description:

Python has become the de facto standard for a variety of Data Science use cases, such as Machine Learning and quantitative finance. It has succeeded because of great libraries that solve important problems for users. But how do you build a great library?

This talk is based on Howard's experiences leading the PyTorch project and the Research platform at AQR, along with prior projects in the Haskell and .NET worlds. He will give a brief introduction to PyTorch, comparing it to NumPy and Pandas, and talk about why he believes it was successful. He will cover lessons learned and best practices for building great data science libraries, using examples from previous projects. Finally, he will talk about how the Python ecosystem and Core Python could improve to make Python an even better platform for Data Science.

Outline

  • Brief intro to PyTorch, Pandas and NumPy, and what they have in common
  • Find a niche
  • Leverage the ecosystem
  • Build a toolkit rather than a framework
  • Design APIs for users, not implementers
  • Design for transparency and flexibility, then optimize performance
  • Have a simple execution model
  • Build iteratively
  • Engage with users from day 0
  • Know your performance bottlenecks
  • Pros and cons of Python for Data Science, and what we can do about them

Prerequisites:

Basic knowledge of Python and numerical libraries like Pandas or PyTorch.

Speaker Info:

Howard Mansell currently works at AQR Capital Management, a leading Systematic Asset Management firm, where he leads the Research Engineering department. AQR was the original author of Pandas, and its investment platform is heavily based on Python and Pandas. The firm has recently opened an Engineering office in Bangalore.

Prior to AQR, Howard was Head of Engineering for Facebook AI Research, where his team built and open-sourced the PyTorch Deep Learning framework. Earlier in his career, Howard worked at several finance companies, where his teams built development environments for researchers to rapidly iterate on pricing and valuation models for financial products.

Howard loves building great tools for developers, especially those who don't think they are developers because they are working in non-traditional programming environments like Quantitative Research and AI. He has dabbled extensively with functional programming languages such as Haskell and F# in the past, so he also loves static typing and reproducibility.

Section: Data Science, Machine Learning and AI
Type: Talks
Target Audience: Beginner