Navigating the Python Ecosystem for Data Science
Ananth Krishnamoorthy (~ananth)
In their day-to-day work, data scientists and data science teams face challenges in many overlapping yet distinct areas: data extraction, processing and storage, reporting, scientific computing, machine learning, ML workflow management, and more. The Python ecosystem offers a number of tools and libraries for these various aspects of data science.
The idea of this talk is to understand what the Python data science ecosystem offers, and how these different tools work (and don’t work) together. It is intended as a landscape survey of the Python data science ecosystem, along with a mention of some common gaps that practitioners may notice as they put together a toolkit for themselves.
Outline for the talk:
- Challenges faced by data scientists / data science teams
- Review of key Python tools - Jupyter, Pandas, Scikit-Learn, Keras / TensorFlow, Matplotlib, Bokeh, Blaze, Odo, Dask, PySpark
- Solving small data, medium data, big data problems in data science
- What works well
- Gaps: A practitioner perspective
Prerequisites:
- Familiarity with Python
- Some hands-on experience in building machine learning models in Python
Speaker info:
- I have spent the last 17 years applying analytical techniques based on mathematical optimization, machine learning, discrete event simulation, and time series analysis to real-world business problems.
- I am the co-founder of rorodata, a startup that is building a cloud-based data science platform, and the head of Hypercube Analytics, an analytics consulting company.
- I have a Ph.D. in Industrial Engineering and Management from Oklahoma State University.