Making Machine Learning Fruitful and Fun using Orange
Ankit Mahato (~ankit60)
In this workshop we will visually explore the stages of an analytics pipeline using Orange 3, a Python-based, open-source workbench for interactive data analysis, machine learning and data visualization. Its simple drag-and-drop workflow design interface makes it ideal for novices, while its modular design, extensibility and Python integration make it powerful for advanced data science.
The workshop will begin by building a basic analytics pipeline using built-in Orange widgets. This will then evolve into a more complex analytics pipeline covering advanced topics such as in-database analytics, using an external ML toolkit, integration with R, and exporting developed models. For these advanced topics, the audience will be introduced to the GUI and computational concepts involved in developing add-on (custom-built) widgets for Orange.
The workshop will provide hands-on experience with the following aspects of a data analytics pipeline:
- Data Access (files & external data sources)
- Data Exploration
- Data Transformation/Filtering
- Model development using supervised/unsupervised machine learning algorithms (built-in, scikit-learn, in-database, NLTK, R integration)
- Basic and advanced visualization (built-in, matplotlib)
- Exporting developed models (PMML, PFA)
- Champion/Challenger model experiments
Real-life analytics use cases (sentiment analysis, IoT, finance) will be used in the workshop.
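To make the pipeline stages listed above concrete, here is a minimal plain-Python sketch that walks through data access, exploration, filtering, model development and a champion/challenger comparison. It is an illustration under stated assumptions, not Orange's API and not the workshop material: the toy `RAW_CSV` data, the function names and the two toy models are all hypothetical. In the workshop these steps are assembled visually from Orange widgets.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Hypothetical toy data standing in for a file or an external data source.
RAW_CSV = """reading,label
1.0,low
1.4,low
0.8,low
,low
1.1,low
2.9,high
3.2,high
1.3,low
3.0,high
0.9,low
2.7,high
1.2,low
"""


def load_rows(text):
    """Data access: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Transformation/filtering: drop rows with a missing reading, cast to float."""
    return [(float(r["reading"]), r["label"]) for r in rows if r["reading"].strip()]


def split(data):
    """Deterministic hold-out split: every third row goes to the test set."""
    train = [d for i, d in enumerate(data) if i % 3 != 0]
    test = [d for i, d in enumerate(data) if i % 3 == 0]
    return train, test


def fit_majority(train):
    """Champion (toy baseline): always predict the most frequent training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda reading: majority


def fit_threshold(train):
    """Challenger (toy model): cut at the midpoint between the two class means."""
    low = mean(x for x, label in train if label == "low")
    high = mean(x for x, label in train if label == "high")
    cut = (low + high) / 2
    return lambda reading: "high" if reading > cut else "low"


def accuracy(model, test):
    """Fraction of held-out rows the model labels correctly."""
    return sum(model(x) == label for x, label in test) / len(test)


rows = load_rows(RAW_CSV)
data = clean(rows)

# Exploration: inspect the class balance before modelling.
class_counts = Counter(label for _, label in data)

train, test = split(data)
scores = {
    "champion (majority)": accuracy(fit_majority(train), test),
    "challenger (threshold)": accuracy(fit_threshold(train), test),
}
winner = max(scores, key=scores.get)

print("class counts:", dict(class_counts))
print("hold-out accuracy:", scores)
print("selected model:", winner)
```

The champion/challenger step is simply a comparison of candidate models on the same held-out data; any model from scikit-learn, an in-database routine or R could take the place of the toy models here.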
No prerequisites are needed for building the basic analytics pipeline.
Basic Python programming (writing simple functions and classes) is required for the widget development section.
Download Orange 3.
Workshop slides are as follows - Link
Please note that the slides are only for the theory part. The workshop will be interactive and will contain exercises for learning.
My YouTube video shows the execution of an advanced in-database decision tree model using Orange.
The Orange 3 add-on repository I developed provides the widgets for the analytics pipeline shown in the above video.
Please note that the above video and repository demonstrate an advanced Orange 3 capability which will be covered in the workshop.
Ankit is a Product Manager with 3+ years of industry experience in machine learning, quantitative modelling, data analytics and visualization. Over the years, he has developed expertise in handling the entire data analytics pipeline, comprising ingestion, exploration, transformation, modeling and deployment. He is a polyglot programmer with extensive knowledge of algorithms, statistics and parallel programming. He has shipped multiple releases of DB Lytix™, a comprehensive library of over 800 mathematical and statistical functions used widely in data mining, machine learning and analytics applications, including “big data analytics”.
A die-hard Pythonista, Ankit is an open source contributor and a Google Summer of Code 2013 scholar (under the Python Software Foundation). Currently, he is contributing to the following open source projects:
- opendatagroup/hadrian - Implementations of the Portable Format for Analytics (PFA)
- Fuzzy-Logix/AdapteR - Advanced analytics package that enables R users to perform in-database analytics
An IIT Kanpur alumnus, Ankit is also an active researcher with publications in international journals and conferences. He is actively working in the domain of IoT analytics and recently presented his work, “In-database Analytics in the Age of Smart Meters”, at the 5th IIMA International Conference on Advanced Data Analysis, Business Analytics and Intelligence, 2017.