Machine learning techniques for building a large-scale, production-ready prediction system using Python

Arthi Venkataraman (~arthi)




This presentation covers how to build a scalable prediction system for near-real-time use in Python. Users enter their requirements in natural language. The system applies natural language processing to the input text, and machine learning models then predict the required fields. The solution applies to several use cases, including agent allocation for an incident workbench and recommendation services.

This presentation is based on our experience creating and deploying a large-scale prediction system that is in live use by all employees of our company. The presentation will cover:

  1. Challenges in text classification
  2. Steps needed in text classification
  3. Why Python is suitable for this task and how Python helps solve some of these challenges
  4. How to build a working text classifier in Python

The system is built using a combination of natural language processing and machine learning techniques, with Python as the predominant language. Takeaways from the talk will include what to look for in text classification and how to build an actual text classifier in Python.

About Prediction Systems:

Many systems require users to select one option from many possibilities. For example, to raise a complaint you must select the category under which it should be filed (department, sub-group, specific class, etc.). The larger the number of possibilities, the more effort the end user needs to narrow down to the correct selection. In many cases the user must navigate a hierarchical tree to reach the correct selection. If a top-level selection is wrong, the user will never be shown the correct option. Classification systems attempt to assign the correct option to a given request. If the selection is hierarchical, the system should be able to predict across all levels.
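As a rough illustration of predicting across the levels of a hierarchy, the sketch below trains one model for the top level (department) and one model per department for the next level (sub-group). The training rows, the label names, the `predict_path` helper and the choice of TF-IDF with Naive Bayes are all assumptions made for this example; they are not taken from the system described in the talk.

```python
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training rows: (request text, department, sub-group).
rows = [
    ("laptop screen flickers", "IT", "Hardware"),
    ("laptop battery drains quickly", "IT", "Hardware"),
    ("cannot connect to vpn", "IT", "Network"),
    ("office wifi keeps dropping", "IT", "Network"),
    ("salary credited late", "HR", "Payroll"),
    ("payslip shows wrong tax", "HR", "Payroll"),
    ("leave balance is incorrect", "HR", "Leave"),
    ("leave request still pending", "HR", "Leave"),
]


def new_model():
    # TF-IDF features followed by a Naive Bayes classifier.
    return make_pipeline(TfidfVectorizer(), MultinomialNB())


# Level 1: predict the department from the request text.
top_model = new_model().fit(
    [text for text, _, _ in rows],
    [dept for _, dept, _ in rows],
)

# Level 2: one sub-group model per department, trained only on that
# department's requests.
rows_by_dept = defaultdict(list)
for text, dept, group in rows:
    rows_by_dept[dept].append((text, group))

sub_models = {}
for dept, items in rows_by_dept.items():
    texts, groups = zip(*items)
    sub_models[dept] = new_model().fit(list(texts), list(groups))


def predict_path(text):
    """Return (department, sub-group) predicted for a request."""
    dept = str(top_model.predict([text])[0])
    # The sub-group model is chosen by the predicted department, so an
    # error at the top level cannot be corrected at the lower level.
    group = str(sub_models[dept].predict([text])[0])
    return dept, group


print(predict_path("vpn wifi"))
```

This layout mirrors the point made above: a wrong top-level prediction sends the request to the wrong sub-group model, so accuracy at the top level bounds accuracy for the whole path.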

Machine Learning in Python:

Machine learning algorithms are at the heart of prediction systems. Python has strong support for machine learning algorithms. The key package we will use is scikit-learn, which is powerful and enables developers with minimal machine learning experience to build effective classification systems.
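To give a feel for how little code a basic scikit-learn text classifier needs, here is a minimal sketch. The sample requests, the category labels and the `predict_category` helper are invented for illustration, and TF-IDF with Naive Bayes is just one reasonable starting point, not necessarily the model used in the production system.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical labelled requests; a real system would load many more
# examples from historical tickets.
train_texts = [
    "laptop will not boot",
    "cannot connect to office wifi",
    "email client keeps crashing",
    "question about leave balance",
    "payslip shows wrong amount",
    "update bank details for salary",
    "air conditioning not working",
    "meeting room projector missing",
    "desk chair needs repair",
]
train_labels = [
    "IT", "IT", "IT",
    "HR", "HR", "HR",
    "Facilities", "Facilities", "Facilities",
]

# The pipeline turns raw text into TF-IDF features and then feeds those
# features to the classifier, so training and prediction both accept
# plain strings.
classifier = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("model", MultinomialNB()),
])
classifier.fit(train_texts, train_labels)


def predict_category(text):
    """Return the predicted category for a single request."""
    return str(classifier.predict([text])[0])


print(predict_category("laptop wifi"))
```

Because the vectorizer and the model live in one `Pipeline` object, the same object can be saved and loaded as a unit, which keeps feature extraction consistent between training and serving.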



While every attempt will be made to make the presentation understandable to all, knowledge of Python and an understanding of machine learning concepts and the stages of a machine learning workflow will help the audience better appreciate the presentation.

Content URLs:

The SlideShare link below outlines the topics to be covered, including the main sections and what each section covers.

Detailed slides of the contents have been uploaded to SlideShare.

Speaker Info:

Arthi Venkataraman has more than 18.5 years of experience in the design, development and testing of projects in different domains.

  • She is currently a Senior Architect in the Chief Technology Office of Wipro Technologies.
  • She is a Senior Member of the DMTF technical cadre in Wipro. Her current role involves solution development for different business problems spanning Big Data, Machine Learning and Semantic Technologies.
  • She has a B.E. degree in Computer Science from University Visvesvaraya College of Engineering, Bangalore, and an MBA (PGDSM) from IIM, Bangalore. She is also a PMP.
  • She has previously presented papers and spoken at other international conferences.

This presentation is based on Arthi's experience in building a large-scale, production-grade classifier using Python.

Speaker Links:

Links to some of the speaker's other presentations:

  1. Similar Entity Detection in large Data at Fifth Elephant 2013 -



Section: Data Visualization and Analytics
Type: Talks
Target Audience: Intermediate
Last Updated: