Something for Nothing: Bootstrapping Text Classification

Alizishaan Khatri (~alizishaan)




The hardest part of building a text classifier is finding labelled data to train the model. The next hardest part is making sure that data is fair and representative. In this talk we will discuss some approaches to rapidly generating corpora suitable for supervised training, using public data and open-source tools.

This talk will include some practical tips as well as some less obvious pitfalls, and is suitable for both novices and more experienced Natural Language Processing practitioners.

At the end of the talk you will be able to give a convincing answer to the eternal question: How do I build a text classifier for a product that doesn't exist yet?

Co-presented with Alex O'Connor

Speaker Info:

Alizishaan's professional passions revolve around two things: using technology to solve real-world problems and sharing solutions with the community. He is currently employed as a Machine Learning Engineer with Pivotus, where he works on problems in the Natural Language Processing space. Over the summer of 2017, he designed and built an offensive content detection system for a Silicon Valley company. Past industry projects include a price-prediction system for cars and a status communication system that minimized false alerts.

Outside of work, Alizishaan's passions include mountaineering, skiing, travelling and photography.

Id: 936
Section: Data science
Type: Talks
Target Audience: Beginner