Objective
- Understand the theoretical fundamentals behind language models.
- Learn how to build a simple content generator from scratch.
- Understand the common pitfalls affecting language models.
- Learn how to improve our simple content generator.
- Learn how to modify our content generator to detect the source of plagiarized content.
- Learn how to make life easy by using NLTK.
- Learn how to use random forests to improve on traditional n-gram approaches.
- Discover the various problems for which language model-based approaches have been used as solutions.
Description
This talk is an introduction to language models, where we will converse primarily in math and Python. We will cover the theory behind the approach, walk through demonstrations and how-it-works examples, and discuss the practical issues that arise when applying language models in the real world, along with how to solve them in code. We'll also cover using NLTK to simplify common language model tasks and, if time permits, explore random forests and how they're used with language models. We'll end with a brief survey of language models in the wild.
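To give a flavour of the from-scratch content generator the talk describes, here is a minimal sketch of the underlying idea: a bigram language model that learns which word follows which, then generates text by a random walk. The corpus and function names below are illustrative, not material from the talk itself.

```python
import random
from collections import defaultdict

def build_bigram_model(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    model = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        model[current].append(nxt)
    return model

def generate(model, seed, length=10, rng=None):
    """Generate text by repeatedly sampling a follower of the last word."""
    rng = rng or random.Random(0)
    out = [seed]
    for _ in range(length - 1):
        followers = model.get(out[-1])
        if not followers:  # dead end: no word ever followed this one
            break
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug"
model = build_bigram_model(corpus)
print(generate(model, "the"))
```

Because followers are stored with repetition, frequent continuations are sampled proportionally more often; smoothing, longer n-grams, and the other refinements the talk covers address the dead-end and sparsity problems this naive version runs into.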
Speaker bio
Emaad Manzoor hacks on distributed computing, language processing and computer vision, hoping to engineer the Oracle someday. He currently builds prescient systems for the trend detection platform at Yahoo!.