
Language Modeling With Python

by Emaad Ahmed Manzoor (speaking)

Scientific Computing
Session type
Technical level


  • Understand the theoretical fundamentals behind language models.
  • Learn how to build a simple content-generator from scratch.
  • Understand the common pitfalls affecting language models.
  • Learn how to improve our simple content-generator.
  • Learn how to modify our content-generator to detect the source of plagiarized content.
  • Learn how to make life easy by using NLTK.
  • Learn how to use random forests to improve on traditional n-gram approaches.
  • Discover the various problems for which language model-based approaches have been used as solutions.
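
To make the "simple content-generator" objective concrete, here is a minimal sketch of one way such a generator could look, assuming a bigram (2-gram) model over whitespace-separated tokens. The function names `train_bigrams` and `generate` and the toy corpus are illustrative assumptions, not material from the talk.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, length=10, seed=0):
    """Sample up to `length` words, each drawn in proportion to how often
    it followed the previous word in the training text."""
    rng = random.Random(seed)
    word, out = start, [start]
    for _ in range(length - 1):
        followers = counts.get(word)
        if not followers:
            # Dead end: this word was never seen with a successor.
            break
        words = list(followers)
        weights = [followers[w] for w in words]
        word = rng.choices(words, weights=weights, k=1)[0]
        out.append(word)
    return out


# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat and the cat ate the fish".split()
model = train_bigrams(corpus)
print(" ".join(generate(model, "the")))
```

The output is locally plausible but quickly becomes repetitive or stops at a dead end, which is the kind of pitfall the later objectives address.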


This talk is an introduction to language models, conducted primarily in math and Python. We will cover the theory behind the approach, walk through a number of demonstrations and how-it-works examples, and look at the practical issues that arise when applying language models in the real world, along with how to solve them in code. We'll also cover using NLTK to simplify common language modeling tasks and, if time permits, explore random forests and how they are used with language models. We'll end with a brief survey of language models in the wild.
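
As one illustration of a practical issue and a fix in code, the sketch below assumes the issue is zero probability for unseen word pairs and handles it with add-one (Laplace) smoothing. It then uses the smoothed scores to guess which of two candidate sources a snippet most resembles, in the spirit of the plagiarism-detection objective. The names `train`, `log_prob`, and the two toy sources are hypothetical and chosen only for this example; the talk may use a different smoothing method.

```python
import math
from collections import Counter


def train(tokens):
    """Return (context counts, bigram counts) for a token list."""
    contexts = Counter(tokens[:-1])            # words that have a successor
    bigrams = Counter(zip(tokens, tokens[1:]))
    return contexts, bigrams


def log_prob(tokens, model, vocab_size):
    """Add-one smoothed bigram log-probability of a token sequence:
    P(next | prev) = (count(prev, next) + 1) / (count(prev) + V)."""
    contexts, bigrams = model
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, nxt)] + 1) / (contexts[prev] + vocab_size))
    return total


# Toy sources, purely illustrative.
sources = {
    "a": "the cat sat on the mat".split(),
    "b": "stocks fell sharply on the news today".split(),
}
query = "the cat sat".split()

# Shared vocabulary size V, so scores are comparable across models.
vocab = set(query).union(*sources.values())
models = {name: train(tokens) for name, tokens in sources.items()}
scores = {name: log_prob(query, m, len(vocab)) for name, m in models.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Without the `+ 1` in the numerator, any word pair absent from a source would give a probability of zero and a `math.log` error, so smoothing is what keeps the comparison well defined.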

Speaker bio

Emaad Manzoor hacks on distributed computing, language processing and computer vision, hoping to engineer the Oracle someday. He currently builds prescient systems for the trend detection platform at Yahoo!.