Writing a Regex Engine in Python

arjoonn sharma (~theSage21)


10

Votes

Description:

We write a super simple Regex Engine in python, illustrating along the way how and why regular expressions work.
We also mention why the default re is so nice in some ways.

Topics covered:

  • Finite State Automatons (DFA, NFA)
  • Internally what is happening when you write regex patterns
  • Algorithms needed to write a regex engine
  • How to extend your engine in the future

Did I mention this is in pure Python? Yep, this is pure Python.

Motivation:

  • What would you get out of this?
    • Learn how to compose regular expressions.
    • Get a look behind the scenes of almost any regular expression engine.
    • Understand which expressions would end up being expensive ones.
  • This is not a programming tutorial.
  • To keep things simple we only allow |, *, concatenation operators since others can be made up using them
  • We only demonstrate the simplest of regular expressions and show how other complex ones may be made.
  • This is not a talk setting up the case for ditching re; instead offering explanations as to how re and other regex engines basically take a string (expression) and turn that into a program which either accepts or rejects candidate strings.

Bonuses:

  • A mini regex tutorial if time allows.

Prerequisites:

  • How to write a function in Python
  • Python data types

Content URLs:

Code for the engine is here:
https://gist.github.com/theSage21/c2da376a26c0000774a257e32a51ce34

Speaker Info:

I'm a Masters student at IIITM-K currently working on my Masters' Thesis. I love building things with Python, blogging, writing articles for OSFY and so on. I've been swimming in Python for about 5 years now, and absolutely love it.

Speaker Links:

  • https://github.com/theSage21
  • http://arjoonn.blogspot.in/
  • https://twitter.com/arjoonn1

Section: Others
Type: Talks
Target Audience: Intermediate
Last Updated: