Writing a Regex Engine in Python

arjoonn sharma (~theSage21)


10

Votes

Description:

We write a super simple Regex Engine in python, illustrating along the way how and why regular expressions work.
We also mention why the default re is so nice in some ways.

Topics covered:

  • Finite State Automatons (DFA, NFA)
  • Internally what is happening when you write regex patterns
  • Algorithms needed to write a regex engine
  • How to extend your engine in the future

Did I mention this is in pure Python? Yep, this is pure Python.

Motivation:

  • What would you get out of this?
    • Learn how to compose regular expressions.
    • Get a look behind the scenes of almost any regular expression engine.
    • Understand which expressions would end up being expensive ones.
  • This is not a programming tutorial.
  • To keep things simple we only allow |, *, concatenation operators since others can be made up using them
  • We only demonstrate the simplest of regular expressions and show how other complex ones may be made.
  • This is not a talk setting up the case for ditching re; instead offering explanations as to how re and other regex engines basically take a string (expression) and turn that into a program which either accepts or rejects candidate strings.

Bonuses:

  • A mini regex tutorial if time allows.

Prerequisites:

  • How to write a function in Python
  • Python data types

Content URLs:

Code for the engine is here:
https://gist.github.com/theSage21/c2da376a26c0000774a257e32a51ce34

Speaker Info:

I'm a Masters student at IIITM-K currently working on my Masters' Thesis. I love building things with Python, blogging, writing articles for OSFY and so on. I've been swimming in Python for about 5 years now, and absolutely love it.

Speaker Links:

  • https://github.com/theSage21
  • http://arjoonn.blogspot.in/
  • https://twitter.com/arjoonn1

Section: Others
Type: Talks
Target Audience: Intermediate
Last Updated:

The content may become difficult, but should be an useful talk for many.

Kushal Das (~kushal)

I have made an effort to keep the talk as informative as possible without introducing complexity into it. But you do have a point. It might end up being a heavy one.

arjoonn sharma (~theSage21)

Why is this classified under Python 3k? There is nothing here that talks about 2 to 3 migration, or writing compatible code. May be "Others" will be appropriate.

Vijay Kumar (~bravegnu)

I had classified this as 3k since I've only tested the code on 3k. Now that you mention it it does fit the others category too. I'll ask an admin and change if allowed.

arjoonn sharma (~theSage21)

Login to add a new comment.