Who Begat Python? Knowing your Interpreter

Divya Goswami (~divya32)


Description:

Abstract

Python, our favorite language and the reason for this great conference, keeps winning hearts as people build applications in every domain, be it ML, data science, or cybersecurity. Yet not many know (or actually WANT to know) that its reference implementation, CPython, is written in C. The interpreter Guido van Rossum originally wrote, including its object model and the built-in modules it ships with, is C code compiled into the python binary. That binary, which typically resides in /usr/bin/, can therefore be loaded into a debugger and stepped through to travel into its various regions: from where objects such as lists, dicts, and tuples are implemented to where imported modules such as os and sys are loaded. And the language known for running code ONLY by interpreting it actually compiles that code first, to bytecode, which then runs on what's called the Python VM.
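To see this split from inside Python, here is a minimal sketch using the standard inspect module: objects implemented in C and compiled into the interpreter have no source file, while pure-Python modules do. The helper where_is is hypothetical, json is just an example of a module written in Python, and the sketch assumes you are running CPython.

```python
import inspect
import json
import sys

def where_is(obj):
    """Hypothetical helper: return obj's source file, or note that it is compiled C."""
    try:
        return inspect.getfile(obj)
    except TypeError:
        # Built-in classes and modules (compiled into the binary) have no .py file.
        return "<built into the interpreter (C)>"

print(sys.implementation.name)  # 'cpython' when run on CPython
for obj in (list, dict, tuple, sys, json):
    print(obj.__name__, "->", where_is(obj))
```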

My talk will unfurl the key directories in the CPython source tree, reveal the __pycache__ mystery, and show you what bytecode is and how it gets executed on a stack-based virtual machine. Python gives us ample tools to trace function calls and to show us how

print("Hello World")

turns into

b't\x00d\x01\x83\x01\x01\x00d\x00S\x00'

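which is the raw bytecode of a function whose body is that single call, as compiled by CPython 3.6 to 3.10 (other versions emit different bytes).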
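A quick way to reproduce these bytes is the standard dis module. This is a minimal sketch: the function name hello is only an illustration, and the raw co_code you get depends on your Python version, so the disassembly is the more portable thing to compare.

```python
import dis
import sys

def hello():
    print("Hello World")

# Raw bytecode of the function body, as stored on its code object.
raw = hello.__code__.co_code
print(sys.version_info[:2], raw)

# Human-readable disassembly: opcode names, arguments, and what they refer to.
dis.dis(hello)
```

On CPython 3.6 to 3.10 the bytes decode as LOAD_GLOBAL (print), LOAD_CONST ('Hello World'), CALL_FUNCTION 1, POP_TOP, LOAD_CONST (None), RETURN_VALUE.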

Basic Talk flow:

The talk will walk you through:

  • Cloning and compiling the CPython source code, then running the resulting binary
  • Exploring the Grammar, Parser (where the AST is defined), Objects, Include, and Lib directories of the CPython source tree
  • Conversion of a hello.py file to its .pyc bytecode file
  • The steps the source goes through during its transformation into bytecode
  • Bytecode interpretation, i.e. running the final compiled version of the code
  • Introduction to the Python VM: what a stack-based VM is
  • How the bytecode gets interpreted (specifically in the ceval.c file)
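The hello.py to .pyc step can be reproduced from Python itself. This is a minimal sketch using the standard py_compile, importlib.util, and marshal modules; the temporary hello.py is a hypothetical example file, and the 16-byte header layout assumes Python 3.7 or later (PEP 552).

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical example file, written to a throwaway directory.
    src_path = os.path.join(tmpdir, "hello.py")
    with open(src_path, "w") as f:
        f.write('print("Hello World")\n')

    # Compile it where the import system would cache it; by default that is
    # __pycache__/hello.<interpreter tag>.pyc next to the source.
    pyc_path = py_compile.compile(src_path, doraise=True)
    print(pyc_path)

    with open(pyc_path, "rb") as f:
        data = f.read()

# Bytes 0-3: the magic number identifying this Python version's bytecode format.
print(data[:4] == importlib.util.MAGIC_NUMBER)

# Since Python 3.7 the header is 16 bytes; the rest is a marshalled code object.
code = marshal.loads(data[16:])
print(code.co_consts)
exec(code)  # runs the cached bytecode, which prints Hello World
```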

Fun and trivia

I have also prepared some quirky challenges throughout the talk to keep everyone on their toes! Do check out my gist linked below for the complete details of the talk. I have divided it into several sections.

Talk Outline (Breakdown of 30 mins):

Who Begat Python

History - 5 mins

Why did I choose this topic?

Intro to python, through the creator's eye - 1 min

Refer to the book - 1 min

Intro to Python for the newbies: "Whetting Your Appetite" is the perfect way to get introduced to the powers of Python.

Meme break - 1st trivia question

Python Source Code - 5 mins

  • Cloning the repo and running the python binary - 30 sec

  • Explaining the directories - 4 mins

  • Grammar
  • Lib
  • Include
  • Objects
  • Parser

  • Example hello.py file

Conversion using an example - 10 mins

  • Reading the source code (converting hello.py into the interpreter-level representation) - 5 mins
  • Parsing and generating the AST

  • Producing bytecode - 5 mins

  • Concluding with instaviz - final minutes
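The read, parse, and compile steps above can be mirrored with standard-library modules. This is a minimal sketch, not CPython's internal C pipeline: the tokenize module exposes a token stream equivalent to what the parser consumes, ast.parse returns the AST, and compile turns that AST into a code object. The file name hello.py is only a label here.

```python
import ast
import io
import tokenize

source = 'print("Hello World")\n'

# 1. Read and tokenize: the source text becomes a stream of tokens.
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))

# 2. Parse: the tokens become an abstract syntax tree (AST).
tree = ast.parse(source, filename="hello.py")
print(ast.dump(tree, indent=2))  # the indent argument needs Python 3.9+

# 3. Compile: the AST becomes a code object holding the bytecode.
code = compile(tree, "hello.py", "exec")
exec(code)  # prints Hello World
```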

Second trivia question

Running the bytecode - 10 mins

  • Visualizing the Python VM - 5 mins

  • Running a sample on the VM and showing the stack - 5 mins
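To make "stack-based VM" concrete, here is a toy interpreter loop in the spirit of ceval.c, written in Python. It is a hypothetical sketch, not CPython's implementation: the instruction names mimic the CPython 3.6 to 3.10 opcodes behind the hello bytecode, but the tuple-based instruction format and the run function are assumptions made for illustration.

```python
def run(instructions, consts, names, global_env):
    """Toy stack VM: each instruction is an (opname, arg) pair (hypothetical format)."""
    stack = []
    for op, arg in instructions:
        if op == "LOAD_CONST":
            stack.append(consts[arg])
        elif op == "LOAD_GLOBAL":
            stack.append(global_env[names[arg]])
        elif op == "CALL_FUNCTION":
            # Pop `arg` arguments (last pushed comes off first), then the callable.
            args = [stack.pop() for _ in range(arg)][::-1]
            func = stack.pop()
            stack.append(func(*args))
        elif op == "POP_TOP":
            stack.pop()
        elif op == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")
        print(f"{op:<14} stack={stack!r}")  # show the stack after each step

# The same program as the hello bytecode, decoded by hand.
program = [
    ("LOAD_GLOBAL", 0),
    ("LOAD_CONST", 1),
    ("CALL_FUNCTION", 1),
    ("POP_TOP", 0),
    ("LOAD_CONST", 0),
    ("RETURN_VALUE", 0),
]
run(program, consts=(None, "Hello World"), names=("print",), global_env={"print": print})
```

Every instruction only pushes to or pops from one value stack, which is the essence of a stack-based VM; CPython's real loop adds frames, reference counting, and many more opcodes.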

Solution to the first question

Solution to the second question (left as homework)

Further reads

Prerequisites:

  • Familiarity with C code style
  • Know that Python is a programming language.
  • The difference between compiled and interpreted languages

Video URL:

https://youtu.be/uppRBrwjXlQ

Content URLs:

Rough talk flow: View gist

Completed Slides: Slides

Speaker Info:

I am an independent security researcher with a DevOps background. I enjoy debugging and disassembling code more than writing it. I'm an open source contributor, previously with OWASP and currently working with the Open Mainframe Project, a collaboration between The Linux Foundation and IBM Z. I blog occasionally and keep a keen interest in system architecture. I'm still an undergraduate student and hope to master Python (there's a long way to go yet). I have previously given talks on open source tools used in data collection with OSINT techniques and on Content Security Policy bypass using polyglot XSS attacks.

Speaker Links:

Find me on

  1. Portfolio
  2. Blogs
  3. Github
  4. Twitter
  5. Linkedin

Section: Core Python
Type: Talks
Target Audience: Beginner
Last Updated: