A Crash Course in CPython Internals

Abhinav Upadhyay (~abhinav-upadhyay)



Description:

Objective: This workshop guides attendees from cloning the CPython repository from GitHub to understanding the key components of its architecture. By introducing a new language feature and modifying the CPython codebase to support it, attendees will gain an intuitive grasp of CPython's internals.

Description: The CPython codebase is large and complex. It is impossible to cover all of it in 3 hours, so this workshop aims to provide a foundational understanding of its architecture. Participants will leave with the skills to navigate the codebase, make modifications, and perhaps start contributing to Python.

Proposed Outline:

  1. Setup (_30 minutes_)

    • Cloning the CPython Repo and Setting Up VS Code
      • Guide participants through cloning the official CPython repository and configuring VS Code for code exploration.
    • Understanding the Code Organization
    • Building CPython
      • Overview of the build flags relevant to CPython development, followed by a walkthrough of the build itself.
    • Running and Debugging CPython
      • Demonstrate attaching gdb to a running Python process and basic debugging techniques.
  2. Background on Bytecode-Compiled Languages (_30 minutes_)

    • Overview of Bytecode-Compiled Languages
      • Present a high-level view of how bytecode-compiled languages work.
    • Stages of a Simple Program: Tokenizing, Parsing, Compiling, Bytecode Execution
      • Walkthrough of each stage using a simple example.
      • Provide context and refresh knowledge of concepts like stack frames and bytecode representation.
  3. Learning CPython Internals by Adding Our Own Language Feature (_2 hours_)

    • Introducing a New Language Feature
      • Guide participants through the steps required to introduce a new syntactic feature in Python.
        • The Lexer
          • Explanation of lexer functionality and its implementation.
          • Adding a token for the new feature.
        • The Parser
          • Discussion of the parser grammar.
          • Teaching the parser to recognize and parse the new feature.
        • The Compiler
          • Overview of the compiler implementation and how it processes common Abstract Syntax Tree (AST) nodes.
          • Teaching the compiler to handle the new AST node and emit the corresponding bytecode instructions.
        • The Bytecode Interpreter
          • Understanding the implementation of the main bytecode interpreter.
          • Modifying the interpreter to handle the bytecode instructions emitted for the new feature.
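
To make the four stages named in the outline (tokenizing, parsing, compiling, bytecode execution) concrete, here is a minimal sketch that drives each stage from Python using only standard-library modules. It is an illustration, not workshop material: the sample statement and variable names are assumptions chosen for the example, and the `tokenize` and `ast` modules expose a Python-level view of these stages rather than the C code the workshop modifies.

```python
import ast
import dis
import io
import tokenize

# Hypothetical one-line program used only for illustration.
source = "x = 1 + 2\n"

# Stage 1: tokenizing. Turn the source text into (token name, text) pairs.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

# Stage 2: parsing. Build an Abstract Syntax Tree from the source.
tree = ast.parse(source)

# Stage 3: compiling. Turn the AST into a code object holding bytecode.
code = compile(tree, filename="<example>", mode="exec")

# Stage 4: execution. Run the bytecode in a fresh namespace.
namespace = {}
exec(code, namespace)

print(tokens[:3])              # the first few tokens
print(ast.dump(tree.body[0]))  # the AST node for the assignment
dis.dis(code)                  # disassembly; opcodes vary by Python version
print(namespace["x"])          # value bound by running the bytecode
```

The disassembly printed by `dis.dis` differs between Python versions, which is itself a useful reminder that bytecode is an implementation detail of CPython rather than part of the language.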

Conclusion: By the end of the workshop, participants will have made a syntactic change to Python and seen how it affects each of CPython's components. They will leave with the knowledge needed to continue exploring CPython and potentially contribute to the project.

Prerequisites:

  • Comfort with reading and writing C (including pointers, enums, structs, preprocessor macros)
  • Familiarity with the shell. We will run the build commands and the debugger from the terminal.
  • A basic understanding of compilers (e.g. having taken a compilers course or implemented a simple compiler) will give you a head start
  • Although CPython development can be done in any environment, I will only cover the setup on Linux. I don't have macOS or Windows machines with me to figure out the setup instructions there. Participants should either know their way around compiling C projects on their OS or come prepared with a Linux VM. I am happy to provide instructions on how to set up a VM.
  • I will show everything in VS Code, and will not be able to help out with issues in other IDEs or editors.

Speaker Info:

Abhinav Upadhyay is a seasoned software engineer with a robust background in computer science. He has contributed to various open-source projects and is a dev member of the NetBSD project. Abhinav has presented at several BSD conferences, including EuroBSDCon and AsiaBSDCon. He authors "Confessions of a Code Addict" on Substack, where he explores topics like compilers, programming languages, and database internals, making complex technical subjects accessible to a broad audience. His insights are highly valued in the tech community.

Section: Core Python
Type: Workshops
Target Audience: Advanced
Last Updated: