A Crash Course in CPython Internals
Abhinav Upadhyay (~abhinav-upadhyay)
Description:
Objective: The goal of this workshop is to guide attendees from cloning the CPython repository from GitHub to understanding the key components of its architecture. By introducing a new language feature and modifying the CPython codebase to support it, attendees will gain an intuitive grasp of CPython's internals.
Description: The CPython codebase is extensive and complex, and while it's impossible to cover all of it in just 3 hours, this workshop aims to provide a foundational understanding of its architecture. Participants will leave with the skills to navigate the codebase, make modifications, and maybe even start contributing to Python.
Proposed Outline:
Setup (_30 minutes_)
- Cloning the CPython Repo and Setting Up VS Code
- Guide participants through cloning the official CPython repository and configuring VS Code for code exploration.
- Understanding the Code Organization
- Building CPython
- Overview of the build flags relevant to CPython development, followed by a walkthrough of the build process.
- Running and Debugging CPython
- Demonstrate attaching `gdb` to a running Python process and basic debugging techniques.
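As a small illustration of the build step above, the following sketch checks whether the interpreter running it looks like a debug build (for example, one configured with `--with-pydebug`). The helper name `describe_build` is hypothetical, and the values reported can differ by platform, so treat it as a quick sanity check rather than a definitive test.

```python
import sys
import sysconfig


def describe_build():
    """Report a few build properties of the running interpreter.

    `describe_build` is a hypothetical helper written for this outline.
    """
    return {
        # Py_DEBUG is expected to be 1 for a debug build; it may be 0 or
        # None otherwise (an assumption that can vary by platform).
        "py_debug": bool(sysconfig.get_config_var("Py_DEBUG")),
        # sys.gettotalrefcount is only present in debug builds.
        "has_total_refcount": hasattr(sys, "gettotalrefcount"),
        "version": sys.version.split()[0],
    }


for key, value in describe_build().items():
    print(f"{key}: {value}")
```

Running this with the freshly built interpreter from the repository checkout is a quick way to confirm that the binary being debugged is the one that was just compiled.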
Background on Bytecode Compiled Languages (_30 minutes_)
- Overview of Bytecode Compiled Languages
- Present a high-level view of how bytecode compiled languages work.
- Stages of a Simple Program: Tokenizing, Parsing, Compiling, Bytecode Execution
- Walkthrough of each stage using a simple example.
- Provide context and refresh knowledge of concepts like stack frames and bytecode representation.
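The four stages listed above can be observed from Python itself using the standard-library `tokenize`, `ast`, and `dis` modules. This is only a sketch: these modules offer a view of each stage and are not the C implementation that the workshop explores. The example source line and the helper names `token_names` and `compile_and_run` are assumptions chosen for illustration.

```python
import ast
import dis
import io
import tokenize

SOURCE = "x = 1 + 2\n"  # example program, chosen for illustration


def token_names(source):
    """Stage 1, tokenizing: return (token type name, text) pairs."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


def compile_and_run(source):
    """Stages 3 and 4: compile to a code object, then execute it."""
    code = compile(source, "<example>", "exec")
    namespace = {}
    exec(code, namespace)
    return code, namespace


# Stage 1: tokens
print(token_names(SOURCE))

# Stage 2: parsing into an abstract syntax tree
print(ast.dump(ast.parse(SOURCE)))

# Stage 3: the bytecode (exact instructions vary between Python versions)
code, namespace = compile_and_run(SOURCE)
dis.dis(code)

# Stage 4: the result of executing the bytecode
print(namespace["x"])
```

The disassembly is deliberately shown without an expected listing, since instruction names and layout change from one CPython release to the next.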
Learning CPython Internals by Adding Our Own Language Feature (_2 hours_)
- Introducing a New Language Feature
- Guide participants through the steps required to introduce a new syntactic feature in Python.
- The Lexer
- Explanation of lexer functionality and its implementation.
- Adding a token for the new feature.
- The Parser
- Discussion on parser grammar.
- Teach the parser to understand and parse the new feature.
- The Compiler
- Overview of compiler implementation and how it processes common Abstract Syntax Tree (AST) nodes.
- Introduce modifications to handle the new feature, including:
- Teaching the compiler to handle the new AST node and emit corresponding bytecode instructions.
- The Bytecode Interpreter
- Understand the implementation of the main bytecode interpreter.
- Modify the interpreter to handle the new bytecode instructions for the new feature.
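To make the division of labor among the four components above concrete, here is a toy pipeline for arithmetic expressions. It is not CPython's code: the token kinds, the tuple-based tree, and the `PUSH`/`ADD`/`MUL` instructions are all invented for this sketch. It only illustrates the shape of the hand-off from lexer to parser to compiler to a stack-based interpreter loop.

```python
import re


def lex(text):
    """Lexer: turn characters into (kind, value) tokens."""
    tokens = []
    for lexeme in re.findall(r"[0-9]+|\S", text):
        if lexeme.isascii() and lexeme.isdigit():
            tokens.append(("NUMBER", int(lexeme)))
        elif lexeme in "+*()":
            tokens.append(("OP", lexeme))
        else:
            raise SyntaxError(f"unexpected character {lexeme!r}")
    return tokens


def parse(tokens):
    """Parser: recursive descent producing a nested-tuple tree.

    Toy grammar:  expr: term ('+' term)*
                  term: atom ('*' atom)*
                  atom: NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("END", None)

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def atom():
        kind, value = advance()
        if kind == "NUMBER":
            return ("num", value)
        if (kind, value) == ("OP", "("):
            node = expr()
            if advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    def term():
        node = atom()
        while peek() == ("OP", "*"):
            advance()
            node = ("mul", node, atom())
        return node

    def expr():
        node = term()
        while peek() == ("OP", "+"):
            advance()
            node = ("add", node, term())
        return node

    tree = expr()
    if peek()[0] != "END":
        raise SyntaxError("trailing input")
    return tree


def compile_tree(node):
    """Compiler: emit instructions for a stack machine, operands first."""
    if node[0] == "num":
        return [("PUSH", node[1])]
    opcode = "ADD" if node[0] == "add" else "MUL"
    return compile_tree(node[1]) + compile_tree(node[2]) + [(opcode, None)]


def run(code):
    """Interpreter: a dispatch loop over instructions using a value stack."""
    stack = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode in ("ADD", "MUL"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if opcode == "ADD" else left * right)
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return stack.pop()


print(run(compile_tree(parse(lex("2 + 3 * (4 + 1)")))))  # prints 17
```

Adding a new operator to this toy means touching all four functions in turn, which mirrors, at a much smaller scale, the path a new syntactic feature takes through the real codebase.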
Conclusion: By the end of the workshop, participants will have made a syntactic change to Python, understanding its impact on CPython's various components. They will leave with the knowledge needed to continue exploring CPython and potentially contribute to the project.
Prerequisites:
- Comfort with reading and writing C (including pointers, enums, structs, preprocessor macros)
- Familiarity with the shell. We will be running some commands such as the build commands, and the debugger from the terminal.
- A basic understanding of compilers (e.g. having taken a compilers course or implemented a simple compiler) will give you a head start.
- Although CPython development can be done in any environment, I will only cover the setup on Linux. I don't have macOS or Windows machines with me to figure out the setup instructions there. Participants either need to know their way around compiling C projects on their OS, or they should come prepared with a Linux VM. I am happy to provide instructions on how to set up a VM.
- I will show everything in VS Code, and will not be able to help out with issues in other IDEs or editors.
Speaker Info:
Abhinav Upadhyay is a seasoned software engineer with a robust background in computer science. He has contributed to various open-source projects and is a dev member of the NetBSD project. Abhinav has presented at several BSD conferences, including EuroBSDCon and AsiaBSDCon. He authors "Confessions of a Code Addict" on Substack, where he explores topics like compilers, programming languages, and database internals, making complex technical subjects accessible to a broad audience. His insights are highly valued in the tech community.