Unit Testing Jupyter Notebooks - testbook
Rohit Sanjay (~rohit920)
Traditionally, Jupyter Notebook users have had an excellent experience exploring code solutions in an interactive development environment. However, the document this produces, the .ipynb file, has not been as easy to test as plain .py files across the same variety of situations. In some cases, end-to-end execution tools like papermill allow the entire document to be tested as one unit, but unit testing individual code snippets from the notebook was difficult or impossible without exporting it to a new format and refactoring.
Notebooks have exploded in popularity in recent years, with millions of notebooks on GitHub alone. To address the difficulty of testing these notebooks, we created a new library called testbook: a unit testing framework for testing code in Jupyter Notebooks using pytest patterns.
Previous attempts at unit testing notebooks involved writing the tests inside the notebook itself, which was error-prone, hard to read, and even harder to maintain. With testbook, unit tests live in separate test files and run against the notebook, so .ipynb files can be tested much like .py files.
Features of the testbook library
- Write conventional unit tests for Jupyter Notebooks
- Execute all cells, or only specific cells, before a unit test
- Share kernel context across multiple tests (using pytest fixtures)
- Support for patching objects
- Inject code into Jupyter notebooks
- Works with any unit testing library - unittest, pytest or nose
Outline of the talk
- Brief context about the landscape of Jupyter notebooks - 2 mins
- Context/Rationale behind creating testbook - 5 mins
  - Reproducibility of explored code paths
  - Reliability of re-execution with different inputs
  - Notebooks moving from experimentation to production environments
- Intro to testbook and its features - 15-20 mins
  - Brief demo of testbook (through images/GIFs)
  - How testbook works
  - Walkthrough of key features of testbook
    - Writing simple test functions
    - Executing specific cells before a test
    - Sharing kernel context
    - Patching objects
- When (and when not) to use testbook: who testbook is for - 2 mins
- Future roadmap of testbook: what we have in store for future releases - 1 min
  - Use cases for education and teaching
  - Better support for non-Python kernels
Who the talk is for
This talk is for anyone who has ever worked with Jupyter Notebooks. Common users of notebooks include data scientists, data engineers, system automation engineers, and teachers.
Prerequisites: basic knowledge of testing in Python. Familiarity with notebook systems is preferred but not required.
Slides for the talk can be found here.
A shorter version of this talk was delivered as a lightning talk at SciPy 2020. The video can be found here.
About Rohit Sanjay
Rohit is a final-year engineering student at MIT Manipal. He is an open source enthusiast and a Google Summer of Code 2020 intern at NumFOCUS, where he implemented testbook from scratch under the mentorship of Matthew Seal.
About Matthew Seal
Matthew is the CTO at Noteable, a VC-funded startup focused on notebook services. Before Noteable, he was at Netflix working on orchestration and integration systems, which included novel uses of notebooks in production systems. Matt maintains a number of Jupyter libraries, including papermill.