- Summary
- Key Features
- Installation
- Update Jupygrader
- Usage
- Create an autogradable notebook
- AI-Assisted Grading
- Utility functions
- License
## Summary

Jupygrader is a Python package for automated grading of Jupyter notebooks. It provides a framework to:
- Execute and grade Jupyter notebooks containing student work and test cases
- Generate comprehensive reports in multiple formats (JSON, HTML, TXT)
- Extract student code from notebooks into separate Python files
- Verify notebook integrity by computing hashes of test cases and submissions
- Grade with AI assistance: use an LLM to grade manual items, review failures, or evaluate notebooks entirely without execution
## Key Features

- Executes notebooks in a controlled, temporary environment
- Preserves the original notebook while creating graded versions
- Adds grader scripts to notebooks to evaluate test cases
- Supports multiple grading modes:
- Automatic grading via assertions and tests
- Manual grading
- Hybrid (automatic + manual)
- AI-assisted grading (full or partial)
- Generates detailed grading results including:
- Individual test case scores
- Overall scores and summaries
- Success/failure status of each test
- Produces multiple output formats for instructors to review:
- Graded notebook (.ipynb)
- HTML report
- JSON result data
- Plaintext summary
- Extracted Python code
- Includes metadata like Python version, platform, and file hashes for verification
## Installation

```shell
pip install jupygrader
```

## Update Jupygrader

```shell
pip install --upgrade jupygrader
```

## Usage

```python
from jupygrader import grade_notebooks

notebook_file_path = 'path/to/notebook.ipynb'
grade_notebooks([notebook_file_path])
```

Supplying a `pathlib.Path` object is also supported.
```python
from jupygrader import grade_notebooks
from pathlib import Path

notebook_path = Path('path/to/notebook.ipynb')
grade_notebooks([notebook_path])
```

If `output_path` is not specified, the output files are stored in the same directory as the notebook file.
During grading, Jupygrader preprocesses code cells and comments out lines that start with IPython shell/magic prefixes (! and %). This prevents notebook-only commands from causing syntax errors in the Python-based grading pipeline.
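The effect of this preprocessing step can be sketched as follows. This is a minimal illustration of the behavior described above, not Jupygrader's actual implementation:

```python
def comment_out_magics(source: str) -> str:
    """Comment out lines that start with IPython shell (!) or magic (%) prefixes."""
    out_lines = []
    for line in source.splitlines():
        if line.lstrip().startswith(("!", "%")):
            # Neutralize notebook-only commands so they don't break plain Python
            out_lines.append("# " + line)
        else:
            out_lines.append(line)
    return "\n".join(out_lines)

print(comment_out_magics("!pip install pandas\nx = 1"))
# → # !pip install pandas
#   x = 1
```

A cell such as `!pip install pandas` would therefore execute as a harmless comment during grading.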
```python
from jupygrader import grade_notebooks

grade_notebooks([{
    "notebook_path": 'path/to/notebook.ipynb',
    "output_path": 'path/to/output'
}])
```

## Create an autogradable notebook

The instructor authors only one "solution" notebook, which contains both the solution code and the test cases for all graded parts.
Jupygrader provides a simple drag-and-drop interface to generate a student-facing notebook that removes the solution code and obfuscates test cases if required.
Any code between `# YOUR CODE BEGINS` and `# YOUR CODE ENDS` is stripped in the student version.
```python
import pandas as pd

# YOUR CODE BEGINS
sample_series = pd.Series([-20, -10, 10, 20])
# YOUR CODE ENDS

print(sample_series)
```

nbgrader syntax (`### BEGIN SOLUTION`, `### END SOLUTION`) is also supported.
```python
import pandas as pd

### BEGIN SOLUTION
sample_series = pd.Series([-20, -10, 10, 20])
### END SOLUTION

print(sample_series)
```

In the student-facing notebook, the code cell will look like:
```python
import pandas as pd

# YOUR CODE BEGINS
# YOUR CODE ENDS

print(sample_series)
```

To keep setup notes or helper code in the instructor notebook only, start a cell with one of the following markers. The entire cell will be removed from the generated student version:
- `# GRADER_ONLY` (case-insensitive)
- `# grader_only` (case-insensitive)
- `! grader_only` (case-insensitive)
- `_grader_only = True` (case-sensitive; whitespace is ignored)
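A check for these markers could be approximated like this. This is an illustrative sketch of the rules listed above, not Jupygrader's internal logic:

```python
def is_grader_only(cell_source: str) -> bool:
    """Return True if a cell should be removed from the student version."""
    stripped = cell_source.strip()
    if not stripped:
        return False
    first_line = stripped.splitlines()[0]
    # Comment/shell markers are matched case-insensitively
    if first_line.lower().startswith(("# grader_only", "! grader_only")):
        return True
    # The assignment marker is case-sensitive; whitespace is ignored
    if first_line.replace(" ", "").startswith("_grader_only=True"):
        return True
    return False
```

For example, `is_grader_only("# GRADER_ONLY\nhelper_setup()")` returns `True`, while an ordinary cell such as `x = 1` is kept.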
A graded test case requires a test case name and an assigned point value.
- The `_test_case` variable should store the name of the test case.
- The `_points` variable should store the number of points, either as an integer or a float.
```python
_test_case = 'create-a-pandas-series'
_points = 2

pd.testing.assert_series_equal(sample_series, pd.Series([-20, -10, 10, 20]))
```

Mark a test case with `_grade_manually = True` to flag it for human (or AI) review instead of assertion-based grading.
```python
_test_case = 'explain-your-approach'
_points = 5
_grade_manually = True

# Students write a free-response answer here
```

If you want to prevent learners from seeing the test case code, you can optionally set `_obfuscate = True` to base64-encode the test cases.
Note that this provides only basic obfuscation, and students can easily decode the string to reveal the original code.
We may introduce an encryption method in the future.
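To see why this is only light obfuscation, note that the encoding is plain base64, which anyone can reverse with a single standard-library call. A quick round-trip demonstration (the variable names here are illustrative, not Jupygrader's):

```python
import base64

test_code = "pd.testing.assert_series_equal(sample_series, pd.Series([-20, -10, 10, 20]))"

# Encode the test case the way an obfuscated cell stores it
encoded = base64.b64encode(test_code.encode()).decode()

# A student can recover the original source just as easily
decoded = base64.b64decode(encoded).decode()
assert decoded == test_code
```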
**Instructor notebook**

```python
_test_case = 'create-a-pandas-series'
_points = 2
_obfuscate = True

pd.testing.assert_series_equal(sample_series, pd.Series([-20, -10, 10, 20]))
```

**Student notebook**
```python
# DO NOT CHANGE THE CODE IN THIS CELL
_test_case = 'create-a-pandas-series'
_points = 2
_obfuscate = True

import base64 as _b64
_64 = _b64.b64decode('cGQudGVzdGluZy5hc3NlcnRfc2VyaWVzX2VxdWFsKHNhbXBsZV9zZXJpZXMsIHBkLlNlcmllcyhbLT\
IwLCAtMTAsIDEwLCAyMF0pKQ==')
eval(compile(_64, '<string>', 'exec'))
```

### Add hidden test cases
Hidden test cases only run while grading.
```python
_test_case = 'create-a-pandas-series'
_points = 2

### BEGIN HIDDEN TESTS
pd.testing.assert_series_equal(sample_series, pd.Series([-20, -10, 10, 20]))
### END HIDDEN TESTS
```

Alternatively, check for the `is_jupygrader_env` global, which is only defined during grading:

```python
_test_case = 'create-a-pandas-series'
_points = 2

if 'is_jupygrader_env' in globals():
    pd.testing.assert_series_equal(sample_series, pd.Series([-20, -10, 10, 20]))
```

## AI-Assisted Grading

Jupygrader can use an OpenAI-compatible model to assist with grading. Set the `ai_mode` parameter to one of the following string values:
| `ai_mode` | Description |
|---|---|
| `"off"` | No AI grading (default) |
| `"full"` | AI grades all test cases based on notebook content; no execution required |
| `"manual_only"` | AI grades test cases marked `_grade_manually = True` |
| `"review_failed"` | AI reviews auto-graded test cases that failed |
| `"manual_and_failed"` | AI grades both manual items and failed test cases |
> **Note:** `openai_model` is required whenever `ai_mode` is not `"off"`. Omitting it raises a `ValueError`.
Use ai_mode="full" to have the AI evaluate every test case based solely on the notebook's content, without executing it. This is ideal for open-ended assignments, essay-style responses, or notebooks that include general instructions rather than assertion-based tests.
```python
import openai
from jupygrader import grade_notebooks

client = openai.OpenAI(api_key="your-api-key")

results = grade_notebooks(
    ["submissions/student1.ipynb", "submissions/student2.ipynb"],
    ai_mode="full",
    openai_client=client,
    openai_model="gpt-4o",
)
```

In `"full"` mode, test cases are parsed directly from the notebook's source cells (no execution). Notebooks without any test case cells are still processed, and output artifacts are generated.
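Identifying test-case cells from source alone can be sketched with a simple scan for the `_test_case` and `_points` assignments. This is a hypothetical approximation for illustration; Jupygrader's actual parser is internal:

```python
import re

def extract_test_cases(cell_sources):
    """Collect (name, points) pairs from cells that define _test_case and _points."""
    cases = []
    for src in cell_sources:
        name = re.search(r"_test_case\s*=\s*['\"]([^'\"]+)['\"]", src)
        points = re.search(r"_points\s*=\s*([0-9.]+)", src)
        if name and points:
            cases.append((name.group(1), float(points.group(1))))
    return cases

cells = [
    "_test_case = 'q1'\n_points = 2\nassert x == 1",
    "print('not a test cell')",
]
print(extract_test_cases(cells))  # → [('q1', 2.0)]
```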
Use ai_mode="review_failed" to have the AI explain why auto-graded test cases failed and optionally award partial credit.
```python
import openai
from jupygrader import grade_notebooks

client = openai.OpenAI(api_key="your-api-key")

results = grade_notebooks(
    ["submissions/student1.ipynb", "submissions/student2.ipynb"],
    ai_mode="review_failed",
    openai_client=client,
    openai_model="gpt-4o",
)
```

Use `ai_mode="manual_only"` to have the AI grade items marked `_grade_manually = True` in the notebook.
```python
import openai
from jupygrader import grade_notebooks

client = openai.OpenAI(api_key="your-api-key")

results = grade_notebooks(
    ["submissions/student1.ipynb", "submissions/student2.ipynb"],
    ai_mode="manual_only",
    openai_client=client,
    openai_model="gpt-4o",
)
```

Use `ai_mode="manual_and_failed"` to combine both workflows in a single pass.
```python
import openai
from jupygrader import grade_notebooks

client = openai.OpenAI(api_key="your-api-key")

results = grade_notebooks(
    ["submissions/student1.ipynb", "submissions/student2.ipynb"],
    ai_mode="manual_and_failed",
    openai_client=client,
    openai_model="gpt-4o",
)
```

Pass `custom_prompt` to give the AI model additional context or grading criteria. This works with all AI grading modes.
```python
import openai
from jupygrader import grade_notebooks

client = openai.OpenAI(api_key="your-api-key")

results = grade_notebooks(
    ["submissions/student1.ipynb"],
    ai_mode="full",
    openai_client=client,
    openai_model="gpt-4o",
    custom_prompt=(
        "This is a data analysis assignment. "
        "Award full points if the student produces a correct result, even if the approach differs. "
        "Deduct points for hard-coded values."
    ),
)
```

## Utility functions

If a test case needs to be updated before grading, use the `jupygrader.replace_test_case()` function.
This is useful when learners have already submitted their Jupyter notebooks, but the original notebook contains an incorrect test case.
```python
import nbformat
import jupygrader

nb = nbformat.read(notebook_path, as_version=4)
jupygrader.replace_test_case(nb, 'q1', '_test_case = "q1"\n_points = 6\n\nassert my_var == 3')
```

Below is a sample snippet demonstrating how to replace multiple test cases using a dictionary.
```python
import nbformat
import jupygrader

nb = nbformat.read(notebook_path, as_version=4)

new_test_cases = {
    'test_case_01': '_test_case = "test_case_01"\n_points = 6\n\npass',
    'test_case_02': '_test_case = "test_case_02"\n_points = 3\n\npass'
}

for tc_name, new_tc_code in new_test_cases.items():
    jupygrader.replace_test_case(nb, tc_name, new_tc_code)
```

## License

`jupygrader` is distributed under the terms of the MIT license.
