
Contributing

Guide for contributing to the Braintrust Python SDK.

Setup

Prerequisites

  • Python 3.10+
  • mise for tool installation and repo-local environment management

Getting Started

git clone https://github.com/braintrustdata/braintrust-sdk-python.git
cd braintrust-sdk-python
mise install
make develop

If you use mise activate in your shell, entering the repo will automatically expose the configured tools. If you do not, you can still run commands explicitly with mise exec -- ....

Repo Layout

  • py/: main Python SDK
  • py/pyproject.toml: single source of truth for package metadata, dependency groups, and provider version matrix
  • py/uv.lock: committed lockfile for reproducible auxiliary dep resolution
  • integrations/: separate integration packages such as LangChain and ADK
  • docs/: supporting docs

Most SDK changes should happen under py/. There is no setup.py; pyproject.toml is the build configuration.

Common Workflows

Python SDK

cd py
make install-dev
make test-core
make lint
nox -l

make install-dev uses uv sync under the hood, reading dependency groups from py/pyproject.toml. There are no separate requirements-*.txt files.

Run a focused session:

cd py
nox -s "test_openai(latest)"

Run a single test subset:

cd py
nox -s "test_openai(latest)" -- -k "test_chat_metrics"

Optional provider packages are installed automatically by each nox session — there is no separate install-optional target.

Repo-Level Commands

The root Makefile is a convenience wrapper around py/Makefile.

Useful root commands:

make fixup
make test-core
make lint

make test-wheel requires a built wheel first.

Integration Packages

LangChain:

cd integrations/langchain-py
uv sync
uv run pytest src

ADK:

cd integrations/adk-py
uv sync
uv run pytest

Testing Notes

The SDK uses nox for compatibility testing across optional providers and versions. Provider version pins live in py/pyproject.toml under [tool.braintrust.matrix]; the noxfile reads them at import time. py/noxfile.py is the source of truth for available sessions and their auxiliary deps.

VCR Tests

Many wrapper and devserver tests use VCR cassettes.

  • Locally, missing cassettes can be recorded with record_mode="once".
  • In CI, missing cassettes fail because record_mode="none" is used.
  • If your change intentionally changes HTTP behavior, re-record the affected cassettes and commit them.

Integration cassettes are stored in per-version subdirectories (e.g. cassettes/latest/, cassettes/1.71.0/). Nox sessions set BRAINTRUST_TEST_PACKAGE_VERSION automatically so cassettes land in the correct directory when recording. The shared py/src/braintrust/integrations/conftest.py resolves the version-specific path; individual test files do not need their own vcr_cassette_dir fixtures.
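The version-directory resolution described above can be sketched as a small helper. This is an illustrative assumption about how the shared conftest might build the path, not the real implementation; only the BRAINTRUST_TEST_PACKAGE_VERSION env var and the cassettes/<version>/ layout come from this document.

```python
# Hedged sketch: resolving the per-version cassette directory from the
# BRAINTRUST_TEST_PACKAGE_VERSION env var (set by nox). The function name
# and "latest" fallback are illustrative assumptions.
import os
from pathlib import Path


def cassette_dir(base: Path, env=None, default: str = "latest") -> Path:
    """Return base/cassettes/<version>, where <version> comes from the env."""
    env = os.environ if env is None else env
    version = env.get("BRAINTRUST_TEST_PACKAGE_VERSION", default)
    return base / "cassettes" / version
```

With no version set, cassette_dir(Path("pkg"), {}) resolves to pkg/cassettes/latest; under a pinned nox session it resolves to e.g. pkg/cassettes/1.71.0.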

Useful example:

cd py
nox -s "test_openai(latest)" -- --vcr-record=all -k "test_openai_chat_metrics"

Claude Agent SDK Subprocess Cassettes

claude_agent_sdk tests use the real SDK and bundled claude CLI, but they do not use VCR. Instead they record and replay the SDK/CLI JSON transport under:

  • py/src/braintrust/integrations/claude_agent_sdk/cassettes/<version>/

Like HTTP VCR cassettes, Claude Agent SDK cassettes are stored in per-version subdirectories. The BRAINTRUST_TEST_PACKAGE_VERSION env var (set by nox) selects the correct directory.

Behavior:

  • Locally, subprocess cassettes default to once.
  • In CI, subprocess cassettes default to none.
  • Override with BRAINTRUST_CLAUDE_AGENT_SDK_RECORD_MODE=all when you need to re-record.
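The precedence above (explicit override, then CI default, then local default) can be sketched in a few lines. This is a hedged illustration; detecting CI via a CI env var is an assumption for the example, not a claim about the real test harness.

```python
# Hedged sketch of the record-mode resolution described above:
# explicit BRAINTRUST_CLAUDE_AGENT_SDK_RECORD_MODE wins, otherwise
# "none" in CI and "once" locally. CI detection here is illustrative.
def resolve_record_mode(env: dict) -> str:
    override = env.get("BRAINTRUST_CLAUDE_AGENT_SDK_RECORD_MODE")
    if override:
        return override
    return "none" if env.get("CI") else "once"


assert resolve_record_mode({}) == "once"
assert resolve_record_mode({"CI": "true"}) == "none"
assert resolve_record_mode({"CI": "true", "BRAINTRUST_CLAUDE_AGENT_SDK_RECORD_MODE": "all"}) == "all"
```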

Useful examples:

cd py
nox -s "test_claude_agent_sdk(latest)"
BRAINTRUST_CLAUDE_AGENT_SDK_RECORD_MODE=all nox -s "test_claude_agent_sdk(latest)"
BRAINTRUST_CLAUDE_AGENT_SDK_RECORD_MODE=all \
  nox -s "test_claude_agent_sdk(latest)" -- -k "test_calculator_with_multiple_operations"

Type Tests

py/src/braintrust/type_tests/ contains tests that are checked by pyright, mypy, and pytest. The test_types nox session runs all three and is included in CI automatically.

When changing generic type signatures (e.g., Eval, EvalCase, EvalScorer, EvalHooks), add or update a test file in type_tests/ to verify the type checker accepts the intended usage patterns. Test files are named test_*.py, use absolute imports (from braintrust.framework import ...), and double as regular pytest files.

cd py
nox -s test_types
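A type-test file in this style is ordinary Python that both the type checkers and pytest can run. The sketch below shows the pattern with a stand-in generic function; the SDK's real targets are the Eval, EvalCase, EvalScorer, and EvalHooks generics named above.

```python
# Hedged illustration of the type-test pattern: a test_*.py file checked by
# pyright, mypy, and pytest. The generic `first` is a stand-in for the SDK's
# real generic signatures, used only to show the shape of such a test.
from typing import TypeVar

T = TypeVar("T")


def first(items: list) -> object:
    """Stand-in generic; in a real type test this would be an SDK import."""
    return items[0]


def test_first_preserves_type() -> None:
    # The type checker verifies the inferred type; pytest verifies the value.
    x = first([1, 2, 3])
    assert x == 1
```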

Fixtures

Shared test fixtures live in py/src/braintrust/conftest.py.

Common ones include:

  • dummy API key setup for VCR-backed tests
  • Braintrust global state reset between tests
  • wheel-mode skipping for VCR tests

The memory_logger fixture from braintrust.test_helpers is useful for asserting on logged spans without a real Braintrust backend.

Benchmarks

The SDK includes local performance benchmarks powered by pyperf, located in py/benchmarks/. These cover hot paths like serialization and deep-copy routines.

Running benchmarks

cd py

# Run all benchmarks
make bench

# Quick sanity check (fewer iterations)
make bench BENCH_ARGS="--fast"

# Save results for later comparison
make bench BENCH_ARGS="-o /tmp/results.json"

# Run a single benchmark module directly
python -m benchmarks.benches.bench_bt_json

To benchmark with the optional orjson fast-path installed:

cd py
uv sync --extra performance
make bench

Comparing across branches

cd py

git checkout main
make bench BENCH_ARGS="-o /tmp/main.json"

git checkout my-branch
make bench BENCH_ARGS="-o /tmp/branch.json"

make bench-compare BENCH_BASE=/tmp/main.json BENCH_NEW=/tmp/branch.json

Useful pyperf flags

  • --fast: fewer iterations, good for a quick sanity check
  • --rigorous: more iterations, reduces noise for final numbers
  • -o FILE: write results to a JSON file for later comparison
  • --append FILE: append to an existing results file

Run python -m benchmarks --help for the full list.

Adding a new benchmark

Drop a new bench_<name>.py file into py/benchmarks/benches/. It will be picked up automatically — no registration required.

Your module needs to expose a main() function that accepts an optional pyperf.Runner:

import pyperf

from benchmarks._utils import disable_pyperf_psutil


def main(runner: pyperf.Runner | None = None) -> None:
    if runner is None:
        disable_pyperf_psutil()
        runner = pyperf.Runner()

    runner.bench_func("my_benchmark", my_func, my_arg)


if __name__ == "__main__":
    main()

If your benchmark needs reusable test data, add builder functions to py/benchmarks/fixtures.py.
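A fixture builder in that style is just a function that constructs reusable payloads. The sketch below is an assumption about what such a builder might look like; the function name and data shape are illustrative, not taken from the real fixtures.py.

```python
# Hedged sketch of a benchmark fixture builder: a function that produces
# nested dict/list payloads for serialization or deep-copy benchmarks.
# Name and structure are illustrative assumptions.
def build_nested_payload(depth: int = 3, width: int = 4) -> dict:
    """Build a nested dict of the given depth and fan-out."""
    if depth == 0:
        return {"leaf": list(range(width))}
    return {f"key_{i}": build_nested_payload(depth - 1, width) for i in range(width)}
```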

CI

GitHub Actions workflows live in .github/workflows/.

Main workflows:

  • checks.yaml: merged SDK checks workflow, including lint, pinned-action validation, the Python test matrix, wheel build, and the checks-passed required-check aggregator
  • langchain-py-test.yaml: LangChain integration tests
  • adk-py-test.yaml: ADK integration tests
  • publish-py-sdk.yaml: PyPI release
  • test-publish-py-sdk.yaml: TestPyPI release validation

CI uses committed HTTP VCR cassettes and Claude Agent SDK subprocess cassettes, so forks do not need provider API secrets for normal replayed test runs.

Submitting Changes

  1. Make your change in the narrowest relevant area.
  2. Add or update tests.
  3. Re-record HTTP or Claude Agent SDK subprocess cassettes if the provider interaction change is intentional.
  4. Run the smallest relevant local checks first, then broader ones if needed.
  5. Run make fixup before opening a PR.
  6. Open a pull request against main.