Merged
26 commits
1c091b4
Add LLM eval metric models
benglewis Jan 13, 2026
1a047bb
Format llm_behavior_eval
benglewis Jan 13, 2026
06d040c
Fix optional type hints in llm behavior eval
benglewis Jan 13, 2026
202a481
Merge branch 'main' into codex/2026-01-13/linear-mention-sdk-79-add-s…
benglewis Jan 27, 2026
415f7e3
Basic first implementation of matching naming for `check_run` instead…
benglewis Jan 28, 2026
b9027d8
Add AGENTS.md and new `dependency_groups` entry of `dev` for development
benglewis Jan 28, 2026
d141b9d
Fix `RunStatus` circular dependency
benglewis Jan 28, 2026
c62aad3
Drop unnecessary `TypeAdapter` and add error log for invalid SSE payl…
benglewis Jan 28, 2026
d42c945
Update `AGENTS.md` to use context7 and not use 1-3 character variable…
benglewis Jan 28, 2026
808aed7
Add assertion for `summary_brief` and `summary_full` to LLM behavior …
benglewis Jan 28, 2026
c872f89
Fix SSE payload parsing
benglewis Jan 28, 2026
be354c4
Try to fix `unzip` for LLM behavior eval results
benglewis Jan 28, 2026
5391f96
Apply Greptile suggestions from code review
benglewis Feb 3, 2026
7612832
SDK-79: Guard SSE progress and retry eval stream
benglewis Feb 3, 2026
bc1d309
SDK-79: Skip non-string SSE progress
benglewis Feb 3, 2026
4374468
SDK-79: Apply ruff format to llm behavior eval
benglewis Feb 3, 2026
ea7f62b
Merge branch 'main' into codex/2026-01-13/linear-mention-sdk-79-add-s…
benglewis Feb 4, 2026
6bc3126
Update `README.md` and documentation (docs)
benglewis Feb 4, 2026
263f19f
Drop Python code from `README.md`
benglewis Feb 4, 2026
6a96100
Fix circular import
benglewis Feb 4, 2026
97b2431
Rename `BiasType` to `BBQBiasType`
benglewis Feb 4, 2026
ba0392a
Fix `progress_bar` not being closed if there is an error with the run
benglewis Feb 4, 2026
beb20a6
Add `deleted_at` to `EvalRunRecord`
benglewis Feb 4, 2026
56b538f
Add cleanup for LLM behavior eval runs
benglewis Feb 4, 2026
6c641ec
Add `UnqoverBiasType` as per @mishana 's PR comment
benglewis Feb 4, 2026
c1eba15
Fix Cursor's bugbot's comment
benglewis Feb 4, 2026
2 changes: 1 addition & 1 deletion .envrc
@@ -1,2 +1,2 @@
 watch_file uv.lock
-uv sync --all-extras && source .venv/bin/activate
+uv sync --group dev && source .venv/bin/activate
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -75,4 +75,4 @@ repos:
     hooks:
       - id: uv-lock
       - id: uv-sync
-        args: ["--extra", "dev", "--extra", "docs", "--extra", "pandas", "--extra", "polars", "--extra", "transformers"]
+        args: ["--group", "dev"]
47 changes: 47 additions & 0 deletions AGENTS.md
@@ -0,0 +1,47 @@
# Repository Guidelines

## Instructions

- Always use context7 when I need code generation, setup or configuration steps, or
  library/API documentation. This means you should automatically use the Context7 MCP
  tools to resolve library id and get library docs without me having to explicitly ask.

## Project Structure & Module Organization

- `hirundo/` holds the SDK source (CLI entry point is `hirundo.cli:app`).
- `tests/` contains pytest-based test coverage.
- `docs/` and `source/` contain Sphinx documentation assets.
- `notebooks/` and `on_prem_test_notebook.ipynb` provide example workflows.
- `requirements/` stores compiled dependency sets (for dev, docs, pandas, polars, transformers).

## Build, Test, and Development Commands

- `uv sync --group dev`: fast dependency sync with extras.
- `ruff check` / `ruff format`: lint and auto-format (run before PRs).
- `pytest`: run the test suite.
- `python -m build`: build the package artifacts.
- `pre-commit install`: enable git hooks (optional, but recommended).

## Coding Style & Naming Conventions

- Python 3.10+ codebase, 4-space indentation, line length 88 (Ruff defaults).
- Follow Ruff linting rules (`pyproject.toml`), with tests allowing `assert` usage.
- Prefer descriptive names; avoid short, cryptic identifiers in new code.
- Avoid 1-3 character variable names in new or refactored code. Use descriptive names
  even in small scopes.

## Testing Guidelines

- Frameworks: `pytest` and `pytest-asyncio`.
- Place tests in `tests/`; name files `test_*.py`.
- Run locally with `pytest` before opening a PR (CI runs lint + integration tests).

## Commit & Pull Request Guidelines

- Recent commit history favors `SDK-<id>: <summary>` (e.g., `SDK-78: Migrate to basedpyright`).
- Include issue/PR references when available (e.g., `(#190)`).
- PRs should describe changes clearly and confirm `ruff check` and `ruff format` passed.

## Security & Configuration Tips

- Supported Python versions: CPython 3.10–3.13.
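
The naming rule in the AGENTS.md above ("Avoid 1-3 character variable names") is stated as prose only. A minimal sketch of how it could be checked mechanically, assuming a standalone helper built on the standard-library `ast` module; the function name `find_short_variable_names` and the length threshold constant are hypothetical and not part of this PR:

```python
import ast

# Assumption: the threshold mirrors the AGENTS.md wording ("1-3 character variable names").
MAX_SHORT_NAME_LENGTH = 3


def find_short_variable_names(source: str) -> list[tuple[int, str]]:
    """Return sorted (line_number, name) pairs for bound names of 1-3 characters."""
    syntax_tree = ast.parse(source)
    short_names: list[tuple[int, str]] = []
    for node in ast.walk(syntax_tree):
        # ast.Name with a Store context covers assignment targets, loop targets,
        # and `with ... as` targets. Function parameters are not checked here.
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            if len(node.id) <= MAX_SHORT_NAME_LENGTH:
                short_names.append((node.lineno, node.id))
    return sorted(short_names)
```

Calling it on the text of a module returns the offending names with their line numbers, which could feed a pre-commit hook or a review comment. Ruff remains the linter the guidelines actually name; this sketch only illustrates the rule.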
120 changes: 6 additions & 114 deletions README.md
@@ -3,6 +3,7 @@
 The Hirundo Python SDK lets you:

 - Launch and monitor LLM behavior unlearning runs.
+- Run LLM behavior evaluations for bias, hallucination, and prompt injection.
 - Run dataset QA for ML datasets (classification, object detection, and more).
 - Fetch QA results as `pandas` or `polars` DataFrames.

@@ -22,7 +23,7 @@ pip install hirundo
 Optional extras:

 - LLM behavior unlearning (Transformers + PEFT): `pip install hirundo[transformers]`
-- Dataset QA results as DataFrames: `pip install hirundo[pandas]` or `pip install hirundo[polars]`
+- Dataset QA or LLM behavior eval results as DataFrames: `pip install hirundo[pandas]` or `pip install hirundo[polars]`

 If you want to install from source, clone this repository and run:

@@ -40,120 +41,11 @@ hirundo setup

This writes `API_KEY` (and optionally `API_HOST`) to `.env` in the current directory or `~/.hirundo.conf`.

-## Quickstart: LLM behavior unlearning
-
-Make sure you have the `transformers` extra installed (`pip install hirundo[transformers]`).
-
-```python
-from hirundo import (
-    BiasRunInfo,
-    BiasType,
-    HuggingFaceTransformersModel,
-    LlmModel,
-    LlmUnlearningRun,
-)
-
-llm = LlmModel(
-    model_name="Nemotron-Flash-1B",
-    model_source=HuggingFaceTransformersModel(
-        model_name="nvidia/Nemotron-Flash-1B",
-    ),
-)
-llm_id = llm.create()
-
-run_id = LlmUnlearningRun.launch(
-    llm_id,
-    BiasRunInfo(bias_type=BiasType.ALL),
-)
-
-result = LlmUnlearningRun.check_run(run_id)
-new_adapter = llm.get_hf_pipeline_for_run(run_id)
-```
-
-## Quickstart: Dataset QA
-
-### Classification
-
-```python
-import json
-import os
-
-from hirundo import (
-    HirundoCSV,
-    LabelingType,
-    QADataset,
-    StorageConfig,
-    StorageGCP,
-    StorageTypes,
-)
-
-gcp_bucket = StorageGCP(
-    bucket_name="cifar100bucket",
-    project="Hirundo-global",
-    credentials_json=json.loads(os.environ["GCP_CREDENTIALS"]),
-)
-
-test_dataset = QADataset(
-    name="TEST-GCP cifar 100 classification dataset",
-    labeling_type=LabelingType.SINGLE_LABEL_CLASSIFICATION,
-    storage_config=StorageConfig(
-        name="cifar100bucket",
-        type=StorageTypes.GCP,
-        gcp=gcp_bucket,
-    ),
-    data_root_url=gcp_bucket.get_url(path="/pytorch-cifar/data"),
-    labeling_info=HirundoCSV(
-        csv_url=gcp_bucket.get_url(path="/pytorch-cifar/data/cifar100.csv"),
-    ),
-    classes=cifar100_classes,
-)
-
-test_dataset.run_qa()
-results = test_dataset.check_run()
-print(results)
-```
+## Quickstart examples

-### Object detection
-
-```python
-from hirundo import (
-    GitRepo,
-    HirundoCSV,
-    LabelingType,
-    QADataset,
-    StorageConfig,
-    StorageGit,
-    StorageTypes,
-)
-
-git_storage = StorageGit(
-    repo=GitRepo(
-        name="BDD-100k-validation-dataset",
-        repository_url="https://huggingface.co/datasets/hirundo-io/bdd100k-validation-only",
-    ),
-    branch="main",
-)
-
-test_dataset = QADataset(
-    name="TEST-HuggingFace-BDD-100k-validation-OD-validation-dataset",
-    labeling_type=LabelingType.OBJECT_DETECTION,
-    storage_config=StorageConfig(
-        name="BDD-100k-validation-dataset",
-        type=StorageTypes.GIT,
-        git=git_storage,
-    ),
-    data_root_url=git_storage.get_url(path="/BDD100K Val from Hirundo.zip/bdd100k"),
-    labeling_info=HirundoCSV(
-        csv_url=git_storage.get_url(
-            path="/BDD100K Val from Hirundo.zip/bdd100k/bdd100k.csv"
-        ),
-    ),
-)
-
-test_dataset.run_qa()
-results = test_dataset.check_run()
-print(results)
-```
+The full quickstart examples now live in the Sphinx docs so they can be linted,
+formatted, and type-checked as real Python files. See the examples embedded in
+`docs/index.rst`, which are sourced from `docs/*.py` files.
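
Keeping the examples as real `docs/*.py` files means ordinary tooling can see them. As a rough illustration of the cheapest such check, the sketch below parses every Python file in a docs directory and reports the ones with syntax errors. It assumes only the standard library; the helper name `find_unparsable_examples` is hypothetical, and the linting and type checking mentioned in the README text would still be done by the project's real tools.

```python
import ast
from pathlib import Path


def find_unparsable_examples(docs_directory: Path) -> list[str]:
    """Return the names of *.py files in docs_directory that fail to parse."""
    unparsable_files: list[str] = []
    for example_path in sorted(docs_directory.glob("*.py")):
        example_source = example_path.read_text(encoding="utf-8")
        try:
            # Parsing only checks syntax; it does not import or execute the example.
            ast.parse(example_source, filename=str(example_path))
        except SyntaxError:
            unparsable_files.append(example_path.name)
    return unparsable_files
```

Because the files are parsed rather than imported, the check needs neither the `hirundo` package nor API credentials.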

## Supported dataset storage

41 changes: 41 additions & 0 deletions docs/dataset_qa_object_detection_example.py
@@ -0,0 +1,41 @@
"""Examples for docs/index.rst literalinclude blocks."""

from hirundo import (
    GitRepo,
    HirundoCSV,
    LabelingType,
    QADataset,
    StorageConfig,
    StorageGit,
    StorageTypes,
)

git_storage = StorageGit(
    repo=GitRepo(
        name="BDD-100k-validation-dataset",
        repository_url=(
            "https://huggingface.co/datasets/hirundo-io/bdd100k-validation-only"
        ),
    ),
    branch="main",
)

test_dataset = QADataset(
    name="TEST-HuggingFace-BDD-100k-validation-OD-validation-dataset",
    labeling_type=LabelingType.OBJECT_DETECTION,
    storage_config=StorageConfig(
        name="BDD-100k-validation-dataset",
        type=StorageTypes.GIT,
        git=git_storage,
    ),
    data_root_url=git_storage.get_url(path="/BDD100K Val from Hirundo.zip/bdd100k"),
    labeling_info=HirundoCSV(
        csv_url=git_storage.get_url(
            path="/BDD100K Val from Hirundo.zip/bdd100k/bdd100k.csv"
        ),
    ),
)

test_dataset.run_qa()
results = test_dataset.check_run()
print(results)
10 changes: 10 additions & 0 deletions docs/hirundo.llm_behavior_eval.rst
@@ -0,0 +1,10 @@
.. meta::
   :http-equiv=Content-Security-Policy: default-src 'self', frame-ancestors 'none'

hirundo.llm_behavior_eval module
================================

.. automodule:: hirundo.llm_behavior_eval
   :members:
   :undoc-members:
   :show-inheritance:
10 changes: 10 additions & 0 deletions docs/hirundo.llm_behavior_eval_results.rst
@@ -0,0 +1,10 @@
.. meta::
   :http-equiv=Content-Security-Policy: default-src 'self', frame-ancestors 'none'

hirundo.llm_behavior_eval_results module
========================================

.. automodule:: hirundo.llm_behavior_eval_results
   :members:
   :undoc-members:
   :show-inheritance:
10 changes: 10 additions & 0 deletions docs/hirundo.llm_bias_type.rst
@@ -0,0 +1,10 @@
.. meta::
   :http-equiv=Content-Security-Policy: default-src 'self', frame-ancestors 'none'

hirundo.llm_bias_type module
=============================

.. automodule:: hirundo.llm_bias_type
   :members:
   :undoc-members:
   :show-inheritance:
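
reStructuredText expects a section underline to be at least as long as its title text, and docutils emits a "Title underline too short" warning otherwise, which is easy to trip when a stub is copied from a module with a shorter name. A minimal sketch of generating such stubs so the underline always matches, assuming a hypothetical helper `render_module_stub` that is not part of this PR (the `.. meta::` block is omitted for brevity):

```python
def render_module_stub(module_name: str) -> str:
    """Render an automodule stub whose title underline matches the title length."""
    title = f"{module_name} module"
    stub_lines = [
        title,
        # Derive the underline from the title so the two can never drift apart.
        "=" * len(title),
        "",
        f".. automodule:: {module_name}",
        "   :members:",
        "   :undoc-members:",
        "   :show-inheritance:",
    ]
    return "\n".join(stub_lines) + "\n"
```

For example, `render_module_stub("hirundo.llm_behavior_eval_results")` yields a 40-character title followed by a 40-character underline.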
3 changes: 3 additions & 0 deletions docs/hirundo.rst
@@ -17,6 +17,9 @@ Submodules
    hirundo.git
    hirundo.labeling
    hirundo.logger
+   hirundo.llm_behavior_eval
+   hirundo.llm_behavior_eval_results
+   hirundo.llm_bias_type
    hirundo.storage
    hirundo.unlearning_llm
    hirundo.unzip
21 changes: 19 additions & 2 deletions docs/index.rst
@@ -13,6 +13,7 @@ Welcome to the ``hirundo`` client library documentation. This SDK connects to th
 Hirundo platform and provides APIs for:

 - LLM behavior unlearning runs (reducing bias, prompt injections and other unwanted behaviors).
+- LLM behavior eval runs (measuring bias, hallucination, prompt injection, and more).
 - Dataset QA for machine learning datasets.

 Getting started
@@ -45,6 +46,17 @@ Example:
 .. literalinclude:: llm_unlearning_example.py
    :language: python

+LLM behavior eval
+-----------------
+
+Run standardized evaluations over an LLM or an unlearning run to quantify
+behavior changes (bias, hallucination, prompt injections, and more).
+
+Example:
+
+.. literalinclude:: llm_behavior_eval_example.py
+   :language: python
+
 Dataset QA
 ----------

@@ -63,9 +75,14 @@ Supported storage backends include:
 - Google Cloud Storage (GCS)
 - Git repositories with LFS (GitHub, Hugging Face)

-Example:
+Classification example:

+.. literalinclude:: dataset_qa_classification_example.py
+   :language: python
+
+Object detection example:
+
-.. literalinclude:: dataset_qa_example.py
+.. literalinclude:: dataset_qa_object_detection_example.py
    :language: python

 API reference
32 changes: 32 additions & 0 deletions docs/llm_behavior_eval_example.py
@@ -0,0 +1,32 @@
"""Examples for docs/index.rst literalinclude blocks."""

from hirundo import (
    BBQBiasType,
    EvalRunInfo,
    HuggingFaceTransformersModel,
    LlmBehaviorEval,
    LlmModel,
    ModelOrRun,
    PresetType,
)

llm = LlmModel(
    model_name="Nemotron-Flash-1B",
    model_source=HuggingFaceTransformersModel(
        model_name="nvidia/Nemotron-Flash-1B",
    ),
)
llm_id = llm.create()

run_id = LlmBehaviorEval.launch_eval_run(
    ModelOrRun.MODEL,
    EvalRunInfo(
        name="Nemotron BBQ bias eval",
        model_id=llm_id,
        preset_type=PresetType.BBQ_BIAS,
        bias_type=BBQBiasType.ALL,
    ),
)

results = LlmBehaviorEval.check_run_by_id(run_id)
print(results.summary_brief)
4 changes: 2 additions & 2 deletions docs/llm_unlearning_example.py
@@ -1,8 +1,8 @@
"""Examples for docs/index.rst literalinclude blocks."""

 from hirundo import (
+    BBQBiasType,
     BiasRunInfo,
-    BiasType,
     HuggingFaceTransformersModel,
     LlmModel,
     LlmUnlearningRun,
@@ -17,7 +17,7 @@
 llm_id = llm.create()
 run_id = LlmUnlearningRun.launch(
     llm_id,
-    BiasRunInfo(bias_type=BiasType.ALL),
+    BiasRunInfo(bias_type=BBQBiasType.ALL),
 )
 result = LlmUnlearningRun.check_run(run_id)
 new_adapter = llm.get_hf_pipeline_for_run(run_id)