Merged
28 commits
28b6c1c
Refine LLM unlearning models
benglewis Jan 13, 2026
96e8137
Move `get_unique_id` to a new `testing_utils` file
benglewis Jan 13, 2026
ea21669
Add small / Smol test
benglewis Jan 13, 2026
a3737df
Switch to `nvidia/Nemotron-Flash-1B` since it turns out that SmolLM2-…
benglewis Jan 13, 2026
4c968f0
Fix unlearning LLM run typing
benglewis Jan 13, 2026
493191e
Refine LLM run list typing
benglewis Jan 13, 2026
32c2096
Type unlearning run list responses
benglewis Jan 13, 2026
0349fae
Handle empty target utilities explicitly
benglewis Jan 13, 2026
418b923
Merge remote-tracking branch 'origin/codex/2026-01-13/linear-mention-…
benglewis Jan 13, 2026
8fcbda0
Merge remote-tracking branch 'origin/codex/2026-01-13/linear-mention-…
benglewis Jan 13, 2026
b062346
Merge branch 'main' into codex/2026-01-13/linear-mention-sdk-43-add-s…
benglewis Jan 13, 2026
b1b1823
Fix Ruff lint errors and basedpyright errors
benglewis Jan 13, 2026
ec89380
Add `check_run` and `acheck_run` (and `check_run_by_id` and `acheck_r…
benglewis Jan 14, 2026
be58f5d
Reduce duplicate code
benglewis Jan 15, 2026
17f756e
LLM behavior unlearning test and loading of transformers Pipeline
benglewis Jan 21, 2026
fb991d6
Fix `numpy` version error
benglewis Jan 21, 2026
5eea81a
Fix requirements files
benglewis Jan 22, 2026
e10613e
Fix retries when HTTP SSE request fails
benglewis Jan 22, 2026
898c7f8
Fix Misha's PR comment
benglewis Jan 22, 2026
9b42aa5
Fix ChatGPT PR review comments
benglewis Jan 22, 2026
785155b
Fix Gemini's PR comment
benglewis Jan 22, 2026
d02a588
Fix bug in `deleted_at` field for `OutputUnlearningLlmRun`
benglewis Jan 22, 2026
73d8597
Rename `get_pipeline_for_run` to `get_hf_pipeline_for_run`
benglewis Jan 22, 2026
1bd8c92
Fix `MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES` being used instead o…
benglewis Jan 22, 2026
efa2e01
Fix pydantic error due to class not really being imported at runtime
benglewis Jan 22, 2026
47c7acf
Fix `deleted_at` being missing crashing the tests
benglewis Jan 22, 2026
3eabf6c
Switch to `API_HOST2` and `API_KEY2` for now until `test` contains LL…
benglewis Jan 22, 2026
60ebe02
Fix unlearning LLM behavior test running on every platform
benglewis Jan 25, 2026
4 changes: 2 additions & 2 deletions .github/workflows/lint.yaml
@@ -27,7 +27,7 @@ jobs:
  python -m pip install --upgrade pip
  python -m venv .venv
  source .venv/bin/activate
- pip install -r requirements/dev.txt -r requirements/pandas.txt -r requirements/polars.txt
+ pip install -r requirements/dev.txt -r requirements/pandas.txt -r requirements/polars.txt -r requirements/transformers.txt
  - run: echo "$PWD/.venv/bin" >> $GITHUB_PATH
  - uses: astral-sh/ruff-action@v3
  - run: ruff check
@@ -50,6 +50,6 @@ jobs:
  python -m pip install --upgrade pip
  python -m venv .venv
  source .venv/bin/activate
- pip install -r requirements/dev.txt -r requirements/pandas.txt -r requirements/polars.txt
+ pip install -r requirements/dev.txt -r requirements/pandas.txt -r requirements/polars.txt -r requirements/transformers.txt
  - run: echo "$PWD/.venv/bin" >> $GITHUB_PATH
  - run: basedpyright
2 changes: 1 addition & 1 deletion .github/workflows/pytest-full.yaml
@@ -38,7 +38,7 @@ jobs:
  python -m pip install --upgrade pip
  python -m venv .venv
  source .venv/bin/activate
- pip install -r requirements/dev.txt -r requirements/polars.txt
+ pip install -r requirements/dev.txt -r requirements/polars.txt -r requirements/transformers.txt
  - name: Run PyTest
  run: .venv/bin/pytest tests/${{ matrix.data-qa-test['test'] }}
  env:
12 changes: 6 additions & 6 deletions .github/workflows/pytest-sanity.yaml
@@ -56,20 +56,20 @@ jobs:
  python -m pip install --upgrade pip
  python -m venv .venv
  source .venv/bin/activate
- pip install -r requirements/dev.txt -r requirements/polars.txt
+ pip install -r requirements/dev.txt -r requirements/polars.txt -r requirements/transformers.txt
  - name: Run commands on Windows
  if: github.event_name != 'pull_request' && runner.os == 'Windows' && steps.changes.outputs.non_workflow == 'true'
  run: |
  python -m pip install --upgrade 'pip>=24.1.2'
  python -m venv .venv
  .venv\Scripts\activate
- python -m pip install -r requirements\dev.txt -r requirements\polars.txt
+ python -m pip install -r requirements\dev.txt -r requirements\polars.txt -r requirements\transformers.txt
  - name: Run PyTest on Linux and macOS
  if: github.event_name != 'pull_request' && runner.os != 'Windows' && steps.changes.outputs.non_workflow == 'true'
  run: .venv/bin/pytest
  env:
- API_HOST: ${{ secrets.API_HOST }}
- API_KEY: ${{ secrets.API_KEY }}
+ API_HOST: ${{ secrets.API_HOST2 }}
+ API_KEY: ${{ secrets.API_KEY2 }}
  GCP_CREDENTIALS: ${{ secrets.GCP_CREDENTIALS }}
  AWS_ACCESS_KEY: ${{ secrets.AWS_ACCESS_KEY }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
@@ -80,8 +80,8 @@ jobs:
  if: github.event_name != 'pull_request' && runner.os == 'Windows' && steps.changes.outputs.non_workflow == 'true'
  run: .venv/Scripts/pytest
  env:
- API_HOST: ${{ secrets.API_HOST }}
- API_KEY: ${{ secrets.API_KEY }}
+ API_HOST: ${{ secrets.API_HOST2 }}
+ API_KEY: ${{ secrets.API_KEY2 }}
  GCP_CREDENTIALS: ${{ secrets.GCP_CREDENTIALS }}
  AWS_ACCESS_KEY: ${{ secrets.AWS_ACCESS_KEY }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
1 change: 1 addition & 0 deletions .github/workflows/vulnerability-scan.yml
@@ -24,6 +24,7 @@ jobs:
  - requirements/docs.txt
  - requirements/pandas.txt
  - requirements/polars.txt
+ - requirements/transformers.txt
  runs-on: ubuntu-latest
  permissions:
  contents: read
15 changes: 11 additions & 4 deletions .pre-commit-config.yaml
@@ -61,11 +61,18 @@ repos:
  always_run: false
  files: pyproject.toml$
  additional_dependencies: [uv]
- - id: pip-sync
- name: sync
+ - id: pip-compile-transformers
+ name: compile requirements/transformers.txt
  entry: uv
- args: ["pip", "sync", "requirements/dev.txt", "requirements/docs.txt", "requirements/pandas.txt", "requirements/polars.txt"]
+ args: ["pip", "compile", "--extra", "transformers", "-o", "requirements/transformers.txt", "-c", "requirements/requirements.txt"]
  language: python
  always_run: false
- files: requirements.txt$
+ files: pyproject.toml$
  additional_dependencies: [uv]
+ - repo: https://github.com/astral-sh/uv-pre-commit
+ # uv version.
+ rev: 0.9.6
+ hooks:
+ - id: uv-lock
+ - id: uv-sync
+ args: ["--extra", "dev", "--extra", "docs", "--extra", "pandas", "--extra", "polars", "--extra", "transformers"]
8 changes: 7 additions & 1 deletion DEV_README.md
@@ -44,12 +44,18 @@ uv pip compile --extra dev -o requirements/dev.txt -c requirements.txt pyproject
  uv pip compile --extra pandas -o requirements/pandas.txt -c requirements.txt pyproject.toml
  uv pip compile --extra polars -o requirements/polars.txt -c requirements.txt pyproject.toml
  uv pip compile --extra docs -o requirements/docs.txt -c requirements.txt pyproject.toml
+ uv pip compile --extra transformers -o requirements/transformers.txt -c requirements.txt pyproject.toml
  ```

  #### Sync installed packages

  ```bash
- uv pip sync requirements/dev.txt requirements/polars.txt
+ uv pip sync requirements/dev.txt requirements/pandas.txt requirements/polars.txt requirements/docs.txt requirements/transformers.txt
  ```
+ or
+
+ ```bash
+ uv sync --extra dev --extra pandas --extra polars --extra docs --extra transformers
+ ```

  ### Build process
28 changes: 26 additions & 2 deletions README.md
@@ -67,7 +67,31 @@ You can install the codebase with a simple `pip install hirundo` to install the

## Usage

Classification example:
### Unlearning LLM behavior

Install the `transformers` extra first: `pip install "hirundo[transformers]"`, or `uv pip install "hirundo[transformers]"` if you have `uv` installed (it is much faster than `pip`).

```python
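from hirundo import (
    BiasRunInfo,
    BiasType,
    HuggingFaceTransformersModel,
    LlmModel,
    LlmUnlearningRun,
)

# Register the base model that the unlearning run will start from
llm = LlmModel(
    model_name="Nemotron-Flash-1B",
    model_source=HuggingFaceTransformersModel(
        model_name="nvidia/Nemotron-Flash-1B",
    ),
)
llm_id = llm.create()
# Configure and launch a bias unlearning run for the registered model
run_info = BiasRunInfo(
    bias_type=BiasType.ALL,
)
run_id = LlmUnlearningRun.launch(
    llm_id,
    run_info,
)
# Load the run's result as a Hugging Face `transformers` pipeline
new_adapter = llm.get_hf_pipeline_for_run(run_id)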
```
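
As a quick pre-flight check before calling `get_hf_pipeline_for_run`, the minimal sketch below reports which packages behind the `transformers` extra cannot be imported in the current environment. The package list mirrors `REQUIRED_PACKAGES_FOR_PIPELINE` from `hirundo/_llm_pipeline.py` in this PR; the helper name `missing_packages` is hypothetical and is not part of the `hirundo` API.

```python
import importlib.util

# Assumption: mirrors REQUIRED_PACKAGES_FOR_PIPELINE in hirundo/_llm_pipeline.py.
REQUIRED_PACKAGES = ["peft", "transformers", "accelerate"]


def missing_packages(packages: list[str]) -> list[str]:
    """Return the top-level package names that cannot be found (hypothetical helper)."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED_PACKAGES)
if missing:
    # Same remedy that the library's own error message suggests.
    print(f'Missing: {", ".join(missing)}. Try: pip install "hirundo[transformers]"')
else:
    print("All packages needed to load the pipeline are importable.")
```

`importlib.util.find_spec` only locates a package without importing it, so the check stays cheap even when `transformers` and `peft` are installed.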

### Dataset QA

#### Classification example:

```python
from hirundo import (
@@ -104,7 +128,7 @@ results = test_dataset.check_run()
print(results)
```

Object detection example:
#### Object detection example:

```python
from hirundo import (
1 change: 1 addition & 0 deletions dev.Dockerfile
@@ -7,6 +7,7 @@ COPY . .
  RUN pip install -r requirements/requirements.txt \
  -r requirements/dev.txt -r requirements/docs.txt \
  -r requirements/pandas.txt -r requirements/polars.txt \
+ -r requirements/transformers.txt \
  && pip install ipykernel

  CMD ["python"]
18 changes: 17 additions & 1 deletion hirundo/__init__.py
@@ -30,6 +30,15 @@
      StorageGit,
      StorageS3,
  )
+ from .unlearning_llm import (
+     BiasRunInfo,
+     BiasType,
+     HuggingFaceTransformersModel,
+     LlmModel,
+     LlmSources,
+     LlmUnlearningRun,
+     LocalTransformersModel,
+ )
  from .unzip import load_df, load_from_zip

  __all__ = [
@@ -59,8 +68,15 @@
      "StorageGit",
      "StorageConfig",
      "DatasetQAResults",
+     "BiasRunInfo",
+     "BiasType",
+     "HuggingFaceTransformersModel",
+     "LlmModel",
+     "LlmSources",
+     "LlmUnlearningRun",
+     "LocalTransformersModel",
      "load_df",
      "load_from_zip",
  ]

- __version__ = "0.1.21"
+ __version__ = "0.1.22"
153 changes: 153 additions & 0 deletions hirundo/_llm_pipeline.py
@@ -0,0 +1,153 @@
import importlib.util
import tempfile
import zipfile
from pathlib import Path
from typing import TYPE_CHECKING, cast

from hirundo import HirundoError
from hirundo._http import requests
from hirundo._timeouts import DOWNLOAD_READ_TIMEOUT
from hirundo.logger import get_logger

if TYPE_CHECKING:
    from torch import device as torch_device
    from transformers.configuration_utils import PretrainedConfig
    from transformers.modeling_utils import PreTrainedModel
    from transformers.pipelines.base import Pipeline

    from hirundo.unlearning_llm import LlmModel, LlmModelOut

logger = get_logger(__name__)


ZIP_FILE_CHUNK_SIZE = 50 * 1024 * 1024 # 50 MB
REQUIRED_PACKAGES_FOR_PIPELINE = ["peft", "transformers", "accelerate"]


def get_hf_pipeline_for_run_given_model(
    llm: "LlmModel | LlmModelOut",
    run_id: str,
    config: "PretrainedConfig | None" = None,
    device: "str | int | torch_device | None" = None,
    device_map: str | dict[str, int | str] | None = None,
    trust_remote_code: bool = False,
    token: str | None = None,
) -> "Pipeline":
    for package in REQUIRED_PACKAGES_FOR_PIPELINE:
        if importlib.util.find_spec(package) is None:
            raise HirundoError(
                f'{package} is not installed. Please install transformers extra with pip install "hirundo[transformers]"'
            )
    from peft import PeftModel
    from transformers.models.auto.configuration_auto import AutoConfig
    from transformers.models.auto.modeling_auto import (
        MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES,
        AutoModelForCausalLM,
        AutoModelForImageTextToText,
    )
    from transformers.models.auto.tokenization_auto import AutoTokenizer
    from transformers.pipelines import pipeline

    from hirundo.unlearning_llm import (
        HuggingFaceTransformersModel,
        HuggingFaceTransformersModelOutput,
        LlmUnlearningRun,
    )

    run_results = LlmUnlearningRun.check_run_by_id(run_id)
    if run_results is None:
        raise HirundoError("No run results found")
    result_payload = (
        run_results.get("result", run_results)
        if isinstance(run_results, dict)
        else run_results
    )
    if isinstance(result_payload, dict):
        result_url = result_payload.get("result")
    else:
        result_url = result_payload
    if not isinstance(result_url, str):
        raise HirundoError("Run results did not include a download URL")
    # Stream the zip file download

    zip_file_path = tempfile.NamedTemporaryFile(delete=False).name
    with requests.get(
        result_url,
        timeout=DOWNLOAD_READ_TIMEOUT,
        stream=True,
    ) as r:
        r.raise_for_status()
        with open(zip_file_path, "wb") as zip_file:
            for chunk in r.iter_content(chunk_size=ZIP_FILE_CHUNK_SIZE):
                zip_file.write(chunk)
    logger.info(
        "Successfully downloaded the result zip file for run ID %s to %s",
        run_id,
        zip_file_path,
    )

    with tempfile.TemporaryDirectory() as temp_dir:
        temp_dir_path = Path(temp_dir)
        with zipfile.ZipFile(zip_file_path, "r") as zip_file:
            zip_file.extractall(temp_dir_path)
        # Attempt to load the tokenizer normally
        base_model_name = (
            llm.model_source.model_name
            if isinstance(
                llm.model_source,
                HuggingFaceTransformersModel | HuggingFaceTransformersModelOutput,
            )
            else llm.model_source.local_path
        )
        token = (
            llm.model_source.token
            if isinstance(
                llm.model_source,
                HuggingFaceTransformersModel,
            )
            else token
        )
        tokenizer = AutoTokenizer.from_pretrained(
            base_model_name,
            token=token,
            trust_remote_code=trust_remote_code,
        )
        if tokenizer.pad_token is None:
            tokenizer.pad_token = tokenizer.eos_token
        config = AutoConfig.from_pretrained(
            base_model_name,
            token=token,
            trust_remote_code=trust_remote_code,
        )
        config_dict = config.to_dict() if hasattr(config, "to_dict") else config
        is_multimodal = (
            config_dict.get("model_type")
            in MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES.keys()
        )
        if is_multimodal:
            base_model = AutoModelForImageTextToText.from_pretrained(
                base_model_name,
                token=token,
                trust_remote_code=trust_remote_code,
            )
        else:
            base_model = AutoModelForCausalLM.from_pretrained(
                base_model_name,
                token=token,
                trust_remote_code=trust_remote_code,
            )
        model = cast(
            "PreTrainedModel",
            PeftModel.from_pretrained(
                base_model, str(temp_dir_path / "unlearned_model_folder")
            ),
        )

        return pipeline(
            task="text-generation",
            model=model,
            tokenizer=tokenizer,
            config=config,
            device=device,
            device_map=device_map,
        )
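
The URL lookup and the download-and-extract steps in the function above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code merged in this PR: `extract_result_url` and `save_and_extract` are hypothetical helper names, the chunk iterable stands in for `r.iter_content(...)`, and, unlike the function above, the sketch deletes the temporary zip file once extraction finishes.

```python
import io
import tempfile
import zipfile
from collections.abc import Iterable
from pathlib import Path


def extract_result_url(run_results: object) -> str:
    """Accept a bare URL string or a dict nesting it under "result" (hypothetical helper)."""
    payload = run_results
    if isinstance(payload, dict):
        # Unwrap one level if present, otherwise keep inspecting the same dict.
        payload = payload.get("result", payload)
    if isinstance(payload, dict):
        payload = payload.get("result")
    if not isinstance(payload, str):
        raise ValueError("Run results did not include a download URL")
    return payload


def save_and_extract(chunks: Iterable[bytes], dest_dir: Path) -> Path:
    """Write streamed chunks to a temporary zip, extract it, then remove the zip."""
    tmp = tempfile.NamedTemporaryFile(suffix=".zip", delete=False)
    zip_path = Path(tmp.name)
    try:
        with tmp:  # closes the handle so the archive can be reopened for reading
            for chunk in chunks:
                tmp.write(chunk)
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(dest_dir)
    finally:
        # Clean up the temporary archive whether or not extraction succeeded.
        zip_path.unlink(missing_ok=True)
    return dest_dir


# Usage: an in-memory archive stands in for the HTTP response body.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("unlearned_model_folder/adapter_config.json", "{}")
data = buffer.getvalue()
chunks = [data[i : i + 16] for i in range(0, len(data), 16)]

with tempfile.TemporaryDirectory() as temp_dir:
    adapter_dir = save_and_extract(chunks, Path(temp_dir)) / "unlearned_model_folder"
    adapter_found = (adapter_dir / "adapter_config.json").is_file()
print(adapter_found)
```

Passing the chunks as a plain iterable keeps the sketch free of network access; in the real function the same loop consumes the streamed HTTP response.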