SDK-89: Validate test LLM and judge LLM can be accessed from HuggingFace #214

Open
benglewis wants to merge 12 commits into main from codex/2026-02-11/linear-mention-sdk-89-validate-llm-and-judge-model-can-be

Conversation

@benglewis benglewis (Contributor) commented Feb 11, 2026

User description

Codex generated this pull request, but encountered an unexpected error after generation. This is a placeholder PR message.


Codex Task


Note

Medium Risk
Adds new outbound HuggingFace API calls on critical workflows (LlmModel.create, eval run launch), which can introduce latency or new failure modes if HF is unavailable or tokens are misconfigured.

Overview
Adds pre-flight HuggingFace access validation for both LLM creation and LLM behavior eval runs, surfacing clearer HirundoError messages for gated/private/missing/unauthorized models and skipping validation when the judge model is a local path.
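
A minimal sketch of the pre-flight check described above, not the PR's actual code. The names `looks_like_local_path`, `hub_lookup`, and `validate_judge_model`, the local-path heuristic, and the error wording are all assumptions made for illustration; `HirundoError` is redefined locally as a stand-in. The only real library call is `huggingface_hub.model_info`, which is imported lazily so that the lookup can be swapped out in tests.

```python
from pathlib import Path
from typing import Callable, Optional


class HirundoError(Exception):
    """Stand-in for the SDK's error type, defined here so the sketch is self-contained."""


def looks_like_local_path(model_ref: str) -> bool:
    # Assumed heuristic: explicit path prefixes, or a directory that exists on disk.
    if model_ref.startswith(("/", "./", "../", "~")):
        return True
    return Path(model_ref).expanduser().is_dir()


def hub_lookup(repo_id: str, token: Optional[str]) -> None:
    # Imported lazily so the rest of the sketch runs without the dependency installed.
    from huggingface_hub import model_info

    model_info(repo_id, token=token)  # raises if the repo cannot be read


def validate_judge_model(
    model_ref: str,
    token: Optional[str] = None,
    lookup: Callable[[str, Optional[str]], None] = hub_lookup,
) -> str:
    """Return "skipped" for local paths and "validated" for reachable Hub models."""
    if looks_like_local_path(model_ref):
        return "skipped"
    try:
        lookup(model_ref, token)
    except Exception as exc:
        token_note = "with the provided token" if token else "without a token"
        raise HirundoError(
            f"Could not access HuggingFace model '{model_ref}' {token_note}."
        ) from exc
    return "validated"
```

Passing the stored token through to the lookup, rather than hardcoding `None`, is what lets a gated or private model validate successfully when the caller is in fact authorized.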

Extends model source outputs to carry an optional HuggingFace token, introduces a new _model_access.py helper built on huggingface_hub, updates several Pydantic models’ model_config to protect model_validate/model_dump, and adds unit tests plus the new huggingface-hub dependency.
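
For context on the `model_config` change, here is a hedged sketch of the Pydantic v2 setting that appears to be involved. The class `JudgeSettings` and its fields are hypothetical and not taken from the PR. In Pydantic v2 releases where the default protected namespace is the whole `model_` prefix, a field such as `model_name` triggers a warning; narrowing `protected_namespaces` to `model_validate` and `model_dump` keeps those method names protected while allowing other `model_*` fields.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class JudgeSettings(BaseModel):
    # Hypothetical model for illustration only.
    # Protect just the method-name prefixes instead of every "model_" field.
    model_config = ConfigDict(protected_namespaces=("model_validate", "model_dump"))

    model_name: str
    model_token: Optional[str] = None


settings = JudgeSettings(model_name="org/judge-model")
```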

Written by Cursor Bugbot for commit d236474. This will update automatically on new commits. Configure here.


Generated description

Below is a concise technical summary of the changes proposed in this PR:
Validate HuggingFace-hosted LLMs and judge models before use by reusing the new _model_access helper during LlmModel.create and LlmBehaviorEval.launch_eval_run, surfacing clearer HirundoError messages when gated, private, or unauthorized models are encountered. Update environment helpers so feature-gated tests rely on get_env_bool and centralize boolean flags while adding the huggingface-hub dependency for the new API calls.
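
To illustrate why a shared boolean helper is preferable to raw `os.getenv` calls, here is a small sketch. The signature, the set of accepted truthy strings, and the variable name `RUN_LONG_EVAL_TESTS` are assumptions; the real `get_env_bool` in `hirundo/_env.py` may differ.

```python
import os

# Assumed set of values treated as "true"; anything else is "false".
_TRUTHY = {"1", "true", "yes", "on"}


def get_env_bool(name: str, default: bool = False) -> bool:
    """Read an environment variable as a boolean flag."""
    raw = os.getenv(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in _TRUTHY


# A raw truthiness check treats the string "false" as enabled,
# which is the kind of inconsistency a shared helper avoids.
os.environ["RUN_LONG_EVAL_TESTS"] = "false"
raw_check = bool(os.getenv("RUN_LONG_EVAL_TESTS"))
helper_check = get_env_bool("RUN_LONG_EVAL_TESTS")
```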

Topic: Env Flags & Tests
Details: Leverage get_env_bool for the shared QA/eval tests and document pytest-only guidance so long-running flows gate on consistent boolean flags instead of raw os.getenv calls.
Modified files (5)
  • AGENTS.md
  • hirundo/_env.py
  • tests/dataset_qa_shared.py
  • tests/llm-behavior-eval/llm_behavior_eval_test.py
  • tests/unlearning-llm/unlearn_llm_behavior_test.py
Latest contributors (1)
  • User: blewis@hirundo.io | Commit: SDK-87: Migrate to `uv... | Date: February 11, 2026

Topic: HF Access Validation
Details: Validate HuggingFace model access for LLM creation and behavior eval workflows by wiring validate_huggingface_model_access/validate_judge_model_access into LlmModel, LlmBehaviorEval, and their supporting Pydantic configs, plus covering the new logic with targeted unit tests and the huggingface-hub dependency.
Modified files (10)
  • hirundo/_llm_sources.py
  • hirundo/_model_access.py
  • hirundo/llm_behavior_eval.py
  • hirundo/llm_behavior_eval_results.py
  • hirundo/unlearning_llm.py
  • pyproject.toml
  • tests/test_llm_behavior_eval_model_access.py
  • tests/test_model_access.py
  • tests/test_unlearning_llm_model_create.py
  • uv.lock
Latest contributors (1)
  • User: blewis@hirundo.io | Commit: SDK-79: Add LLM behavi... | Date: February 04, 2026
This pull request is reviewed by Baz. Review like a pro on (Baz).

@benglewis benglewis changed the title from "Codex-generated pull request" to "SDK-89: Validate test LLM and judge LLM can be accessed from HuggingFace" Feb 11, 2026
Comment thread hirundo/llm_behavior_eval.py
@benglewis benglewis self-assigned this Feb 12, 2026
…ate-llm-and-judge-model-can-be

# Conflicts:
#	pyproject.toml
#	uv.lock
Comment thread hirundo/_model_access.py Outdated
Hardcoded token=None ignores model's stored token
&
Unreachable hint="private" message branch is dead code
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread hirundo/_model_access.py
Comment thread tests/test_model_access.py Outdated
Comment thread tests/test_model_access.py Outdated
@baz-reviewer
baz-reviewer Bot commented Mar 12, 2026

Spec Reviewer Report    📪 ✅

All 2 Identified Requirements Met for Ticket:

Validate LLM and judge model can be accessed publicly / using token provided


2 met requirements

Requirement 1: Validate HF access before LLM creation/eval
Explanation: LLM creation and behavior eval launches now invoke helpers that call HuggingFace model_info and raise HirundoError when access cannot be confirmed, preventing the operation.
Evidence:
  • hirundo/unlearning_llm.py:53-63 – checks HuggingFace access before create
  • hirundo/llm_behavior_eval.py:157-171 – validates judge/LLM HF models before run
  • hirundo/_model_access.py:64-124 – model_info call raises HirundoError when inaccessible

Requirement 2: Explain why HuggingFace token is required in access errors
Explanation: The new HuggingFace access validator raises HirundoError messages that use gated/not-found/unauthorized hints and fall back to a generic guidance string, ensuring users understand when a token or different ID is needed.
Evidence:
  • hirundo/_model_access.py:15-55 – gated/private/unauthorized hint builder
  • hirundo/_model_access.py:64-113 – validator maps HF errors to those hints with token awareness
  • hirundo/llm_behavior_eval.py:157-249 – run launch validates judge and LLM access before API call
  • hirundo/unlearning_llm.py:53-75 – LLM creation now checks HuggingFace access before posting
  • tests/test_model_access.py:32-73 – unit tests assert gated/private/unauthorized messaging

Note: Some optional integrations are missing, so it might not be possible to check some of the requirements.
For best results, make sure the following are integrated: Figma



Used resources:
Hash: dc4df91 | Ticket: link

To rerun the Spec Reviewer, comment "baz rerun spec review".

Comment thread tests/test_model_access.py Outdated
@benglewis benglewis marked this pull request as ready for review March 13, 2026 16:33
@benglewis benglewis requested review from a team as code owners March 13, 2026 16:33
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce40b0a1ab

Comment thread hirundo/_model_access.py Outdated
Comment thread hirundo/unlearning_llm.py Outdated
Comment thread hirundo/llm_behavior_eval.py
orr-hirundo previously approved these changes Mar 17, 2026
@orr-hirundo orr-hirundo left a comment

LGTM

Comment thread hirundo/_env.py
2 participants