MAINT/FIX: Generalizing eval-hash#1434
Conversation
There was a problem hiding this comment.
Pull request overview
This PR generalizes “eval-hash” computation by moving it from scorer-specific IO utilities into the identifiers layer, enabling reuse across other component types (e.g., attacks) and fixing evaluator lookups to consistently use eval_hash rather than the full component hash.
Changes:
- Added
ComponentIdentifier.compute_eval_hash()plusIdentifiable.get_eval_hash()and ClassVar-based configuration for defining eval-hash behavior. - Updated scorer evaluation registry read/write and lookups to use
eval_hash(and updated call sites accordingly). - Refactored and relocated eval-hash unit tests from scorer metrics tests into identifier tests; updated evaluator tests for the new lookup key and function names.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
pyrit/identifiers/component_identifier.py |
Adds eval-hash computation, dynamic eval-config resolution, and Identifiable.get_eval_hash() helper. |
pyrit/score/scorer.py |
Defines scorer eval-hash ClassVars (EVAL_TARGET_CHILD_KEYS, EVAL_BEHAVIORAL_CHILD_PARAMS). |
pyrit/score/scorer_evaluation/scorer_metrics_io.py |
Switches registry lookup and replace semantics to key on eval_hash; requires callers to provide eval_hash. |
pyrit/score/scorer_evaluation/scorer_evaluator.py |
Uses get_eval_hash() for skip-logic and writes metrics keyed by eval hash. |
pyrit/score/true_false/true_false_scorer.py |
Updates objective metrics lookup to use eval hash. |
pyrit/score/float_scale/float_scale_scorer.py |
Updates harm metrics lookup to use eval hash. |
pyrit/score/printer/console_scorer_printer.py |
Updates printer registry lookups to use eval hash. |
pyrit/models/scenario_result.py |
Updates scenario result metrics lookup to use eval hash. |
tests/unit/score/test_scorer_metrics.py |
Updates registry entry assertions to use eval_hash and removes eval-hash implementation tests from this file. |
tests/unit/score/test_scorer_evaluator.py |
Updates mocks/patches to use get_eval_hash() and find_*_by_eval_hash functions. |
tests/unit/identifiers/test_component_identifier.py |
Adds comprehensive tests for eval-hash dict construction, dynamic config resolution, and get_eval_hash(). |
Comments suppressed due to low confidence (2)
pyrit/identifiers/component_identifier.py:284
- When only one of
target_child_keys/behavioral_child_paramsis provided and_resolve_eval_config()fails, the other value becomes an emptyfrozenset(). Iftarget_child_keysis non-empty butbehavioral_child_paramsresolves to empty, target children will have all params stripped from the eval hash, which can incorrectly collapse distinct model configs into the sameeval_hash. Consider validating that these two args are provided together (or that resolution succeeded) and falling back toself.hash/ raising aValueErrorwhen the eval config is incomplete.
if target_child_keys is None or behavioral_child_params is None:
resolved_keys, resolved_params = self._resolve_eval_config()
if target_child_keys is None:
target_child_keys = resolved_keys
if behavioral_child_params is None:
behavioral_child_params = resolved_params
if not target_child_keys:
return self.hash
pyrit/score/scorer_evaluation/scorer_evaluator.py:276
scorer_hashnow holds the scorer’s eval hash (get_eval_hash()), not the full componenthash. Renaming this local (e.g.,eval_hash) would reduce confusion and make it clearer which key is used for registry lookups.
scorer_hash = self.scorer.get_eval_hash()
bashirpartovi
left a comment
There was a problem hiding this comment.
Great catch on fixing those call sites to use eval_hash instead of hash, sorry I missed that in my previous PR.
I do want to push back a bit on moving the eval-hash logic onto ComponentIdentifier, though. It kind of undoes the decoupling we just did in the recent refactor.
Just to recap why we changed this in the first place, the old design had a separate identifier class for every component (ScorerIdentifier, TargetIdentifier, etc.). Each one had its own custom fields and escape hatches for weird params. When a scorer captured a target, it reached in and cherry-picked specific fields like model_name and temperature. Converters did the same thing. Basically, components had to know way too much about each other. If someone added frequency_penalty to a target, it wouldn't even show up in the scorer's identity unless someone remembered to update Scorer._create_identifier().
The refactor fixed this by using a single ComponentIdentifier that just holds opaque snapshots. The parent doesn't peek inside anymore. The target's full identity flows in as a child, the hash includes it automatically, and the DB gets the whole unfiltered picture for debugging.
Because the old way was cherry picking fields, it generated lossy data and mashed object identity and eval grouping together. We explicitly broke those apart, ComponentIdentifier handles strict identity, and compute_eval_hash (in scorer_metrics_io.py) handles the behavioral grouping with an allowlist. The identity layer just hashes things, and the eval layer decides what to filter. This also set us up so when attacks need eval hashing later, they just write their own wrappers.
By moving compute_eval_hash to ComponentIdentifier and adding those EVAL_* vars to Identifiable, we're leaking eval logic back into the identity layer. Now every Identifiable component carries eval-specific ClassVars even though only Scorer uses them right now.
The part I'm most worried about is _resolve_eval_config:
def _resolve_eval_config(self) -> tuple[frozenset[str], frozenset[str]]:
module = importlib.import_module(self.class_module)
cls = getattr(module, self.class_name, None)ComponentIdentifier is supposed to be a pure, dumb data snapshot. from_dict() should be able to rebuild it safely from the DB even if the original class doesn't exist anymore. This method breaks that safety. If a class gets renamed or moved, importlib fails silently, returns empty sets, and compute_eval_hash() falls back to self.hash with no filtering. We won't get any errors, just two scorers that should share results silently failing to do so, wasting compute on expensive evals.
It also gives us two different code paths to calculate the same hash, a live instance uses scorer.get_eval_hash() (reading ClassVars directly), while a deserialized one uses identifier.compute_eval_hash() (using importlib). Before this PR, we had one function, one path, explicit constants, and everyone used the same thing.
Recommendation
I'm 100% on board with making this code reusable. We'll probably need eval hashing for other things. We just don't need to complicate the identity type to get there.
What if we move _build_eval_dict and a generic compute_eval_hash into component_identifier.py as module-level functions with explicit parameters? Like this:
def compute_eval_hash(
identifier: ComponentIdentifier,
*,
target_child_keys: frozenset[str],
behavioral_child_params: frozenset[str],
) -> str:
eval_dict = _build_eval_dict(
identifier,
target_child_keys=target_child_keys,
behavioral_child_params=behavioral_child_params,
)
return config_hash(eval_dict)Then each eval domain just wraps it with its own constants:
# scorer_metrics_io.py
_TARGET_CHILD_KEYS = frozenset({"prompt_target", "converter_target"})
_BEHAVIORAL_CHILD_PARAMS = frozenset({"model_name", "temperature", "top_p"})
def compute_scorer_eval_hash(identifier: ComponentIdentifier) -> str:
return compute_eval_hash(
identifier,
target_child_keys=_TARGET_CHILD_KEYS,
behavioral_child_params=_BEHAVIORAL_CHILD_PARAMS,
)
# attack_metrics_io.py (for future usecase)
def compute_attack_eval_hash(identifier: ComponentIdentifier) -> str:
return compute_eval_hash(
identifier,
target_child_keys=frozenset({"objective_target"}),
behavioral_child_params=frozenset({"model_name", "temperature"}),
)This keeps the generic hashing in the identity module, but leaves the domain-specific config in the domain modules. ComponentIdentifier stays a pure data type (no importlib, no silent failures, no extra ClassVars). Every domain gets a clean 3-line wrapper, and we keep the decoupling intact while still getting all the reuse.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (6)
pyrit/score/scorer.py:86
- get_eval_hash() introduces a new local import of ScorerEvaluationIdentity. This module doesn’t appear to import Scorer back, so the deferred import may be unnecessary; keeping it at module scope (or under TYPE_CHECKING + a normal import) would reduce per-call overhead and make imports consistent/predictable.
# Deferred import to avoid circular dependency (scorer_evaluation_identity → identifiers → …)
from pyrit.score.scorer_evaluation.scorer_evaluation_identity import ScorerEvaluationIdentity
return ScorerEvaluationIdentity(self.get_identifier()).eval_hash
pyrit/score/printer/console_scorer_printer.py:106
- print_objective_scorer() adds a function-local import of ScorerEvaluationIdentity. This dependency doesn’t look circular from this module, so it can likely be imported at the top (or under TYPE_CHECKING) to follow the project’s usual import organization and avoid repeated imports on each call.
from pyrit.score.scorer_evaluation.scorer_evaluation_identity import ScorerEvaluationIdentity
from pyrit.score.scorer_evaluation.scorer_metrics_io import (
find_objective_metrics_by_eval_hash,
)
pyrit/score/printer/console_scorer_printer.py:134
- print_harm_scorer() adds a function-local import of ScorerEvaluationIdentity. If there’s no circular import here, prefer moving it to the module import section (or TYPE_CHECKING + normal import) for consistency and to avoid repeated imports.
from pyrit.score.scorer_evaluation.scorer_evaluation_identity import ScorerEvaluationIdentity
from pyrit.score.scorer_evaluation.scorer_metrics_io import (
find_harm_metrics_by_eval_hash,
)
pyrit/score/scorer_evaluation/scorer_evaluator.py:279
- scorer_hash now holds the scorer’s eval hash (via get_eval_hash()), but the variable name still suggests it’s the raw ComponentIdentifier.hash. Renaming it (e.g., scorer_eval_hash) would reduce ambiguity and make the subsequent lookup calls clearer.
scorer_hash = self.scorer.get_eval_hash()
# Determine if this is a harm or objective evaluation
metrics_type = MetricsType.OBJECTIVE if isinstance(self.scorer, TrueFalseScorer) else MetricsType.HARM
pyrit/identifiers/evaluation_identity.py:163
- EvaluationIdentity.init will raise a generic AttributeError if a subclass forgets to define TARGET_CHILD_KEYS/BEHAVIORAL_CHILD_PARAMS. Consider adding an explicit validation in init (e.g., via getattr checks) and raising a clearer exception that explains which ClassVars are missing and how to fix it.
def __init__(self, identifier: ComponentIdentifier) -> None:
self._identifier = identifier
self._eval_hash = compute_eval_hash(
identifier,
target_child_keys=self.TARGET_CHILD_KEYS,
behavioral_child_params=self.BEHAVIORAL_CHILD_PARAMS,
)
pyrit/score/scorer.py:76
- The PR description mentions adding a generic get_eval_hash() convenience method on Identifiable so non-scorer components can reuse eval-hash logic. In this diff, get_eval_hash() is only added to Scorer; consider either adding the Identifiable-level helper or updating the PR description to match the current scope.
def get_eval_hash(self) -> str:
"""
Compute a behavioral equivalence hash for evaluation grouping.
Delegates to ``ScorerEvaluationIdentity`` which filters target children
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (5)
pyrit/score/printer/console_scorer_printer.py:106
- These methods add additional local imports (
ScorerEvaluationIdentity,find_*_by_eval_hash) even though the imported modules don’t appear to depend onconsole_scorer_printer(so a circular import is unlikely). To keep import structure consistent and easier to maintain, prefer moving these imports to the top of the file (or add a brief comment if a cycle truly requires local imports).
from pyrit.score.scorer_evaluation.scorer_evaluation_identity import ScorerEvaluationIdentity
from pyrit.score.scorer_evaluation.scorer_metrics_io import (
find_objective_metrics_by_eval_hash,
)
tests/unit/score/test_scorer_evaluation_identity.py:9
- The module docstring says this file covers the
Scorer.get_eval_hash()convenience method, but the tests below compute hashes viaScorerEvaluationIdentity(...)directly and never callScorer.get_eval_hash(). Either update the docstring to match the actual coverage, or add a minimal concreteScorersubclass test that exercisesget_eval_hash()end-to-end.
"""
Tests for pyrit.score.scorer_evaluation.scorer_evaluation_identity.
Covers ``ScorerEvaluationIdentity`` ClassVar values, eval-hash delegation, and
the ``Scorer.get_eval_hash()`` convenience method.
"""
pyrit/score/scorer_evaluation/scorer_evaluator.py:289
scorer_hashnow storesself.scorer.get_eval_hash()(an evaluation hash), so the variable name is misleading. Renaming it (e.g.,scorer_eval_hash) would make it clearer what is being passed to the*_by_eval_hashlookup helpers.
scorer_hash = self.scorer.get_eval_hash()
# Determine if this is a harm or objective evaluation
metrics_type = MetricsType.OBJECTIVE if isinstance(self.scorer, TrueFalseScorer) else MetricsType.HARM
existing: Optional[ScorerMetrics] = None
if metrics_type == MetricsType.HARM:
if harm_category is None:
logger.warning("harm_category must be provided for harm scorer evaluations")
return (False, None)
existing = find_harm_metrics_by_eval_hash(
eval_hash=scorer_hash,
pyrit/score/scorer_evaluation/scorer_metrics_io.py:176
- The
find_objective_metrics_by_hash/find_harm_metrics_by_hashAPIs were renamed/removed in favor of*_by_eval_hash. If this module is part of the public API surface, this is a breaking change for downstream callers. Consider keeping the old function names as deprecated wrappers (calling the new functions and emitting a deprecation warning) for at least one release cycle.
def find_objective_metrics_by_eval_hash(
*,
eval_hash: str,
file_path: Optional[Path] = None,
) -> Optional[ObjectiveScorerMetrics]:
"""
Find objective scorer metrics by evaluation hash.
Args:
eval_hash (str): The scorer evaluation hash to search for.
file_path (Optional[Path]): Path to the JSONL file to search.
If not provided, uses the default path:
SCORER_EVALS_PATH / "objective" / "objective_achieved_metrics.jsonl"
Returns:
ObjectiveScorerMetrics if found, else None.
"""
if file_path is None:
file_path = SCORER_EVALS_PATH / "objective" / "objective_achieved_metrics.jsonl"
return _find_metrics_by_eval_hash(file_path=file_path, eval_hash=eval_hash, metrics_class=ObjectiveScorerMetrics)
def find_harm_metrics_by_eval_hash(
*,
eval_hash: str,
pyrit/score/scorer.py:87
get_eval_hash()uses a local (deferred) import, butScorerEvaluationIdentityonly depends onpyrit.identifiersand does not importpyrit.score.scorer(so a circular import doesn’t appear to exist). If there’s no real cycle, prefer moving this import to the module top to keep imports centralized; if there is a cycle, it should be documented with the specific modules involved.
# Deferred import to avoid circular dependency (scorer_evaluation_identity → identifiers → …)
from pyrit.score.scorer_evaluation.scorer_evaluation_identity import ScorerEvaluationIdentity
return ScorerEvaluationIdentity(self.get_identifier()).eval_hash
Problem
Prior to this PR,
eval-hashcomputation (_build_eval_dict,compute_eval_hash) lived inscorer_metrics_io.pyand was tightly coupled toScorer. Every call site had to import scorer-io specific helpers and manually pass params. This made the code hard to reuse for non-scorer components. Pretty soon, we will want to do evals on attacks, and potentially other pieces.Additionally, there were some instances in
ScorerEvaluatorthat were using the hash instead ofeval_hash. This PR fixes that.Also fixing unrelated flaky test that didn't patch central memory and failed in build.
Solution
This PR moves eval-hash computation into the identity layer.
ComponentIdentifiergains acompute_eval_hash()method that dynamically looks up target keys and child params.Identifiableadds those class vars and aget_eval_hash()convenience method, so any component type can opt into eval-hash filtering by overriding the ClassVars.All the existing behavior (other than the bug fix) should stay the same.