fix(evaluation): replace bare raises with proper exceptions and add text_generation_quality request #560
Conversation
davidberenstein1957 left a comment
Thanks for the PR. Some small changes required :)
| pruna_logger.error(f"DinoScore: device {device} not supported. Supported devices: {self.runs_on}") | ||
| raise | ||
| raise ValueError(f"DinoScore: device {device} not supported. Supported devices: {self.runs_on}") |
can we define the text once and then raise and log the same message?
```diff
  if images.ndim != 4:
      pruna_logger.error(f"Expected 4-D tensor (B, C, H, W); got shape {tuple(images.shape)}")
-     raise
+     raise ValueError(f"Expected 4-D tensor (B, C, H, W); got shape {tuple(images.shape)}")
```
can we define the text once and then raise and log the same message?
```diff
  else:
      pruna_logger.error("SharpnessMetric: unsupported channel count")
-     raise
+     raise ValueError(f"SharpnessMetric: unsupported channel count {img.shape[0]}. Expected 1 or 3 channels.")
```
can we define the text once and then raise and log the same message?
Addressed in 4c63091 - extracted all duplicated error messages into a `msg` variable so the same text is used for both `pruna_logger.error(msg)` and the `raise`.
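For reference, a minimal self-contained sketch of that pattern (the helper name and standalone logger here are illustrative; in the PR the logic lives inside the metric classes):

```python
import logging

pruna_logger = logging.getLogger("pruna")

def validate_device(device: str, runs_on: list[str]) -> None:
    """Build the message once so the log entry and the exception always match."""
    if device not in runs_on:
        msg = f"DinoScore: device {device} not supported. Supported devices: {runs_on}"
        pruna_logger.error(msg)
        raise ValueError(msg)
```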
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```python
@pytest.mark.cpu
def test_task_text_generation_quality_request():
    """Test that 'text_generation_quality' named request creates perplexity metric."""
    task = Task(request="text_generation_quality", datamodule=PrunaDataModule.from_string("TinyWikiText"), device="cpu")
```
Test will always fail due to missing tokenizer
High Severity
PrunaDataModule.from_string("TinyWikiText") will raise TokenizerMissingError because the text_generation_collate function requires a tokenizer argument that isn't provided. The existing data test for TinyWikiText in tests/data/test_datamodule.py confirms this by always passing tokenizer=bert_tokenizer. This test will never reach the assertions — it will fail during datamodule construction.
begumcig left a comment
Looks good to me, thanks a lot! We can merge once the tests pass :)
@begumcig Thanks for the approval! The CI failure is unrelated to this PR; it's a rate limit issue. I've pushed an empty commit to re-trigger CI. Hopefully the rate limit clears this time.
hi @zamal-db can you rebase your branch onto main one more time? We handled the error that was causing the problem and it should be fine now. Thank you so much for your patience 🥹
…ext_generation_quality request

- Fix 4 bare `raise` statements that crash with `RuntimeError: No active exception to reraise` outside except blocks:
  - metric_dino_score.py: `raise` -> `raise ValueError`
  - metric_memory.py: `raise` -> `raise RuntimeError`
  - metric_sharpness.py: 2x `raise` -> `raise ValueError`
- Fix typo in registry.py: 'dos not' -> 'does not'
- Add 'text_generation_quality' named evaluation request to Task (returns perplexity metric)
- Add tests for new named request and invalid request error handling
Force-pushed 9d70884 to c5fd0cb
Done, rebased onto latest main. CI should be clean now. Thanks for fixing the flaky test!
Force-pushed c5fd0cb to d4a435c
Perfect! Thanks a lot once again @zamal-db


Description
I was running `SharpnessMetric` on outputs from a custom diffusion pipeline and got hit with `RuntimeError: No active exception to reraise`. Took me a while to figure out what was going on since the actual issue was a tensor shape mismatch, but the bare `raise` on this line was masking the real error. The descriptive message was being logged via `pruna_logger.error()` right above it, but the `raise` itself had no exception attached.

Checked the rest of the evaluation metrics and found 3 more of the same pattern across 2 other files. Fixed all 4, and also caught a small typo in registry.py while I was in there.
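A standalone sketch of the failure mode and the fix (function names and messages are illustrative, not the actual metric code):

```python
import logging

logger = logging.getLogger("pruna")

def check_shape_before(shape: tuple) -> None:
    if len(shape) != 4:
        logger.error(f"Expected 4-D tensor (B, C, H, W); got shape {shape}")
        # There is no active exception outside an except block, so this bare
        # raise itself crashes with "RuntimeError: No active exception to
        # reraise", burying the shape error that was just logged.
        raise

def check_shape_after(shape: tuple) -> None:
    if len(shape) != 4:
        msg = f"Expected 4-D tensor (B, C, H, W); got shape {shape}"
        logger.error(msg)
        raise ValueError(msg)  # the caller now sees the real problem
```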
Separately, I noticed `Task` only supports `image_generation_quality` as a named request. Added `text_generation_quality` since perplexity is already available through `TorchMetricWrapper` and it felt like a natural complement.

Changes
Bug fixes (4 bare raises):
- `metric_sharpness.py`: 2x bare `raise` replaced with `raise ValueError(...)` for wrong tensor dimensions and unsupported channel count
- `metric_dino_score.py`: bare `raise` replaced with `raise ValueError(...)` for unsupported device
- `metric_memory.py`: bare `raise` replaced with `raise RuntimeError(...)` for multi-GPU without device map

Typo fix:
registry.py:"dos not inherit"fixed to"does not inherit"Feature:
- `text_generation_quality` named evaluation request added to `Task`, returning a perplexity metric to complement the existing `image_generation_quality` request
N/A
Type of Change
How Has This Been Tested?
- Verified each of the 4 fixed code paths raised `RuntimeError: No active exception to reraise` before the fix, and raise the correct typed exception with descriptive message after
- Added `test_task_text_generation_quality_request` to verify the named request returns a perplexity `TorchMetricWrapper`
- Added `test_task_invalid_named_request` to verify unknown requests raise `ValueError`
- Ran `ruff check` on all modified source files, all pass

Checklist
Additional Notes
None