Test Warnings Report - 2026-01-26
Date: 2026-01-26
Tests with warnings: 11
Generated: 2026-01-27 09:09:05
Note: For detailed error messages and stack traces, check the log files in scripts/logs/ with date prefix 20260126.
Add GitHub issue links in the Ticket column for tracking.
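
As a minimal sketch of how the logs for this report could be collected, the helper below lists files in a log directory whose names start with a given date prefix. The function name `logs_for_date` is hypothetical, and the assumption that log file names begin with the date prefix (for example `20260126...`) is inferred from the note above, not confirmed by the repository.

```python
from pathlib import Path


def logs_for_date(log_dir, date_prefix):
    """Return log files in log_dir whose names start with date_prefix, sorted by name.

    Assumption: file names begin with the date prefix, e.g. "20260126".
    """
    matches = Path(log_dir).glob(f"{date_prefix}*")
    # Keep regular files only; directories sharing the prefix are skipped.
    return sorted(p for p in matches if p.is_file())


if __name__ == "__main__":
    # Paths taken from the note in this report.
    for log_file in logs_for_date("scripts/logs", "20260126"):
        print(log_file)
```

If the directory does not exist, the helper simply returns an empty list, which keeps it safe to call from a reporting script.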
| Test ID | Benchmark | Provider | Model | Severity | Warning Codes | Ticket | Action | Completed |
|---|---|---|---|---|---|---|---|---|
| T0023 | business_letters | mistral | pixtral-large-2411 | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-03-04 |
| T0109 | business_letters | openai | gpt-5 | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-02-17 |
| T0113 | business_letters | openai | gpt-5-mini | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-02-17 |
| T0166 | library_cards | openai | gpt-5-mini | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-02-17 |
| T0244 | business_letters | openrouter | qwen/qwen3-vl-8b-thinking | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-04-03 |
| T0245 | business_letters | openrouter | qwen/qwen3-vl-8b-thinking | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-04-03 |
| T0276 | medieval_manuscripts | openai | gpt-4o-mini | 🔴 Critical | ZERO_COST, ALL_NA | | add graceful error handling to llm client | test rerun on 2026-02-17 |
| T0278 | medieval_manuscripts | openai | gpt-4.1-nano | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-02-17 |
| T0421 | medieval_manuscripts | genai | gemini-3-pro-preview | 🔴 Critical | ZERO_COST, ALL_NA | | delete (model deprecated) | 2026-03-04 |
| T0555 | business_letters | mistral | ministral-8b-2512 | 🔴 Critical | ZERO_COST, ALL_NA | RISE-UNIBAS/generic_llm_api_client#3 | rerun & delete old | |
| T0560 | medieval_manuscripts | mistral | ministral-8b-2512 | 🔴 Critical | ZERO_COST, ALL_NA | RISE-UNIBAS/generic_llm_api_client#3 | rerun & delete old | |
Warning Codes Explanation
| Code | Severity | Description |
|---|---|---|
| ZERO_COST | 🟠 High | Total cost is $0 (pricing issue) |
| ALL_NA | 🔴 Critical | All metrics are N/A (scoring failed) |
| ZERO_SCORE | 🟡 Medium | Score is 0 (exceptionally bad performance) |
| ZERO_ITEMS | 🔴 Critical | No items processed |
| ZERO_DURATION | 🟠 High | No timing captured |
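
The codes above can be read as simple checks on a test result. The following is a minimal sketch of that mapping, not the project's actual implementation: the `RunResult` fields and the function names are hypothetical, and taking the highest severity among a test's codes as its overall severity is an assumption that happens to match the rows in this report (ZERO_COST plus ALL_NA is listed as Critical).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Severity per warning code, copied from the table in this report.
WARNING_SEVERITY = {
    "ZERO_COST": "High",
    "ALL_NA": "Critical",
    "ZERO_SCORE": "Medium",
    "ZERO_ITEMS": "Critical",
    "ZERO_DURATION": "High",
}

# Ranking used to pick the most severe code; higher means more severe.
SEVERITY_RANK = {"Medium": 1, "High": 2, "Critical": 3}


@dataclass
class RunResult:
    """Hypothetical shape of one test result; field names are assumptions."""
    total_cost: float
    metrics: Dict[str, Optional[float]]  # None stands for an N/A metric
    score: Optional[float]
    items_processed: int
    duration_seconds: float


def warning_codes(result: RunResult) -> List[str]:
    """Return the warning codes that apply to a result, in table order."""
    codes = []
    if result.total_cost == 0:
        codes.append("ZERO_COST")
    # An empty metrics dict is also treated as "all N/A" in this sketch.
    if all(value is None for value in result.metrics.values()):
        codes.append("ALL_NA")
    if result.score is not None and result.score == 0:
        codes.append("ZERO_SCORE")
    if result.items_processed == 0:
        codes.append("ZERO_ITEMS")
    if result.duration_seconds == 0:
        codes.append("ZERO_DURATION")
    return codes


def overall_severity(codes: List[str]) -> Optional[str]:
    """Return the highest severity among the codes, or None if there are none."""
    if not codes:
        return None
    return max((WARNING_SEVERITY[c] for c in codes), key=SEVERITY_RANK.__getitem__)
```

For example, a result with zero cost and only N/A metrics, but with items processed and a recorded duration, yields `["ZERO_COST", "ALL_NA"]` and an overall severity of `"Critical"`, as in the rows listed in this report.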
Action Required