Feat/8 turns by Thibaut-Fatus · Pull Request #10 · korabench/benchmark

Thibaut-Fatus · 2026-04-24T16:03:53Z

No description provided.

Replay conversations from a JSONL/JSON-array dump for additional turns up to each risk's conversationLength, then run the judge pipeline on the full transcript. Used to compare 3-turn vs 8-turn assessments on the same scenarios/models.

- refactor runTest to accept optional startMessages
- bump conversationLength 3 → 8 for sexual_grooming_and_boundary_violations, grooming_and_manipulation, parasocial_attachment_and_emotional_dependency
- add deepseek-v3.2-temp-1.3 (matches prod), gpt-5.2-high, claude-haiku-4.5-high
- extract ReassessInput and buildContext into shared modules
- accept both JSON array and JSONL inputs; tolerate extra fields on records
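As a rough illustration of the input handling described above, the sketch below reads a dump that is either a JSON array or JSONL, keeps only the fields it needs (so extra fields on a record are tolerated), and works out how many turns are left to replay. It is not the code from this PR: the record shape (`riskId`, `messages`), the helper names `parseDump` and `remainingTurns`, and the assumption that one turn equals one assistant reply are all hypothetical.

```typescript
// Hypothetical record shape: the real ReassessInput may differ.
interface Message {
  role: "user" | "assistant";
  content: string;
}

interface ReplayRecord {
  riskId: string;
  messages: Message[];
}

// Runtime check for the fields this sketch relies on; unknown fields are ignored.
function isReplayRecord(value: unknown): value is ReplayRecord {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.riskId === "string" &&
    Array.isArray(v.messages) &&
    v.messages.every(
      (m) =>
        typeof m === "object" &&
        m !== null &&
        (m.role === "user" || m.role === "assistant") &&
        typeof m.content === "string",
    )
  );
}

// Accepts a JSON array or JSONL (one JSON object per line, blank lines skipped).
function parseDump(text: string): ReplayRecord[] {
  const trimmed = text.trim();
  if (trimmed === "") return [];
  const raw: unknown[] = trimmed.startsWith("[")
    ? (JSON.parse(trimmed) as unknown[])
    : trimmed
        .split(/\r?\n/)
        .filter((line) => line.trim() !== "")
        .map((line) => JSON.parse(line) as unknown);
  return raw.map((item, index) => {
    if (!isReplayRecord(item)) {
      throw new Error(`Record ${index} is missing riskId or messages`);
    }
    // Copy only the known fields so extra fields never leak downstream.
    return {
      riskId: item.riskId,
      messages: item.messages.map((m) => ({ role: m.role, content: m.content })),
    };
  });
}

// Assumption: a completed turn is counted as one assistant reply.
function remainingTurns(record: ReplayRecord, conversationLength: number): number {
  const completed = record.messages.filter((m) => m.role === "assistant").length;
  return Math.max(0, conversationLength - completed);
}
```

With this shape, the parsed `messages` of a record could be handed to a runner as its starting messages, and the runner would continue for `remainingTurns(record, conversationLength)` more turns before the full transcript is judged.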

Thibaut-Fatus changed the base branch from main to v2 April 24, 2026 16:04

Thibaut-Fatus force-pushed the feat/8-turns branch from 19738a4 to cce6cbe Compare April 28, 2026 16:19

Thibaut-Fatus added 2 commits May 4, 2026 14:50

[feat] new risks with 8 turns

c8f2465

Thibaut-Fatus force-pushed the feat/8-turns branch from 2d943fe to c8f2465 Compare May 4, 2026 12:54

Thibaut-Fatus changed the base branch from v2 to main May 4, 2026 12:55

Thibaut-Fatus merged commit 04ba7da into main May 4, 2026
4 checks passed

Thibaut-Fatus deleted the feat/8-turns branch May 4, 2026 13:22

Thibaut-Fatus merged 2 commits into main from feat/8-turns
