`/hunt` jurisdiction system: configurable LLM-as-judge with strict/evidentiary/permissive policies #2094

**Full jurisdiction system: configurable judge LLM, three policies, trajectory-aware verdicts.**

After #2092 (vocabulary, v0.8.45) and #2093 (verifier-preview wiring, v0.8.46), this issue lands the full Codex-style LLM-as-judge with codewhale-native jurisdictions. Absorbs and supersedes the original "port Codex goal system" framing.

## Concept

A **jurisdiction** is the policy a judge applies to decide whether a quarry is statutorily hunted. Three built-in jurisdictions:

| Jurisdiction | What counts as hunted |
|---|---|
| `strict` | Diff exists, tests added or updated, CI green (or local verifier equivalent). |
| `evidentiary` | Diff cites files; agent shows changes; no contradiction with quarry. |
| `permissive` | Agent declares done; judge sanity-checks. |
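To make the table concrete, here is a minimal sketch of the three bars as a pure function over a per-turn evidence summary. The `Evidence` struct, its field names, and `counts_as_hunted` are all hypothetical and not part of this issue's spec. In particular, `contradicts_quarry` and `judge_sanity_ok` stand in for conclusions the judge LLM would reach, so this shows the shape of each policy rather than how the judge is implemented.

```rust
/// The three built-in jurisdictions from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Jurisdiction {
    Strict,
    Evidentiary,
    Permissive,
}

/// Hypothetical per-turn evidence summary; every field name is illustrative.
#[derive(Debug, Clone, Copy, Default)]
pub struct Evidence {
    pub diff_exists: bool,
    pub tests_touched: bool,       // tests added or updated
    pub verifier_green: bool,      // CI green, or local verifier equivalent
    pub diff_cites_files: bool,
    pub agent_shows_changes: bool,
    pub contradicts_quarry: bool,  // assumed to come from the judge's reading
    pub agent_declares_done: bool,
    pub judge_sanity_ok: bool,     // assumed stand-in for the judge's sanity check
}

/// Returns true when the evidence satisfies the given jurisdiction's table row.
pub fn counts_as_hunted(j: Jurisdiction, e: &Evidence) -> bool {
    match j {
        Jurisdiction::Strict => e.diff_exists && e.tests_touched && e.verifier_green,
        Jurisdiction::Evidentiary => {
            e.diff_cites_files && e.agent_shows_changes && !e.contradicts_quarry
        }
        Jurisdiction::Permissive => e.agent_declares_done && e.judge_sanity_ok,
    }
}
```

With the same evidence (a diff that cites files, no tests touched, agent declares done), this sketch yields "not hunted" under `strict` and "hunted" under the other two, which is the kind of observable difference the acceptance criteria ask for.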

Each turn ends with a judge call. The judge sees the quarry, the current trajectory summary, this turn's evidence, and the full diff against `origin/main` (or against session start). The judge returns:

```rust
struct JudgeRuling {
    verdict: HuntVerdict,
    reasoning: String,
    next_step: NextStep,  // continue | handoff | abandon | declare_hunted
}
```
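The struct above leaves `HuntVerdict` and `NextStep` undefined. One possible shape, as a sketch only: the `NextStep` variants follow the comment in the struct, while the `HuntVerdict` variants are an assumption inferred from the `/why hunted`, `/why wounded`, and `/why escaped` commands (the real vocabulary comes from #2092 and may differ). The `FromStr` impl is likewise hypothetical and shows how a judge's textual `next_step` could be validated.

```rust
use std::str::FromStr;

/// Assumed verdict set, inferred from the `/why <verdict>` commands.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HuntVerdict {
    Hunted,
    Wounded,
    Escaped,
}

/// The four next-step actions named in the struct comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NextStep {
    Continue,
    Handoff,
    Abandon,
    DeclareHunted,
}

impl FromStr for NextStep {
    type Err = String;

    /// Accepts exactly the four snake_case names; anything else is an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim() {
            "continue" => Ok(NextStep::Continue),
            "handoff" => Ok(NextStep::Handoff),
            "abandon" => Ok(NextStep::Abandon),
            "declare_hunted" => Ok(NextStep::DeclareHunted),
            other => Err(format!("unknown next_step: {other}")),
        }
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct JudgeRuling {
    pub verdict: HuntVerdict,
    pub reasoning: String,
    pub next_step: NextStep,
}
```

Rejecting unknown strings at parse time means a malformed judge reply surfaces as an error instead of silently becoming a default action.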

## In scope (v0.9.0)

- `config.toml` `[hunt]` section: `jurisdiction = "evidentiary"` (default), `judge_model = "auto"` (defaults to the session model with a judge-only system prompt), `judge_max_tokens = 4096`, `judge_temperature = 0.0`.
- Judge runs at each turn boundary in hunt mode. Verdict logged inline in transcript between turns.
- `next_step` actions:
  - `continue` — model gets a system-message hint of judge reasoning, continues.
  - `handoff` — runtime suggests a sub-agent and lists the species + brief.
  - `abandon` — session marked `escaped`, no trophy.
  - `declare_hunted` — trophy written, session may auto-close per config.
- `/hunt jurisdiction <strict|evidentiary|permissive>` switches mid-hunt.
- `/why hunted` / `/why wounded` / `/why escaped` shows the judge's last reasoning.
- Judge prompt template under `crates/tui/src/prompts/judge.txt` — small, focused, source-controlled.
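The `[hunt]` defaults and the mid-hunt switch listed above could be modeled roughly as follows. This is a sketch under assumptions: `HuntConfig` and `switch_jurisdiction` are hypothetical names, the field types are guesses, and real config loading (TOML parsing, resolving `judge_model = "auto"` to the session model) is left out.

```rust
/// Mirrors the `[hunt]` keys listed above; the struct itself is hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct HuntConfig {
    pub jurisdiction: String,
    pub judge_model: String,
    pub judge_max_tokens: u32,
    pub judge_temperature: f32,
}

impl Default for HuntConfig {
    /// Defaults as stated in the scope list.
    fn default() -> Self {
        Self {
            jurisdiction: "evidentiary".to_string(),
            judge_model: "auto".to_string(),
            judge_max_tokens: 4096,
            judge_temperature: 0.0,
        }
    }
}

/// Handles `/hunt jurisdiction <name>`. On any error the config is left unchanged.
pub fn switch_jurisdiction(cfg: &mut HuntConfig, command: &str) -> Result<(), String> {
    let mut parts = command.split_whitespace();
    match (parts.next(), parts.next(), parts.next(), parts.next()) {
        (Some("/hunt"), Some("jurisdiction"), Some(name), None) => match name {
            "strict" | "evidentiary" | "permissive" => {
                cfg.jurisdiction = name.to_string();
                Ok(())
            }
            other => Err(format!("unknown jurisdiction: {other}")),
        },
        _ => Err("usage: /hunt jurisdiction <strict|evidentiary|permissive>".to_string()),
    }
}
```

For example, starting from `HuntConfig::default()`, the command `/hunt jurisdiction strict` would set `jurisdiction` to `"strict"`, while `/hunt jurisdiction lenient` would return an error and leave the current jurisdiction in place.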

## Out of scope

- Trained verifier models (not planned; v0.9.0 remains LLM-as-judge).
- Per-statute custom jurisdictions in config (RFC; lands later if asked for).
- Judge-as-sub-agent species (the judge isn't a whale — it's the court).

## Acceptance

- Three jurisdictions selectable and observably different in behavior on the same test quarry.
- Judge prompt is auditable and source-controlled.
- Verdict + reasoning render inline in transcript.
- `/why <verdict>` returns the most recent judge reasoning.
- Trophy card includes the jurisdiction the hunt was decided under.
- Eval harness has at least one regression test per jurisdiction asserting the verdict shape.
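The `/why <verdict>` criterion could be served by a lookup over the logged rulings. The sketch below assumes that `/why wounded` means "the most recent ruling whose verdict was wounded"; the issue does not spell this out, and the `Verdict`, `LoggedRuling`, and `why` names are hypothetical.

```rust
/// Assumed verdict set, inferred from the `/why` command names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Hunted,
    Wounded,
    Escaped,
}

/// Hypothetical transcript entry: one judge ruling per turn boundary.
#[derive(Debug, Clone)]
pub struct LoggedRuling {
    pub verdict: Verdict,
    pub reasoning: String,
}

/// Returns the reasoning of the most recent ruling with the requested verdict,
/// or `None` if the judge has never returned that verdict in this session.
pub fn why(log: &[LoggedRuling], verdict: Verdict) -> Option<&str> {
    log.iter()
        .rev() // newest rulings are at the end of the log
        .find(|r| r.verdict == verdict)
        .map(|r| r.reasoning.as_str())
}
```

Returning `Option` lets the TUI distinguish "no such verdict yet" from an empty reasoning string.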

## Closes / partially closes

- Closes #2058 (Port Codex goal system) — this is the codewhale-native realization, not a port.
- Builds on #2092 (vocabulary) and #2093 (verifier wiring).

## Notes

- The judge isn't a sub-agent species. Whales hunt; the judge is the court. Treat as a distinct primitive.
- LLM-as-judge has known failure modes (sycophancy toward the agent, brittleness on edge cases). The three-jurisdiction split is a partial mitigation: `strict` is mechanical (CI gates aren't LLM-decided), `permissive` is honest about its laxness, and `evidentiary` is where the judging actually happens.

_Replaces #2058. Final piece of the hunt trilogy: #2092 (vocabulary, v0.8.45) → #2093 (verifier wiring, v0.8.46) → this (v0.9.0)._

