
RFAI-05: Confidence pipeline and Review evidence section #977

@Chris0Jeky

Description

Context

Week 5 of the roadmap in taskdeck-12-week-roadmap-v4.md.

Parent: #972
Depends on: #976

This issue makes proposal confidence and supporting evidence visible enough that review-first automation feels trustworthy rather than opaque.

Scope

  • Combine verbalized confidence, provider confidence/logprob signals (where available), and attribution/provenance verification scores into a per-field confidence score.
  • Add Review UI evidence sections built from source spans and EvidenceLink reason chips.
  • Add self-consistency sampling only for high-criticality, low-confidence decisions, with explicit quotas and cost guardrails.
  • Measure Brier scores per confidence bucket on the golden set.
  • Preserve the existing readable proposal cards, sticky actions, and board-centered review flow; this is an evidence-depth pass, not another decomposition of Review.
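
A minimal sketch of the per-field combination step, assuming every signal is normalized to [0, 1] and that a weighted mean is acceptable. The weights, the `FieldSignals` shape, and `field_confidence` are hypothetical; the issue does not specify how signals are weighted. Missing signals (for example, a provider without logprobs) are skipped and the remaining weights are renormalized, so absence does not read as low confidence.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical weights: the issue does not say how the signals are weighted.
SIGNAL_WEIGHTS: Dict[str, float] = {"verbalized": 0.3, "provider": 0.3, "attribution": 0.4}


@dataclass
class FieldSignals:
    """Raw confidence signals for one proposed field (all optional)."""
    verbalized: Optional[float] = None        # model-stated confidence in [0, 1]
    provider_logprob: Optional[float] = None  # mean token logprob for the field (<= 0)
    attribution: Optional[float] = None       # provenance verification score in [0, 1]


def field_confidence(signals: FieldSignals) -> Optional[float]:
    """Weighted mean of the signals that are present.

    Missing signals are skipped and the remaining weights are renormalized.
    Returns None when no signal is available at all.
    """
    present: Dict[str, float] = {}
    if signals.verbalized is not None:
        present["verbalized"] = signals.verbalized
    if signals.provider_logprob is not None:
        # exp(mean logprob) is the geometric-mean token probability.
        present["provider"] = math.exp(signals.provider_logprob)
    if signals.attribution is not None:
        present["attribution"] = signals.attribution
    if not present:
        return None
    total_weight = sum(SIGNAL_WEIGHTS[name] for name in present)
    score = sum(SIGNAL_WEIGHTS[name] * value for name, value in present.items()) / total_weight
    # Clamp in case an upstream signal is slightly out of range.
    return min(1.0, max(0.0, score))
```

For example, a field with verbalized confidence 0.9 and attribution score 0.5 but no logprobs scores (0.3 * 0.9 + 0.4 * 0.5) / 0.7, about 0.67. Returning None instead of 0.0 lets the Review UI show "no confidence signal" rather than a misleading low score.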

Acceptance Criteria

  • Review cards expose why a proposal exists, with source/evidence links where available.
  • Confidence buckets are generated deterministically and can be scored against the golden set.
  • High-criticality, low-confidence self-consistency runs are gated by quotas and never silently apply work.
  • The manual inbox-triage review evidence path is documented and testable.
  • The eval harness reports both the Brier score target and the current result.
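
One way to satisfy the quota criterion, sketched under stated assumptions: the gate checks criticality and confidence before touching the quota, so ineligible decisions never consume budget, and the vote result is attached as review evidence while the proposal stays pending. `LOW_CONFIDENCE_THRESHOLD`, `SelfConsistencyQuota`, the proposal dict shape, and the `"pending_review"` status are all hypothetical names, not existing Taskdeck APIs.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical threshold below which a high-criticality field counts as "low confidence".
LOW_CONFIDENCE_THRESHOLD = 0.6


@dataclass
class SelfConsistencyQuota:
    """Per-window budget of extra sampling runs (e.g. per workspace per day)."""
    max_runs: int
    used: int = 0

    def try_consume(self) -> bool:
        if self.used >= self.max_runs:
            return False
        self.used += 1
        return True


def should_run_self_consistency(criticality: str, confidence: float,
                                quota: SelfConsistencyQuota) -> bool:
    """Only high-criticality, low-confidence decisions may spend quota."""
    if criticality != "high" or confidence >= LOW_CONFIDENCE_THRESHOLD:
        return False  # checked before the quota, so nothing is consumed
    return quota.try_consume()


def majority_vote(samples: List[str]) -> Tuple[str, float]:
    """Most common answer across samples and its agreement ratio."""
    answer, votes = Counter(samples).most_common(1)[0]
    return answer, votes / len(samples)


def attach_self_consistency(proposal: dict, samples: List[str]) -> dict:
    """Record the vote as review evidence; the proposal is never auto-applied."""
    answer, agreement = majority_vote(samples)
    return {
        **proposal,
        "self_consistency": {"answer": answer, "agreement": agreement, "samples": len(samples)},
        "status": "pending_review",
    }
```

Keeping the vote as evidence rather than an action means a disagreeing sample set simply shows a low agreement ratio on the card; a human still decides.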

Suggested Verification

  • Frontend Review component tests
  • Backend confidence aggregation tests
  • Golden eval smoke including confidence-bucket scoring
  • Manual Review flow check recorded in docs when behavior lands
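
A sketch of the confidence-bucket scoring the golden eval smoke could run, assuming each golden-set field yields a (confidence, outcome) pair where outcome is 1 if the proposed value matched the golden label. The bucket edges and function names are hypothetical; fixed edges are what make bucketing deterministic. The Brier score is the mean of (p - y)^2, so lower is better.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical fixed bucket edges; fixed edges keep bucketing deterministic across runs.
BUCKET_EDGES: Tuple[float, ...] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)

# (predicted confidence, outcome): outcome is 1 if the field matched the golden label, else 0.
Pair = Tuple[float, int]


def bucket_index(p: float) -> int:
    """Map a confidence to bucket [edge_i, edge_i+1); exactly 1.0 goes to the top bucket."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"confidence out of range: {p}")
    for i in range(len(BUCKET_EDGES) - 1):
        if p < BUCKET_EDGES[i + 1]:
            return i
    return len(BUCKET_EDGES) - 2


def brier_score(pairs: Sequence[Pair]) -> float:
    """Mean squared error between confidence and outcome (expects a non-empty sequence)."""
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


def brier_by_bucket(pairs: Sequence[Pair]) -> Dict[int, Tuple[int, float]]:
    """Bucket index -> (count, Brier score), for non-empty buckets only."""
    buckets: Dict[int, List[Pair]] = {}
    for p, y in pairs:
        buckets.setdefault(bucket_index(p), []).append((p, y))
    return {i: (len(b), brier_score(b)) for i, b in sorted(buckets.items())}


def report(pairs: Sequence[Pair], target: float) -> str:
    """One-line harness summary comparing the overall score with a target."""
    score = brier_score(pairs)
    status = "PASS" if score <= target else "FAIL"
    return f"Brier {score:.4f} (target <= {target:.4f}): {status}"
```

On the toy pairs [(0.9, 1), (0.8, 0), (0.1, 0), (0.15, 1)], the overall score is 0.345625, and the per-bucket view shows both the top and bottom buckets are miscalibrated, which is the signal the harness should surface next to the target.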

Metadata

  • Assignees: No one assigned
  • Projects: Status Pending
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests