RFAI-08: Eval harness expansion, privacy analytics, and egress disclosure #980

@Chris0Jeky

Context

Week 8 from taskdeck-12-week-roadmap-v4.md.

Parent: #972
Depends on: #976, #979

This issue expands evaluation and privacy controls around the new proposal pipeline. It treats exfiltration safety as separate from mutation safety.

Scope

  • Expand the golden dataset with clarification and safety/refusal cases.
  • Add Microsoft.Extensions.AI.Evaluation or a documented fallback integrated with dotnet test.
  • Add prompt regression with promptfoo using --no-share and PR-visible diffs.
  • Add a WireMock.Net MITM integration test for the full capture -> proposal -> agent flow that fails on outbound hosts outside EgressEnvelope.
  • Build the disclosure registry/source-generation path for Settings -> Where your data goes.
  • Add TelemetryGuard.Validate at emit and export boundaries with allowlist and fuzz rejection tests.
  • Add local Insights metrics for proposal acceptance/edit/reject cohorts without storing user content.

Acceptance Criteria

  • Eval harness includes happy-path, clarification, refusal, safety, and prompt-injection cases.
  • Prompt regression runs locally and in CI without sharing data externally.
  • Egress MITM test fails on an attempted attacker host and passes for configured envelope entries.
  • Where-your-data-goes registry enumerates every outbound site, payload category, and using tool/agent.
  • Telemetry guard rejects long strings, URLs, email-like strings, unknown keys, and non-finite metrics.
  • Local Insights shows content-free acceptance/edit/reject trends by prompt version.
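
The telemetry guard criterion above lists five rejection rules. A minimal sketch of how they might combine is shown below, in Python for brevity rather than as the actual `TelemetryGuard.Validate` implementation. The allowlisted keys, the length limit, and the regular expressions are assumptions made for the example.

```python
import math
import re

# All of these values are hypothetical; the real allowlist and limits are undecided.
ALLOWED_KEYS = frozenset({"event", "prompt_version", "outcome", "latency_ms"})
MAX_STRING_LENGTH = 64
URL_PATTERN = re.compile(r"\b(?:https?://|www\.)", re.IGNORECASE)
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def validate(event):
    """Return a list of rejection reasons; an empty list means the event may be emitted."""
    errors = []
    for key, value in event.items():
        if key not in ALLOWED_KEYS:
            errors.append(f"unknown key: {key}")
        elif isinstance(value, float):
            # NaN and +/-inf are rejected; ints are always finite.
            if not math.isfinite(value):
                errors.append(f"non-finite metric: {key}")
        elif isinstance(value, (bool, int)):
            continue
        elif isinstance(value, str):
            if len(value) > MAX_STRING_LENGTH:
                errors.append(f"string too long: {key}")
            if URL_PATTERN.search(value):
                errors.append(f"URL-like string: {key}")
            if EMAIL_PATTERN.search(value):
                errors.append(f"email-like string: {key}")
        else:
            errors.append(f"unsupported value type: {key}")
    return errors
```

Returning reasons instead of a boolean keeps fuzz-test failures readable, and calling the same function at both the emit and export boundaries means one allowlist governs both.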

Suggested Verification

  • dotnet test, filtered to the eval, egress, and telemetry suites
  • promptfoo run locally with --no-share
  • Frontend tests for Settings disclosure and Insights displays
