feat: Add QA evaluation structured outputs for Starlight (Brent Council)#5

Open
roshan-vapi wants to merge 1 commit into main from tasker/PRO-846-qa-structured-outputs

Conversation

@roshan-vapi (Contributor)

Summary

Adds 5 structured output YAML files for automated post-call QA evaluation of Brent Council Housing Benefits calls (Starlight project).

  • 4 QA category structured outputs that evaluate call transcripts against Brent Council's manual QA criteria
  • 1 wrap-up code structured output that classifies calls into 19 predefined categories

Linear Issue

PRO-846

Files Created

| File | Category | Questions | Auto-Fail |
| --- | --- | --- | --- |
| resources/structuredOutputs/starlight-qa-engagement.yml | Engagement | 7 (1.1-1.7) | 1.3, 1.4, 1.5 |
| resources/structuredOutputs/starlight-qa-right-first-time.yml | Right First Time | 8 (2.1-2.8) | 2.3, 2.4, 2.5 |
| resources/structuredOutputs/starlight-qa-signposting.yml | Signposting | 2 (3.1-3.2) | None |
| resources/structuredOutputs/starlight-qa-explaining.yml | Explaining | 2 (4.1-4.2) | None |
| resources/structuredOutputs/starlight-wrap-up-code.yml | Call Classification | N/A | N/A |

Schema Design

Each QA structured output produces per-question evaluations with:

  • result: yes / no / not_applicable
  • reasoning: explanation referencing the conversation
  • evidence: array of { message_text, timestamp } excerpts

Top-level fields:

  • auto_fail: true if ANY auto-fail question received no
  • overall_pass: true only if auto_fail is false
  • category_score: fraction string, e.g. "5/7"
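For concreteness, one category's parsed output might look like the following. This is an illustrative sketch only: the field names (result, reasoning, evidence, auto_fail, overall_pass, category_score) come from the schema above, but the top-level questions wrapper, the question keys, and all values are invented, not copied from the YAML files.

```javascript
// Illustrative output for the Engagement category. Field names follow the
// schema described above; the "questions" wrapper and all values are invented.
const engagementResult = {
  questions: {
    "1.1": {
      result: "yes", // yes / no / not_applicable
      reasoning: "The agent greeted the caller and confirmed the enquiry.",
      evidence: [
        {
          message_text: "Good morning, Housing Benefits, how can I help?",
          timestamp: "00:00:03",
        },
      ],
    },
    "1.3": {
      result: "no", // 1.3 is one of this category's auto-fail questions
      reasoning: "The agent did not complete identity verification.",
      evidence: [],
    },
  },
  auto_fail: true,     // an auto-fail question (1.3) received "no"
  overall_pass: false, // overall_pass is true only if auto_fail is false
  category_score: "5/7",
};

console.log(engagementResult.overall_pass); // false
```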

Auto-fail logic: If any auto-fail question in ANY of the 4 categories receives no, the ENTIRE call evaluation fails. Each structured output sets its own auto_fail flag; the consuming application must check across all 4.
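Since each file only sets its own flag, the consuming application has to combine them. A minimal sketch of that cross-category check; the function name and input shape are hypothetical, not defined anywhere in this PR:

```javascript
// Hypothetical helper: the call passes only if none of the four category
// outputs tripped its auto_fail flag. This PR does not define this function;
// it sketches the check the consuming application must perform.
function callPasses(categoryResults) {
  return categoryResults.every((r) => r.auto_fail === false);
}

const results = [
  { category: "engagement", auto_fail: false },
  { category: "right_first_time", auto_fail: true }, // e.g. 2.3 received "no"
  { category: "signposting", auto_fail: false },
  { category: "explaining", auto_fail: false },
];

console.log(callPasses(results)); // false: one category auto-failed
```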

Key Design Decisions

  • Model: gpt-4.1 at temperature: 0 for deterministic, accurate QA evaluation
  • Multilingual support: All outputs include explicit instructions to evaluate in transcript language
  • AI agent adaptation: Questions that don't apply to AI agents (ACW, system logging, hold time) have not_applicable guidance
  • Glossary: Full Brent Council Housing Benefits terminology embedded in each output's description
  • assistant_ids: []: Empty because Starlight assistant configs are not yet in the gitops repo; will be populated when they are added
  • Wrap-up code second-tier: Placeholder secondary_classification_notes field for pending tier definitions
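Putting those decisions together, the top of one of these files might look roughly like this. This is a guess at the layout, assuming the required-field list from the Validation section below; the exact key nesting, type value, and description wording are illustrative, not copied from the PR:

```yaml
# starlight-qa-signposting.yml -- illustrative layout only, not the real file
name: starlight_qa_signposting        # snake_case per AGENTS.md
type: structuredOutput                # guessed value
target: call
description: >
  Evaluates Brent Council Housing Benefits calls against the Signposting
  QA criteria (questions 3.1-3.2). Evaluate in the transcript language.
model:
  model: gpt-4.1
  temperature: 0                      # deterministic QA evaluation
schema:
  type: object                        # must stay a simple string, not an array
assistant_ids: []                     # populated once Starlight configs land in gitops
workflow_ids: []
```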

Line Count Note

This PR is 778 lines, which exceeds the 500-line guideline. However, all additions are declarative YAML data files with repetitive per-question schema structure. The 5 files are logically atomic units that cannot be meaningfully split -- each represents a single structured output definition. No code was modified.

How to Test

  1. Verify YAML validity: each file parses correctly with the yaml npm package
  2. Verify schema.type is always a simple string (not an array) per AGENTS.md warning
  3. After push to Vapi (npm run push:dev), verify structured outputs appear in the dashboard
  4. Run a test call and verify the structured outputs produce expected evaluation results

Validation

  • All 5 files validated as correct YAML with required fields (name, type, target, description, model, schema, assistant_ids, workflow_ids)
  • schema.type confirmed as simple string "object" in all files (avoids .toLowerCase() crash)
  • All question properties validated to have result, reasoning, and evidence sub-properties
  • name fields follow snake_case convention per AGENTS.md
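The checks above can be scripted. A hedged sketch, assuming each file has already been parsed into a plain object (e.g. with YAML.parse from the yaml npm package mentioned in How to Test); the function name is made up, and the required-field list is taken from the first bullet:

```javascript
// Sketch of the validation checks listed above, applied to one parsed file.
// Assumes the YAML has already been parsed into a plain object (e.g. via
// YAML.parse from the "yaml" npm package). The function name is hypothetical.
const REQUIRED_FIELDS = [
  "name", "type", "target", "description",
  "model", "schema", "assistant_ids", "workflow_ids",
];

function validateStructuredOutput(doc) {
  for (const field of REQUIRED_FIELDS) {
    if (!(field in doc)) throw new Error(`missing required field: ${field}`);
  }
  // Per the AGENTS.md warning: schema.type must be a simple string like
  // "object", never an array, or a downstream .toLowerCase() call crashes.
  if (typeof doc.schema.type !== "string") {
    throw new Error("schema.type must be a simple string, not an array");
  }
  // snake_case name convention per AGENTS.md.
  if (!/^[a-z0-9]+(_[a-z0-9]+)*$/.test(doc.name)) {
    throw new Error(`name is not snake_case: ${doc.name}`);
  }
  return true;
}
```

Running this over every file in resources/structuredOutputs/ before npm run push:dev would catch the failure modes listed above without a test call.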

Add 5 structured output YAML files for automated post-call QA evaluation
of Brent Council Housing Benefits calls:

- starlight-qa-engagement.yml: 7 questions (3 auto-fail: 1.3, 1.4, 1.5)
- starlight-qa-right-first-time.yml: 8 questions (3 auto-fail: 2.3, 2.4, 2.5)
- starlight-qa-signposting.yml: 2 questions (no auto-fail)
- starlight-qa-explaining.yml: 2 questions (no auto-fail)
- starlight-wrap-up-code.yml: call classification into 19 wrap-up codes

Each QA structured output evaluates per-question with result (yes/no/not_applicable),
reasoning, and transcript evidence. Auto-fail logic: if ANY auto-fail question
receives "no", the entire evaluation fails across all categories.

All outputs include multilingual transcript support, AI agent adaptation notes,
and the full Brent Council Housing Benefits glossary.

Closes PRO-846

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
