AI Utils

A collection of AI testing and utility tools for conversational AI research, built with Bun and TypeScript. The tools focus on latency reduction, filler generation, and semantic stability for voice-based AI agents.

Setup

Install Bun:

curl -fsSL https://bun.sh/install | bash

Install dependencies:

bun install

Configure environment:

cp .env.example .env
# Edit .env and add your ANTHROPIC_TEST_API_KEY

For AWS Bedrock testing, ensure you're logged into AWS SSO:

aws sso login --sso-session sso-main

Tools

Latency Tester (`tools/latency-tester/`)

Compares latency between AWS Bedrock and direct Anthropic API calls.

- Side-by-side comparison of Bedrock and the direct Anthropic API
- Measures latency, TTFT (Time To First Token), token usage, and cache performance
- Configurable test scenarios via JSON
- CSV output for analysis
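
As a rough illustration of the two timing metrics named above, the sketch below records TTFT and total latency for any stream of text chunks. It is not the tool's actual implementation: `StreamTimer`, `measureStream`, and the injectable `now` clock are hypothetical names, and the stream is modeled as a plain `AsyncIterable<string>` rather than a real Bedrock or Anthropic client.

```typescript
// Hypothetical sketch: timing a streamed response.
// Not taken from tools/latency-tester/; all names are illustrative.

interface StreamTiming {
  ttftMs: number | null; // time to first non-empty chunk; null if none arrived
  totalMs: number;       // time from construction until finish() was called
  chunks: number;        // number of non-empty chunks seen
}

class StreamTimer {
  private readonly now: () => number;
  private readonly start: number;
  private firstChunkAt: number | null = null;
  private chunkCount = 0;

  // The clock is injectable so the timer can be tested deterministically.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
    this.start = now();
  }

  // Call once per streamed chunk; empty chunks do not count as a first token.
  onChunk(text: string): void {
    if (text.length === 0) return;
    if (this.firstChunkAt === null) this.firstChunkAt = this.now();
    this.chunkCount += 1;
  }

  // Call when the stream ends to get the final measurements.
  finish(): StreamTiming {
    return {
      ttftMs: this.firstChunkAt === null ? null : this.firstChunkAt - this.start,
      totalMs: this.now() - this.start,
      chunks: this.chunkCount,
    };
  }
}

// Drains any async stream of text chunks and returns its timing.
async function measureStream(
  stream: AsyncIterable<string>,
  now: () => number = () => Date.now(),
): Promise<StreamTiming> {
  const timer = new StreamTimer(now);
  for await (const chunk of stream) timer.onChunk(chunk);
  return timer.finish();
}
```

Running the same measurement against both providers with an identical prompt is what makes the numbers comparable; the timer itself is provider-agnostic.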

bun run latency-test                # Default test suite
bun run latency-test:mobilede       # MobileDE (German customer service) profile

| Variable | Default | Description |
| --- | --- | --- |
| `ITERATIONS` | 10 | Number of test iterations |
| `DELAY_MS` | 30000 | Delay between iterations (ms) |
| `SCENARIO_FILE` | `scenario.json` | Scenario file to use |
| `ANTHROPIC_TEST_API_KEY` | - | Anthropic API key |
| `AWS_PROFILE` | `sso-qa02-admin` | AWS SSO profile |
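
The variables in the table above could be read with a small loader like the one sketched here. The defaults come from the table; everything else is an assumption: `LatencyConfig`, `intFromEnv`, and `loadLatencyConfig` are hypothetical names, and the fallback-on-invalid-value behavior is one possible choice, not necessarily what the tool does.

```typescript
// Hypothetical sketch of reading the latency tester's environment variables.
// Only the variable names and defaults are taken from the table above.

interface LatencyConfig {
  iterations: number;
  delayMs: number;
  scenarioFile: string;
  anthropicApiKey: string | undefined; // no default; must be supplied
  awsProfile: string;
}

type Env = Record<string, string | undefined>;

// Parses a non-negative integer, falling back to the default when the
// variable is unset, blank, or not a valid integer (assumed behavior).
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  return Number.isInteger(value) && value >= 0 ? value : fallback;
}

function loadLatencyConfig(env: Env): LatencyConfig {
  return {
    iterations: intFromEnv(env, "ITERATIONS", 10),
    delayMs: intFromEnv(env, "DELAY_MS", 30000),
    scenarioFile: env["SCENARIO_FILE"] ?? "scenario.json",
    anthropicApiKey: env["ANTHROPIC_TEST_API_KEY"],
    awsProfile: env["AWS_PROFILE"] ?? "sso-qa02-admin",
  };
}
```

In a Bun script this would typically be called as `loadLatencyConfig(process.env)`; Bun reads `.env` files automatically, so values set during setup are picked up without extra code.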

CAI Filler Test Rig (`tools/cai-filler-test-rig/`)

Tests latency reduction strategies for conversational AI voice agents by generating contextual filler responses while the reasoning LLM is still processing the request.

- Multiple filler strategies: template, dynamic, intent-based, opening sentence
- Speech act classification for context-aware filler selection
- Coherence scoring against reasoning LLM output
- YAML-based test configuration
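
To make the idea of speech-act-aware template fillers concrete, here is a minimal sketch. It is an illustration only: the four speech act labels, the keyword rules, the filler texts, and the names `classifySpeechAct`, `TEMPLATE_FILLERS`, and `pickFiller` are all assumptions, and the rig's real classifier and templates may look quite different.

```typescript
// Hypothetical sketch: choose a template filler based on a coarse speech act.
// Labels, keyword rules, and filler texts are invented for illustration.

type SpeechAct = "question" | "request" | "complaint" | "statement";

// Very rough keyword heuristic; a real classifier would likely use a model.
function classifySpeechAct(utterance: string): SpeechAct {
  const text = utterance.trim().toLowerCase();
  if (/\b(not working|broken|unacceptable|still waiting)\b/.test(text)) {
    return "complaint";
  }
  // Checked before "question" so that "Could you ...?" counts as a request.
  if (/^(please|can you|could you|i need|i want|i'd like)\b/.test(text)) {
    return "request";
  }
  if (text.endsWith("?") || /^(what|when|where|who|why|how|is|are|do|does)\b/.test(text)) {
    return "question";
  }
  return "statement";
}

const TEMPLATE_FILLERS: Record<SpeechAct, string[]> = {
  question: ["Let me check that for you.", "Good question, one moment."],
  request: ["Sure, I'll take care of that.", "Of course, give me a second."],
  complaint: ["I'm sorry to hear that.", "I understand, let me look into it."],
  statement: ["Okay.", "Got it."],
};

// Rotates through the templates so consecutive turns do not repeat a filler.
function pickFiller(utterance: string, turnIndex = 0): string {
  const options = TEMPLATE_FILLERS[classifySpeechAct(utterance)];
  return options[turnIndex % options.length];
}
```

The filler is spoken immediately, and the reasoning LLM's answer follows; coherence scoring would then check that the two read naturally together.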

bun run filler-test                 # Run with default config
bun run filler-test:example         # Run example config
bun run filler-test -- --config tools/cai-filler-test-rig/config/test-speech-act.yaml

Key docs: docs/CURRENT-STATE.md, docs/TUNING-GUIDE.md

Speculative Handoff (`tools/semantic-stability-tester/`)

Evaluates strategies for detecting whether an extended utterance has changed meaning compared to an earlier interim version. This powers the speculative handoff pipeline, which starts LLM generation before the user finishes speaking and then verifies that the meaning has not shifted.

Three-phase pipeline:

1. Handoff-Point Detection - identify when enough semantic content exists to start LLM generation
2. Post-Handoff Monitoring - watch for meaning shifts as the user continues speaking
3. End-of-Turn Stability Check - final verification before sending the response
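
The three phases can be sketched with a purely heuristic check, shown below. This is a toy stand-in, not one of the strategies the tester actually evaluates: the stop-word and shift-marker lists, the content-word threshold, and the names `isHandoffPoint`, `meaningShifted`, and `runSpeculativeHandoff` are all assumptions made for illustration.

```typescript
// Hypothetical heuristic sketch of the three-phase pipeline.
// Word lists and thresholds are illustrative, not taken from the tool.

const STOP_WORDS = new Set([
  "a", "an", "the", "i", "to", "my", "me", "is", "it", "of", "and", "for", "um", "uh",
]);
const SHIFT_MARKERS = ["actually", "no wait", "instead", "never mind", "not", "but", "cancel"];

function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9\s']/g, " ").replace(/\s+/g, " ").trim();
}

function contentWords(text: string): string[] {
  return normalize(text).split(" ").filter((w) => w.length > 0 && !STOP_WORDS.has(w));
}

// Phase 1: enough content words to start generating speculatively.
function isHandoffPoint(interim: string, minContentWords = 3): boolean {
  return contentWords(interim).length >= minContentWords;
}

// Phases 2 and 3: did the text added after the handoff change the meaning?
function meaningShifted(atHandoff: string, extended: string): boolean {
  const base = normalize(atHandoff);
  const full = normalize(extended);
  // Conservative choice: if the earlier text was revised, assume a shift.
  if (!full.startsWith(base)) return true;
  const added = ` ${full.slice(base.length).trim()} `;
  return SHIFT_MARKERS.some((marker) => added.includes(` ${marker} `));
}

interface HandoffResult {
  handoffIndex: number | null; // index of the interim that triggered generation
  keepSpeculation: boolean;    // false if a shift was seen mid-turn or at end of turn
}

function runSpeculativeHandoff(interims: string[], finalText: string): HandoffResult {
  const handoffIndex = interims.findIndex((t) => isHandoffPoint(t));
  if (handoffIndex === -1) return { handoffIndex: null, keepSpeculation: false };
  const base = interims[handoffIndex];
  // Phase 2: monitor every interim that arrives after the handoff.
  const shiftedMidTurn = interims.slice(handoffIndex + 1).some((t) => meaningShifted(base, t));
  // Phase 3: final check against the complete utterance.
  const shiftedAtEnd = meaningShifted(base, finalText);
  return { handoffIndex, keepSpeculation: !shiftedMidTurn && !shiftedAtEnd };
}
```

When `keepSpeculation` is false, the speculative response would be discarded and generation restarted from the full utterance, so a false positive costs latency while a false negative risks answering the wrong request.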

bun run stability-test              # Run all strategies on full corpus
bun run stability-test:heuristic    # Heuristic only (fast, no model downloads)
bun run fire-point-test             # Run fire-point detection scenarios
bun run fire-point-report           # Generate HTML scenario report
bun run tools/semantic-stability-tester/fire-point-corpus-report.ts  # Full corpus HTML report

Key docs: docs/ARCHITECTURE.md, results/REPORT.md

Project Structure

ai-utils/
├── lib/                              # Shared libraries
│   ├── types.ts                     # Common TypeScript interfaces
│   ├── csv-writer.ts                # CSV output utilities
│   ├── aws-auth.ts                  # AWS SSO authentication
│   ├── scenario-loader.ts           # JSON scenario/template loader
│   ├── bedrock-client.ts            # AWS Bedrock streaming client
│   ├── reasoning-bedrock-client.ts  # Bedrock client for reasoning models
│   ├── nova-client.ts               # AWS Nova client
│   └── anthropic-client.ts          # Anthropic API client
└── tools/
    ├── latency-tester/              # API latency comparison
    ├── cai-filler-test-rig/         # Filler strategy testing
    └── semantic-stability-tester/   # Speculative handoff

Adding New Tools

1. Create a new directory under tools/
2. Import shared libraries from lib/ using relative imports
3. Add npm scripts to package.json
4. Put results in tools/<tool-name>/results/ (gitignored)
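
For the results convention, a new tool might build its output path with a helper along these lines. This is only a sketch: `resultsPath` is a hypothetical function, and the kebab-case check and timestamped `.csv` file name are assumptions based on the existing tool names, not documented rules.

```typescript
// Hypothetical helper: builds tools/<tool-name>/results/<prefix>-<timestamp>.csv.
// The naming scheme is an assumption, not a documented project convention.
function resultsPath(toolName: string, prefix: string, at: Date = new Date()): string {
  // Existing tools use kebab-case directory names; enforce the same here.
  if (!/^[a-z0-9][a-z0-9-]*$/.test(toolName)) {
    throw new Error(`Tool name must be kebab-case: ${toolName}`);
  }
  // ISO timestamps contain ":" and ".", which are awkward in file names.
  const stamp = at.toISOString().replace(/[:.]/g, "-");
  return `tools/${toolName}/results/${prefix}-${stamp}.csv`;
}
```

Passing the date explicitly keeps the helper deterministic in tests, while the default of `new Date()` gives each run its own file.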

License

MIT
