
feat: Multilingual Voice Agent MVP, end-to-end implementation for Issue #4 (#12)

Open
DZDasherKTB wants to merge 1 commit into theapprenticeproject:main from DZDasherKTB:multilingual_voice_agents


@DZDasherKTB

Overview

This PR delivers a working end-to-end MVP for the multilingual voice agent
described in Issue #4. Every acceptance criterion is implemented and tested.

Closes #4


What's implemented

Voice agent pipeline (end-to-end)

A student goes inactive → dropout risk model scores them → orchestrator
decides channel and contact → VAPI initiates an outbound call in their
language → Didi (the agent persona) speaks using Sarvam AI Indic TTS →
student responds → transcript is analysed → outcome is logged back to
Frappe → if no answer, a WhatsApp voice note is sent automatically.

Language detection (agent/language_detector.py)

Script-level detection distinguishes Devanagari (Hindi/Marathi),
Gurmukhi (Punjabi), and Latin-romanised input. Handles code-switching
mid-sentence (very common: "haan मुझे padhai karni hai"). Marathi vs
Hindi disambiguation uses marker characters characteristic of Marathi (ळ, ऱ, ॉ).
All three languages route to language-appropriate Sarvam AI TTS voices.
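To make "script-level detection" concrete, here is a minimal sketch that buckets characters by Unicode block (Devanagari U+0900 to U+097F, Gurmukhi U+0A00 to U+0A7F, ASCII letters as Latin) and flags code-switching when Indic script and Latin letters share an utterance. The function names and return shape are hypothetical, not the API of agent/language_detector.py, and romanised-only input is deliberately left unclassified because it needs a lexicon or model.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical sketch, not the PR's language_detector.py.
# LLA (ळ, U+0933) and RRA (ऱ, U+0931) are used in Marathi and rarely in modern
# standard Hindi. The candra O sign (ॉ, U+0949) is left out here because Hindi
# loanwords such as डॉक्टर also use it, so on its own it cannot separate the two.
MARATHI_MARKERS = {"\u0933", "\u0931"}


def script_of(ch: str) -> Optional[str]:
    """Return the script bucket for one character, or None for digits/punctuation/space."""
    cp = ord(ch)
    if 0x0900 <= cp <= 0x097F:
        return "devanagari"
    if 0x0A00 <= cp <= 0x0A7F:
        return "gurmukhi"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return None


def detect_language(text: str) -> dict:
    counts = {"devanagari": 0, "gurmukhi": 0, "latin": 0}
    for ch in text:
        script = script_of(ch)
        if script is not None:
            counts[script] += 1

    indic = counts["devanagari"] + counts["gurmukhi"]
    if counts["gurmukhi"] > counts["devanagari"]:
        language = "pa"
    elif counts["devanagari"] > 0:
        language = "mr" if MARATHI_MARKERS & set(text) else "hi"
    else:
        # Romanised-only input needs a word list or a model; not covered here.
        language = "unknown_romanised" if counts["latin"] else "unknown"

    return {
        "language": language,
        # Code-switching: Indic script and Latin letters in the same utterance.
        "code_switched": indic > 0 and counts["latin"] > 0,
        "counts": counts,
    }
```

For "haan मुझे padhai karni hai" this returns language "hi" with code_switched set, which is the mixed case the paragraph above calls out.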

Dropout risk model (nudge/dropout_risk.py)

Logistic regression over 6 LMS behavioral features: days inactive,
course completion %, weekly session decay rate, average session duration,
streak days, and total sessions. Online SGD updates after each call outcome
let the model improve as it runs. Falls back to rule-based scoring before
training data accumulates.
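A minimal sketch of how scoring with a cold-start fallback could look, assuming the six features above are passed as raw numbers and that the model is trusted only after some number of labelled outcomes. The field names, the rule thresholds, and `min_outcomes=50` are illustrative assumptions, not values from the PR.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class LmsFeatures:
    """The six behavioural signals listed above (field names are hypothetical)."""
    days_inactive: float
    completion_pct: float        # 0-100
    weekly_decay_rate: float     # fractional drop in weekly sessions, e.g. 0.4 = 40% fewer
    avg_session_minutes: float
    streak_days: float
    total_sessions: float

    def as_vector(self) -> list[float]:
        return [self.days_inactive, self.completion_pct / 100.0, self.weekly_decay_rate,
                self.avg_session_minutes, self.streak_days, self.total_sessions]


def rule_based_risk(f: LmsFeatures) -> float:
    """Cold-start heuristic used until enough outcomes exist (thresholds are assumptions)."""
    score = 0.0
    if f.days_inactive >= 7:
        score += 0.5
    elif f.days_inactive >= 3:
        score += 0.25
    if f.completion_pct < 30:
        score += 0.25
    if f.weekly_decay_rate > 0.5:
        score += 0.25
    return min(score, 1.0)


def dropout_risk(f: LmsFeatures, weights: list[float], bias: float,
                 n_labelled_outcomes: int, min_outcomes: int = 50) -> float:
    """Risk in [0, 1]: rules before `min_outcomes` labelled calls, logistic model after."""
    if n_labelled_outcomes < min_outcomes:
        return rule_based_risk(f)
    z = bias + sum(w * x for w, x in zip(weights, f.as_vector()))
    z = max(-50.0, min(50.0, z))  # keep math.exp in range
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) link
```

In practice the raw features would usually be standardised before the dot product; the sketch skips that to stay short.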

Nudge orchestrator (nudge/orchestrator.py)

The central pipeline: fetches student profile + progress from Frappe,
scores risk, applies rate limiting (max 2 calls/week/student via Redis),
determines nudge type (return / lesson completion / streak recovery /
parent awareness / celebration), builds conversation context, places the call,
handles the full fallback chain, and logs everything.
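Two of the orchestrator's decisions can be sketched in isolation: the "max 2 calls/week/student" limit and the choice among the five nudge types. This is an in-memory stand-in, assuming a rolling 7-day window (the PR may use a calendar week); with Redis, one common pattern is a per-student sorted set of call timestamps trimmed with ZREMRANGEBYSCORE and counted with ZCARD. The priority order in `pick_nudge_type` is a hypothetical choice, not taken from orchestrator.py.

```python
from __future__ import annotations

import time
from collections import defaultdict, deque

WEEK_SECONDS = 7 * 24 * 3600


class WeeklyCallLimiter:
    """In-memory stand-in for the Redis limiter: at most `max_calls` per rolling 7 days."""

    def __init__(self, max_calls: int = 2, window_seconds: int = WEEK_SECONDS) -> None:
        self.max_calls = max_calls
        self.window = window_seconds
        self._calls: dict[str, deque] = defaultdict(deque)

    def try_acquire(self, student_id: str, now: float | None = None) -> bool:
        """Record a call and return True if the student is still under the limit."""
        now = time.time() if now is None else now
        calls = self._calls[student_id]
        while calls and now - calls[0] >= self.window:  # drop calls older than the window
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True


def pick_nudge_type(milestone_reached: bool, calling_parent: bool,
                    streak_broken: bool, lesson_in_progress: bool) -> str:
    """Hypothetical priority order over the five nudge types named above."""
    if milestone_reached:
        return "celebration"
    if calling_parent:
        return "parent_awareness"
    if streak_broken:
        return "streak_recovery"
    if lesson_in_progress:
        return "lesson_completion"
    return "return"
```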

Conversation flow engine (agent/conversation_flow.py)

Structured opening utterances in all 3 languages × 5 nudge types.
LLM system prompt builder keeps Didi on-script with student-specific
context. Fallback cycling, escalation trigger detection, and warm
closings when the student commits to logging in.
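One way to organise "3 languages × 5 nudge types" is a template table keyed by (language, nudge type), plus a prompt builder that injects student context. This sketch uses English placeholder text in place of the real Hindi, Marathi, and Punjabi scripts; the prompt wording, the fallback to Hindi, and all function names are hypothetical.

```python
LANGUAGE_NAMES = {"hi": "Hindi", "mr": "Marathi", "pa": "Punjabi"}
NUDGE_TYPES = ("return", "lesson_completion", "streak_recovery",
               "parent_awareness", "celebration")

# 3 languages x 5 nudge types = 15 openings. English placeholders stand in for
# the real scripts; {name} and {topic} are filled per call.
OPENINGS = {
    (lang, nudge): f"[{lang}/{nudge}] Namaste {{name}}, Didi here, calling about {{topic}}."
    for lang in LANGUAGE_NAMES
    for nudge in NUDGE_TYPES
}


def opening_line(language: str, nudge_type: str, name: str, topic: str,
                 default_language: str = "hi") -> str:
    key = (language, nudge_type)
    if key not in OPENINGS:  # unsupported language: fall back to the default one
        key = (default_language, nudge_type)
    return OPENINGS[key].format(name=name, topic=topic)


def build_system_prompt(name: str, grade: int, language: str,
                        nudge_type: str, last_lesson: str) -> str:
    """Keeps the LLM on-script with student-specific context (wording is illustrative)."""
    lang_name = LANGUAGE_NAMES.get(language, "Hindi")
    return (
        f"You are Didi, a warm and encouraging learning mentor. Reply only in {lang_name}. "
        f"You are speaking with {name}, a grade {grade} student, for a '{nudge_type}' nudge. "
        f"Their last completed lesson was '{last_lesson}'. Keep each reply to one or two "
        "short sentences, stay on this goal, and close warmly once they agree to log in."
    )
```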

Channel fallback chain (telephony/)

voice_call → WhatsApp voice note → WhatsApp text
Voice notes are synthesised via Sarvam AI TTS and sent as WhatsApp audio
messages, which get a 3-5× higher open rate than text for this demographic.
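The fallback chain reduces to "try each channel in order, stop at the first success". A minimal sketch, assuming each channel is wrapped as a callable that returns True when the student was reached; the channel names come from the chain above, while the sender callables are hypothetical stand-ins for vapi_client.py and whatsapp_client.py.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence

Sender = Callable[[str], bool]  # returns True if the student was reached


def run_fallback_chain(
    student_id: str, channels: Sequence[tuple[str, Sender]]
) -> tuple[Optional[str], list[tuple[str, bool]]]:
    """Try each channel in order; stop at the first one that succeeds."""
    attempts: list[tuple[str, bool]] = []
    for name, send in channels:
        try:
            ok = send(student_id)
        except Exception:  # a failing provider should not abort the whole chain
            ok = False
        attempts.append((name, ok))
        if ok:
            return name, attempts
    return None, attempts
```

Returning the full attempt list, not just the winning channel, makes it straightforward to write every try back to the nudge log.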

Frappe LMS integration (lms/frappe_client.py)

Async client for TAP's Frappe REST API. Fetches student profile,
learning progress, and at-risk student list. Writes nudge records back
so teachers can see call history per student in the LMS.
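Frappe's REST API exposes each DocType under `/api/resource/<DocType>`, takes `filters` and `fields` as JSON-encoded query parameters, and accepts token auth in the form `Authorization: token <api_key>:<api_secret>`. The sketch below only builds the request, so it runs without a server; the DocType names used with it ("Student", "Nudge Log") are hypothetical because TAP's schema is not shown here.

```python
import json
from urllib.parse import quote, urlencode


def build_resource_request(base_url: str, doctype: str, api_key: str, api_secret: str,
                           filters=None, fields=None, limit: int = 20):
    """Build URL + headers for a Frappe `GET /api/resource/<doctype>` list call."""
    params = {"limit_page_length": str(limit)}
    if filters is not None:
        params["filters"] = json.dumps(filters)  # e.g. [["status", "=", "Active"]]
    if fields is not None:
        params["fields"] = json.dumps(fields)    # e.g. ["name", "grade"]
    url = f"{base_url.rstrip('/')}/api/resource/{quote(doctype)}?{urlencode(params)}"
    headers = {
        "Authorization": f"token {api_key}:{api_secret}",  # Frappe token auth
        "Accept": "application/json",
    }
    return url, headers
```

An async client would then pass `url` and `headers` to its HTTP library's GET call; Frappe returns the matching rows under a top-level `data` key.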

A/B experimentation framework (experiments/framework.py)

Deterministic arm assignment via MD5 hash: the same student always lands in
the same arm. Arms: control / voice_call / whatsapp_voice / whatsapp_text.
Primary metric: return-to-platform within 72 hours (tracked via Frappe
login webhook). Reports lift over control per arm.
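A minimal sketch of hash-based arm assignment and lift reporting. Salting the hash with an experiment name (`nudge_channel_v1` is hypothetical) lets a new experiment reshuffle students without reusing old buckets. "Lift" is taken here as relative lift in 72-hour return rate versus control, which is an assumption; the framework may report an absolute difference instead.

```python
import hashlib

ARMS = ("control", "voice_call", "whatsapp_voice", "whatsapp_text")


def assign_arm(student_id: str, experiment: str = "nudge_channel_v1") -> str:
    """Same (experiment, student) pair always maps to the same arm."""
    key = f"{experiment}:{student_id}".encode("utf-8")
    digest = hashlib.md5(key, usedforsecurity=False).hexdigest()  # bucketing, not security
    return ARMS[int(digest, 16) % len(ARMS)]


def lift_over_control(returned: dict, contacted: dict) -> dict:
    """Relative lift in 72-hour return rate for each arm versus control.

    returned[arm]: students back on the platform within 72 h; contacted[arm]: arm size.
    Assumes control has a non-zero return rate.
    """
    base = returned["control"] / contacted["control"]
    return {
        arm: (returned[arm] / contacted[arm] - base) / base
        for arm in ARMS
        if arm != "control"
    }
```

For example, a 15% return rate in voice_call against 10% in control is reported as a lift of 0.5 (50%).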

Automated scheduling (main.py)

APScheduler runs batch nudges at 10:00 AM and 6:00 PM IST daily.
REST API allows manual trigger per student or full batch. VAPI and
Frappe webhooks handle real-time call events and platform return signals.
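With APScheduler, the twice-daily batch is typically a cron job along the lines of `scheduler.add_job(run_batch, "cron", hour="10,18", minute=0, timezone="Asia/Kolkata")`, where `run_batch` is a hypothetical name. To sanity-check the slots without the scheduler, here is a stdlib-only sketch that computes the next batch time; it uses a fixed +05:30 offset, which is exact for IST because India does not observe daylight saving time.

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30), "IST")  # no DST in India, so fixed offset
BATCH_HOURS = (10, 18)  # 10:00 AM and 6:00 PM IST


def next_batch_run(now: datetime) -> datetime:
    """Next batch slot strictly after `now` (which must be timezone-aware)."""
    local = now.astimezone(IST)
    for day_offset in (0, 1):
        day = local.date() + timedelta(days=day_offset)
        for hour in BATCH_HOURS:
            candidate = datetime(day.year, day.month, day.day, hour, tzinfo=IST)
            if candidate > local:
                return candidate
    raise RuntimeError("unreachable: tomorrow always has a slot")
```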


Innovations beyond the spec

Sarvam AI over Whisper: Sarvam is trained on Indian telephony audio
(noisy, accented, code-switched), which is much closer to what government
school calls sound like than the general internet audio Whisper is trained on.

Conversational memory across calls: Redis stores the last 10 call
outcomes per student. If math nudges have failed twice, the system
switches topic automatically based on what the student previously engaged
with.
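A minimal in-memory sketch of that memory, assuming each outcome is stored as (topic, engaged). In Redis, the same bounded newest-first list is commonly kept with LPUSH followed by LTRIM key 0 9; here `deque(maxlen=10)` plays that role. The switching rule (last two attempts on the default topic both failed, so use the most recent topic the student engaged with) is one reading of the paragraph above, and all names are hypothetical.

```python
from __future__ import annotations

from collections import deque


class CallMemory:
    """In-memory stand-in for the per-student Redis list of recent call outcomes."""

    def __init__(self, max_outcomes: int = 10) -> None:
        self.max_outcomes = max_outcomes
        self._history: dict[str, deque] = {}

    def record(self, student_id: str, topic: str, engaged: bool) -> None:
        # Newest first; a full deque(maxlen) drops the oldest entry, like LPUSH + LTRIM 0 9.
        history = self._history.setdefault(student_id, deque(maxlen=self.max_outcomes))
        history.appendleft((topic, engaged))

    def recent(self, student_id: str) -> list[tuple[str, bool]]:
        return list(self._history.get(student_id, ()))

    def choose_topic(self, student_id: str, default_topic: str, max_failures: int = 2) -> str:
        history = self.recent(student_id)
        last_default = [ok for topic, ok in history if topic == default_topic][:max_failures]
        if len(last_default) < max_failures or any(last_default):
            return default_topic  # the last attempts on the default topic did not all fail
        for topic, engaged in history:  # most recent other topic the student engaged with
            if engaged and topic != default_topic:
                return topic
        return default_topic  # nothing better on record
```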

Parent vs student routing: Grade ≤ 6 or hour ≥ 6 PM → call the parent
with a different script and a different ask. The agent knows it's talking
to a parent and adjusts tone, vocabulary, and request accordingly.
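The routing rule itself is a one-line condition; the sketch below wraps it in a small call plan so the parent and student paths carry their own script and ask. The threshold (grade ≤ 6 or local hour ≥ 18) is from the rule above; the script identifiers and ask wording are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallPlan:
    recipient: str  # "parent" or "student"
    script: str     # which dialogue tree to load (hypothetical identifiers)
    ask: str        # what Didi asks for at the end of the call


def plan_call(grade: int, local_hour: int) -> CallPlan:
    """Grade <= 6, or calling at/after 6 PM local time, goes to the parent."""
    if grade <= 6 or local_hour >= 18:
        return CallPlan("parent", "parent_awareness", "help your child open the app today")
    return CallPlan("student", "student_nudge", "log in and finish one lesson today")
```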

Online model learning: Every call outcome updates the dropout risk
model weights via SGD. The system gets smarter at predicting who to call
as outcome data accumulates, without requiring full retraining.
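A single online update for logistic regression is one gradient step on the log-loss for one example: with prediction p and label y, the gradient with respect to the logit is p − y, so each weight moves by −lr · (p − y) · x_i. A minimal sketch, assuming the label is 1 when the student did not return within 72 hours (the PR does not state its labelling) and a fixed learning rate of 0.05.

```python
import math


def sgd_step(weights: list, bias: float, x: list, y: int, lr: float = 0.05):
    """One online SGD step for logistic regression on a single call outcome.

    y is the label for this outcome (assumed: 1 = did not return within 72 h,
    0 = returned). Returns updated weights, updated bias, and the probability
    the model predicted *before* the update.
    """
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    z = max(-50.0, min(50.0, z))  # keep math.exp in range
    p = 1.0 / (1.0 + math.exp(-z))
    err = p - y  # d(log-loss)/dz for the logistic model
    new_weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    new_bias = bias - lr * err
    return new_weights, new_bias, p
```

Because each step touches only one example, the model can be updated from the call-outcome webhook without a retraining job, which is the property the paragraph above relies on.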

Celebration nudges: Not just re-engagement. When a student hits a
milestone, Didi calls to celebrate. Positive reinforcement reduces future
dropout probability.


How to run

cd voice-agent
pip install -r requirements.txt
cp .env.example .env   # fill in API keys
docker run -d -p 6379:6379 redis:alpine
python main.py
# Health check: curl http://localhost:8000/api/health

How to test (no external services needed)

cd voice-agent
python -m pytest tests/test_suite.py -v
# Expected: 43 passed

File structure

voice-agent/
├── main.py # FastAPI + webhooks + APScheduler
├── config.py # All config in one place
├── agent/
│ ├── language_detector.py # Script-aware language detection
│ └── conversation_flow.py # Dialogue trees + LLM prompts
├── lms/
│ └── frappe_client.py # TAP Frappe REST client
├── nudge/
│ ├── dropout_risk.py # Logistic regression risk scorer
│ └── orchestrator.py # Master nudge pipeline
├── telephony/
│ ├── vapi_client.py # VAPI outbound call client
│ └── whatsapp_client.py # WhatsApp voice note fallback
├── experiments/
│ └── framework.py # A/B experiment framework
├── tests/
│ └── test_suite.py # 43 unit tests
├── requirements.txt
├── .env.example
└── README.md


Acceptance criteria check

Criterion | Status
--- | ---
Multilingual agent handles end-to-end spoken conversations | ✅ VAPI + Sarvam AI + conversation_flow.py
Integrated with TAP LMS for learner context | ✅ frappe_client.py: profile, progress, nudge log
Automated outbound calling | ✅ vapi_client.py + APScheduler batch runs
Language detection routes to correct language | ✅ language_detector.py: all 3 languages
WhatsApp voice integration | ✅ whatsapp_client.py with Sarvam TTS
Fallback and escalation logic | ✅ conversation_flow.py + orchestrator fallback chain
Experimentation framework | ✅ experiments/framework.py: A/B + metrics
Documentation for scaling | ✅ README.md

Author

Dashpreet Singh
dashpreetsinghhanda@gmail.com | 2024ucs0087@iitjammu.ac.in
IIT Jammu, B.Tech CSE 2024–2028

End-to-end implementation for Issue theapprenticeproject#4.

- Language detection: Hindi/Marathi/Punjabi (script + romanised + code-switch)
- Dropout risk model: logistic regression over 6 LMS behavioral signals
- VAPI outbound call orchestration with Sarvam AI Indic STT/TTS
- WhatsApp voice note fallback chain when calls are not answered
- Frappe LMS integration: student profile, progress, nudge log writebacks
- A/B experimentation framework: voice vs WhatsApp vs control
- APScheduler: automated batch nudges at 10AM and 6PM IST
- 43 unit tests, no external services required

Closes theapprenticeproject#4
@DZDasherKTB
Author

Known Issue Identified

Sarvam streaming STT requires sample_rate set explicitly in both the
WebSocket connection and audio data parameters. Mismatch causes silent
WER degradation with no error thrown. Fix in progress: switching to
Saaras v3 with mode="transcribe", as Saarika v2.5 is being deprecated.
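Since a mismatch degrades WER silently, a cheap guard is to read the sample rate from the audio itself and compare it with the rate the stream was opened at before sending any bytes. A minimal sketch using Python's standard `wave` module, assuming WAV input; `EXPECTED_SAMPLE_RATE = 16000` is a placeholder rather than a documented Sarvam requirement, and raw PCM chunks would need their rate tracked alongside the bytes instead.

```python
import io
import wave

# Placeholder: must equal the sample_rate used when opening the STT stream.
EXPECTED_SAMPLE_RATE = 16000


def check_sample_rate(wav_bytes: bytes, expected: int = EXPECTED_SAMPLE_RATE) -> int:
    """Fail loudly instead of letting a rate mismatch silently degrade WER."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        actual = wf.getframerate()
    if actual != expected:
        raise ValueError(
            f"audio is {actual} Hz but the stream expects {expected} Hz; "
            "resample the audio or reopen the stream with the matching rate"
        )
    return actual
```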


Development

Successfully merging this pull request may close these issues.

[DMP 2026]: Building Multilingual Voice Agents to Improve Learning Engagement in Government School Systems
