feat: Multilingual Voice Agent MVP, end-to-end implementation for Issue #4 (#12)
Open
DZDasherKTB wants to merge 1 commit into
End-to-end implementation for Issue theapprenticeproject#4.
- Language detection: Hindi/Marathi/Punjabi (script + romanised + code-switch)
- Dropout risk model: logistic regression over 6 LMS behavioral signals
- VAPI outbound call orchestration with Sarvam AI Indic STT/TTS
- WhatsApp voice note fallback chain when calls are not answered
- Frappe LMS integration: student profile, progress, nudge log writebacks
- A/B experimentation framework: voice vs WhatsApp vs control
- APScheduler: automated batch nudges at 10 AM and 6 PM IST
- 43 unit tests, no external services required
Closes theapprenticeproject#4
Author
Known Issue Identified
Sarvam streaming STT requires sample_rate set explicitly in both the |
Overview
This PR delivers a working end-to-end MVP for the multilingual voice agent
described in Issue #4. Every acceptance criterion is implemented and tested.
Closes #4
What's implemented
Voice agent pipeline (end-to-end)
A student goes inactive → dropout risk model scores them → orchestrator
decides channel and contact → VAPI initiates an outbound call in their
language → Didi (the agent persona) speaks using Sarvam AI Indic TTS →
student responds → transcript is analysed → outcome is logged back to
Frappe → if no answer, a WhatsApp voice note is sent automatically.
Language detection (agent/language_detector.py)
Script-level detection distinguishes Devanagari (Hindi/Marathi),
Gurmukhi (Punjabi), and Latin-romanised input. It handles mid-sentence
code-switching, which is very common (e.g. "haan मुझे padhai karni hai").
Marathi vs Hindi disambiguation uses marker characters unique to Marathi
(ळ, ऱ, ॉ). All three languages route to language-appropriate Sarvam AI
TTS voices.
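The script-level step described above can be sketched roughly as follows. This is a minimal illustration, not the code in agent/language_detector.py: the function name `detect_language` and the return codes are assumptions, and it does not attempt to tell romanised Hindi, Marathi, and Punjabi apart. The marker set is taken from the description as-is.

```python
import re

# Unicode block ranges for the two Indic scripts named in the description.
DEVANAGARI = re.compile(r"[\u0900-\u097F]")
GURMUKHI = re.compile(r"[\u0A00-\u0A7F]")

# Marker characters listed in the PR description for Marathi:
# U+0933 (ळ), U+0931 (ऱ), U+0949 (ॉ).
MARATHI_MARKERS = {"\u0933", "\u0931", "\u0949"}


def detect_language(text: str) -> str:
    """Return 'pa', 'mr', 'hi', or 'romanised' (hypothetical labels)."""
    if GURMUKHI.search(text):
        return "pa"
    if DEVANAGARI.search(text):
        # Code-switched input counts as Devanagari if any such character appears.
        if any(ch in MARATHI_MARKERS for ch in text):
            return "mr"
        return "hi"
    # No Indic script found: the input is Latin-romanised (language unresolved here).
    return "romanised"
```

For example, `detect_language("haan मुझे padhai karni hai")` takes the Devanagari branch even though most of the sentence is romanised.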
Dropout risk model (nudge/dropout_risk.py)
Logistic regression over 6 LMS behavioral features: days inactive,
course completion %, weekly session decay rate, average session duration,
streak days, and total sessions. An online SGD update runs after each call
outcome, so the model improves as it runs. Until enough training data
accumulates, it falls back to rule-based scoring.
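A minimal sketch of how such a scorer could be structured, assuming the six features are already scaled to roughly [0, 1] and ordered as listed above. The class name, learning rate, `min_updates` threshold, and the rule in `rule_based_score` are all placeholders chosen for illustration, not values from nudge/dropout_risk.py.

```python
import math
from dataclasses import dataclass, field


def rule_based_score(x: list[float]) -> float:
    # Placeholder rule (assumption): weight inactivity and low completion.
    days_inactive, completion = x[0], x[1]
    return min(1.0, max(0.0, 0.7 * days_inactive + 0.3 * (1.0 - completion)))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class OnlineRiskModel:
    weights: list[float] = field(default_factory=lambda: [0.0] * 6)
    bias: float = 0.0
    lr: float = 0.05          # assumed learning rate
    min_updates: int = 50     # assumed cold-start threshold
    updates: int = 0

    def _model_prob(self, x: list[float]) -> float:
        return sigmoid(self.bias + sum(w * xi for w, xi in zip(self.weights, x)))

    def predict(self, x: list[float]) -> float:
        # Use the rule-based fallback until enough outcomes have been seen.
        if self.updates < self.min_updates:
            return rule_based_score(x)
        return self._model_prob(x)

    def update(self, x: list[float], dropped_out: int) -> None:
        # One SGD step on the log-loss: gradient is (p - y) * x.
        error = self._model_prob(x) - dropped_out
        self.weights = [w - self.lr * error * xi for w, xi in zip(self.weights, x)]
        self.bias -= self.lr * error
        self.updates += 1
```

Each call outcome would feed one `update(features, label)` step, which is what lets the weights drift without a full retraining pass.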
Nudge orchestrator (nudge/orchestrator.py)
The central pipeline: fetches the student profile and progress from Frappe,
scores risk, applies rate limiting (max 2 calls/week/student via Redis),
determines the nudge type (return / lesson completion / streak recovery /
parent awareness / celebration), builds the conversation context, places
the call, handles the full fallback chain, and logs everything.
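The per-student rate limit could be implemented as a fixed-window counter, sketched below. This is an assumption about the approach, not the orchestrator's actual code: the key format and function name are hypothetical. The store only needs `incr` and `expire`, which redis-py clients provide; an in-memory stand-in is included so the sketch runs without Redis.

```python
from typing import Protocol

WEEK_SECONDS = 7 * 24 * 3600
MAX_CALLS_PER_WEEK = 2


class CounterStore(Protocol):
    """Subset of the Redis client interface this sketch relies on."""

    def incr(self, name: str) -> int: ...
    def expire(self, name: str, time: int) -> bool: ...


class InMemoryStore:
    """Stand-in for Redis in tests; records TTLs but never expires keys."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = {}
        self.ttls: dict[str, int] = {}

    def incr(self, name: str) -> int:
        self.counts[name] = self.counts.get(name, 0) + 1
        return self.counts[name]

    def expire(self, name: str, time: int) -> bool:
        self.ttls[name] = time
        return True


def allow_call(store: CounterStore, student_id: str) -> bool:
    """Return True if this student is still under the weekly call limit."""
    key = f"nudge:calls:{student_id}"  # hypothetical key format
    count = store.incr(key)
    if count == 1:
        # First call in the window: start the 7-day expiry.
        store.expire(key, WEEK_SECONDS)
    return count <= MAX_CALLS_PER_WEEK
```

Note that this variant also counts denied attempts and starts the window at the first call rather than at a calendar week boundary; a real implementation might choose differently.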
Conversation flow engine (agent/conversation_flow.py)
Structured opening utterances in all 3 languages × 5 nudge types.
The LLM system prompt builder keeps Didi on-script with student-specific
context. The engine also covers fallback cycling, escalation trigger
detection, and warm closings when the student commits to logging in.
Channel fallback chain (telephony/)
voice_call → WhatsApp voice note → WhatsApp text
Voice notes are synthesised via Sarvam AI TTS and sent as WhatsApp audio
messages, which reportedly get a 3-5× higher open rate than text for this
demographic.
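The chain above amounts to trying each channel in order until one succeeds. A rough sketch, assuming each channel is exposed as a callable that returns True on successful delivery (the function name and the channel signature are illustrative, not the interfaces in telephony/):

```python
from typing import Callable, Optional

# (channel name, send function) pairs; send returns True if delivery succeeded.
Channel = tuple[str, Callable[[str], bool]]


def deliver_with_fallback(student_id: str, channels: list[Channel]) -> Optional[str]:
    """Try channels in order and return the name of the first that succeeds."""
    for name, send in channels:
        try:
            if send(student_id):
                return name
        except Exception:
            # Treat a channel error like a failed delivery and move on.
            continue
    return None  # every channel failed
```

A caller would pass the channels in priority order, e.g. `[("voice_call", place_call), ("whatsapp_voice", send_voice_note), ("whatsapp_text", send_text)]`, where the three send functions are hypothetical.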
Frappe LMS integration (lms/frappe_client.py)
Async client for TAP's Frappe REST API. Fetches the student profile,
learning progress, and at-risk student list. Writes nudge records back
so teachers can see call history per student in the LMS.
A/B experimentation framework (experiments/framework.py)
Deterministic arm assignment via MD5 hash, so the same student always
lands in the same arm. Arms: control / voice_call / whatsapp_voice /
whatsapp_text. Primary metric: return-to-platform within 72 hours (tracked
via Frappe login webhook). Reports lift over control per arm.
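Hash-based assignment of this kind can be sketched in a few lines. The function names, the experiment-name salt, and the choice of relative lift are assumptions for illustration; experiments/framework.py may differ on all three.

```python
import hashlib
from typing import Optional

ARMS = ["control", "voice_call", "whatsapp_voice", "whatsapp_text"]


def assign_arm(student_id: str, experiment: str = "nudge_v1") -> str:
    """Map a student to an arm deterministically via an MD5 digest."""
    # Salting with the experiment name (an assumption) keeps assignments
    # independent across experiments.
    digest = hashlib.md5(f"{experiment}:{student_id}".encode("utf-8")).hexdigest()
    return ARMS[int(digest, 16) % len(ARMS)]


def lift_over_control(arm_rate: float, control_rate: float) -> Optional[float]:
    """Relative lift of an arm's return rate over the control arm's rate."""
    if control_rate == 0:
        return None  # lift is undefined without a control baseline
    return (arm_rate - control_rate) / control_rate
```

Because the arm depends only on the hashed identifier, no assignment table needs to be stored; recomputing the hash always gives the same answer.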
Automated scheduling (main.py)
APScheduler runs batch nudges at 10:00 AM and 6:00 PM IST daily.
A REST API allows a manual trigger per student or for the full batch. VAPI
and Frappe webhooks handle real-time call events and platform return signals.
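To make the schedule concrete, the sketch below computes the next batch time using only the standard library. It deliberately does not reproduce the APScheduler configuration in main.py; the function name is hypothetical, and IST is modelled as a fixed UTC+05:30 offset, which is safe because IST has no daylight saving.

```python
from datetime import datetime, time, timedelta, timezone

# IST is a fixed offset from UTC with no daylight saving time.
IST = timezone(timedelta(hours=5, minutes=30), name="IST")
BATCH_TIMES = [time(10, 0), time(18, 0)]  # 10:00 AM and 6:00 PM IST


def next_batch_run(now: datetime) -> datetime:
    """Return the next 10:00 or 18:00 IST slot strictly after `now`."""
    local = now.astimezone(IST)
    for slot in BATCH_TIMES:
        candidate = datetime.combine(local.date(), slot, tzinfo=IST)
        if candidate > local:
            return candidate
    # Both slots have passed today: roll over to tomorrow's first slot.
    tomorrow = local.date() + timedelta(days=1)
    return datetime.combine(tomorrow, BATCH_TIMES[0], tzinfo=IST)
```

`now` must be timezone-aware; passing a UTC timestamp is fine because it is converted to IST before comparison.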
Innovations beyond the spec
Sarvam AI over Whisper: Sarvam is trained on Indian telephony audio
(noisy, accented, code-switched). For government school calls this matters
far more than Whisper's general-purpose training on internet audio.
Conversational memory across calls: Redis stores the last 10 call
outcomes per student. If math nudges haven't worked twice, the system
switches topic automatically based on what the student previously engaged
with.
Parent vs student routing: Grade ≤ 6 or hour ≥ 6 PM → call the parent
with a different script and a different ask. The agent knows it is talking
to a parent and adjusts tone, vocabulary, and request accordingly.
Online model learning: Every call outcome updates the dropout risk
model weights via SGD. The system gets better at predicting whom to call
as outcome data accumulates, without requiring full retraining.
Celebration nudges: Not just re-engagement. When a student hits a
milestone, Didi calls to celebrate. This positive reinforcement is
intended to reduce future dropout probability.
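Two of the rules in this list are simple enough to sketch directly: the parent routing condition and the bounded call memory with topic switching. Both are illustrative only. The names are hypothetical, a `deque` stands in for the Redis list, and "hasn't worked twice" is interpreted here as the two most recent nudges on that topic both failing to engage.

```python
from collections import deque

MAX_HISTORY = 10  # last 10 call outcomes per student, as described above


def choose_contact(grade: int, hour_ist: int) -> str:
    """Route to the parent for grade <= 6 or from 6 PM (hour 18) onwards."""
    return "parent" if grade <= 6 or hour_ist >= 18 else "student"


class CallMemory:
    """In-memory stand-in for the per-student Redis history (assumption)."""

    def __init__(self) -> None:
        # maxlen drops the oldest outcome once 10 are stored.
        self.outcomes: deque[tuple[str, bool]] = deque(maxlen=MAX_HISTORY)

    def record(self, topic: str, engaged: bool) -> None:
        self.outcomes.append((topic, engaged))

    def should_switch_topic(self, topic: str) -> bool:
        # Switch when the two most recent nudges on this topic both failed.
        recent = [engaged for t, engaged in self.outcomes if t == topic][-2:]
        return len(recent) == 2 and not any(recent)
```

With Redis, the same bounded history is commonly kept by pushing onto a list and trimming it to the most recent ten entries after each write.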
How to run
How to test (no external services needed)
File structure
voice-agent/
├── main.py # FastAPI + webhooks + APScheduler
├── config.py # All config in one place
├── agent/
│ ├── language_detector.py # Script-aware language detection
│ └── conversation_flow.py # Dialogue trees + LLM prompts
├── lms/
│ └── frappe_client.py # TAP Frappe REST client
├── nudge/
│ ├── dropout_risk.py # Logistic regression risk scorer
│ └── orchestrator.py # Master nudge pipeline
├── telephony/
│ ├── vapi_client.py # VAPI outbound call client
│ └── whatsapp_client.py # WhatsApp voice note fallback
├── experiments/
│ └── framework.py # A/B experiment framework
├── tests/
│ └── test_suite.py # 43 unit tests
├── requirements.txt
├── .env.example
└── README.md
Acceptance criteria check
Author
Dashpreet Singh
dashpreetsinghhanda@gmail.com | 2024ucs0087@iitjammu.ac.in
IIT Jammu, B.Tech CSE 2024–2028