Record outbound phone calls + 2-channel socket #4416
Code Review
This pull request introduces a new multi-channel audio transcription WebSocket endpoint, along with supporting utilities for conversation management, Pusher integration, and translation. The changes include creating new files for streaming utilities and modifying the backend router to handle the new endpoint. A critical issue related to error handling and fallback strategy has been identified and needs to be addressed.
/gemini summary |
Summary of Changes

This pull request adds phone call functionality to the application, leveraging the Twilio Voice SDK for call management and real-time transcription. It incorporates native platform features like CallKit on iOS for a seamless user experience and introduces a new multi-channel audio processing pipeline. The changes include backend endpoints for token generation and phone number verification, as well as Flutter plugins for handling call initiation, control, and audio streaming.

Highlights
Changelog
Activity
@mdmohsin7 The TwiML webhook in Reviewed by @kenji |
/gemini summary |
Summary of Changes

This pull request introduces phone call functionality to the application, leveraging the Twilio Voice SDK for call management and real-time transcription. It incorporates native platform features like CallKit on iOS for a seamless user experience and introduces a new multi-channel audio processing pipeline. The changes include backend endpoints for token generation and phone number verification, as well as Flutter plugins for handling call initiation, control, and audio streaming.

Highlights
Changelog
Activity
Done, pls check now @beastoin |
PR Review: Record outbound phone calls + 2-channel socket

Substantial feature adding Twilio-based phone call recording with dual-channel audio streaming. Good test coverage included.

Issues to Address

1. Twilio signature validation bypass when auth_token missing (…)

       if not auth_token:
           return True  # <-- Security risk: bypasses validation

   This silently allows unauthenticated requests when …
2. Inconsistent package path (…)
3. Missing rate limiting on verification endpoints
4. Phone number validation could be stricter (…)

Minor Suggestions

Positive Notes

Verdict: Address item #1 (security issue) before merging. Items #2-4 can be follow-up tasks.

by AI for @beastoin
It's com/friend/ios because the package name is com.friend.ios. Rest all fixed, pls check now @beastoin
@mdmohsin7 In Can you update the verification flow and add the missing test? by AI for @beastoin |
final fixes done, pls check again @beastoin's AI 🤖 |
conflicts @mdmohsin7 |
@mdmohsin7 Before review, quick questions:
Thanks! by AI for @beastoin |
@mdmohsin7 lets roll this out soon pls |
Update transcript parsing to handle standard segment array format. Reset call state to idle on setup failures so retries work. Move mic permission request before SDK initialization. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Required for permission_handler to show the mic permission prompt. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mix both channels sample-by-sample before sending to pusher so the stored audio is a proper mono stream. Connect pusher at TARGET_SAMPLE_RATE (16kHz) to match the resampled audio. Use standard UUID for conversation IDs instead of call_id.
Use openAppSettings() from permission_handler instead of unreliable platform-specific URL schemes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This reverts commit 6d97fc7.
When contacts permission was previously denied, tapping Allow did nothing because iOS won't re-show the prompt. Now checks permission status and opens app settings if denied/permanently denied. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The delete button was always targeting the first verified number. Moved delete icon into per-row widget so each row deletes the correct number. Also converted all strings to use l10n.
Deployment Steps

1. Twilio Account Setup
2. Create Twilio API Key
3. Create TwiML Apps (one per environment)
4. Add Kubernetes Secrets

   Add these 5 secrets to both clusters:
5. Deploy
# Conflicts: # app/lib/l10n/app_ar.arb # app/lib/l10n/app_bg.arb # app/lib/l10n/app_ca.arb # app/lib/l10n/app_cs.arb # app/lib/l10n/app_da.arb # app/lib/l10n/app_de.arb # app/lib/l10n/app_el.arb # app/lib/l10n/app_en.arb # app/lib/l10n/app_es.arb # app/lib/l10n/app_et.arb # app/lib/l10n/app_fi.arb # app/lib/l10n/app_fr.arb # app/lib/l10n/app_hi.arb # app/lib/l10n/app_hu.arb # app/lib/l10n/app_id.arb # app/lib/l10n/app_it.arb # app/lib/l10n/app_ja.arb # app/lib/l10n/app_ko.arb # app/lib/l10n/app_localizations.dart # app/lib/l10n/app_localizations_ar.dart # app/lib/l10n/app_localizations_bg.dart # app/lib/l10n/app_localizations_ca.dart # app/lib/l10n/app_localizations_cs.dart # app/lib/l10n/app_localizations_da.dart # app/lib/l10n/app_localizations_de.dart # app/lib/l10n/app_localizations_el.dart # app/lib/l10n/app_localizations_en.dart # app/lib/l10n/app_localizations_es.dart # app/lib/l10n/app_localizations_et.dart # app/lib/l10n/app_localizations_fi.dart # app/lib/l10n/app_localizations_fr.dart # app/lib/l10n/app_localizations_hi.dart # app/lib/l10n/app_localizations_hu.dart # app/lib/l10n/app_localizations_id.dart # app/lib/l10n/app_localizations_it.dart # app/lib/l10n/app_localizations_ja.dart # app/lib/l10n/app_localizations_ko.dart # app/lib/l10n/app_localizations_lt.dart # app/lib/l10n/app_localizations_lv.dart # app/lib/l10n/app_localizations_ms.dart # app/lib/l10n/app_localizations_nl.dart # app/lib/l10n/app_localizations_no.dart # app/lib/l10n/app_localizations_pl.dart # app/lib/l10n/app_localizations_pt.dart # app/lib/l10n/app_localizations_ro.dart # app/lib/l10n/app_localizations_ru.dart # app/lib/l10n/app_localizations_sk.dart # app/lib/l10n/app_localizations_sv.dart # app/lib/l10n/app_localizations_th.dart # app/lib/l10n/app_localizations_tr.dart # app/lib/l10n/app_localizations_uk.dart # app/lib/l10n/app_localizations_vi.dart # app/lib/l10n/app_localizations_zh.dart # app/lib/l10n/app_lt.arb # app/lib/l10n/app_lv.arb # app/lib/l10n/app_ms.arb # 
app/lib/l10n/app_nl.arb # app/lib/l10n/app_no.arb # app/lib/l10n/app_pl.arb # app/lib/l10n/app_pt.arb # app/lib/l10n/app_ro.arb # app/lib/l10n/app_ru.arb # app/lib/l10n/app_sk.arb # app/lib/l10n/app_sv.arb # app/lib/l10n/app_th.arb # app/lib/l10n/app_tr.arb # app/lib/l10n/app_uk.arb # app/lib/l10n/app_vi.arb # app/lib/l10n/app_zh.arb
## End-to-End Flow
### 1. Phone Number Verification
- User enters phone number → Backend validates E.164 format
- Twilio places automated verification call with 6-digit code
- User answers, enters code → Number stored encrypted in Firestore
(`users/{uid}/phone_numbers/`)
- Security: SHA256 hash for lookups, AES-GCM encryption for storage,
5-min pending verification TTL
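
The checks in this step can be sketched as follows. This is a minimal illustration, not the code in this PR: the function names and the regex are assumptions, and only the E.164 requirement, the SHA-256 lookup hash, and the 5-minute TTL come from the description above.

```python
import hashlib
import re
import time
from typing import Optional

# E.164: "+" followed by up to 15 digits, with a non-zero leading digit.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")

# 5-minute window for a pending verification, as stated in the flow above.
PENDING_TTL_SECONDS = 5 * 60


def is_valid_e164(phone_number: str) -> bool:
    """Return True if the string looks like an E.164 number (hypothetical helper)."""
    return E164_PATTERN.fullmatch(phone_number) is not None


def lookup_hash(phone_number: str) -> str:
    """SHA-256 hex digest, usable as a lookup key so the plaintext is never indexed."""
    return hashlib.sha256(phone_number.encode("utf-8")).hexdigest()


def is_pending_expired(created_at: float, now: Optional[float] = None) -> bool:
    """Return True once a pending verification is older than the TTL."""
    now = time.time() if now is None else now
    return now - created_at > PENDING_TTL_SECONDS
```

Hashing only supports equality lookups; the number itself would still need the AES-GCM encrypted copy to be displayed or dialed.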
### 2. Call Initiation
- App requests Twilio access token (JWT with user's UID as identity)
- Native SDK (TwilioVoice) initialized with token
- Microphone permission requested, call placed with unique `callId`
### 3. Call Routing (TwiML Webhook)
- Twilio invokes `POST /v1/phone/twiml` with request signature
- Backend validates:
- Twilio signature
- User's verified caller ID exists
- Destination number E.164 format
- Caller ID still valid in Twilio
- Returns TwiML: `<Dial callerId="+15551234567">+15559876543</Dial>`
- Twilio bridges call: User ↔ Twilio ↔ Recipient
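
A rough sketch of the webhook's two core pieces, written with the standard library only so it stays self-contained. The signature scheme follows Twilio's documented algorithm (full URL plus sorted POST parameters, HMAC-SHA1, base64); the function names are hypothetical, and the actual handler in this PR may use Twilio's helper library instead. Note that the validator fails closed when the auth token is missing, which is the behavior the review asked for.

```python
import base64
import hashlib
import hmac
from typing import Mapping, Optional
from xml.etree import ElementTree as ET


def compute_twilio_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Append each POST parameter (sorted by name) as name+value to the URL,
    sign with HMAC-SHA1 using the auth token, and base64-encode the digest."""
    payload = url + "".join(f"{name}{params[name]}" for name in sorted(params))
    digest = hmac.new(auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: Optional[str],
    url: str,
    params: Mapping[str, str],
    signature: Optional[str],
) -> bool:
    """Reject the request when the token or signature is missing (fail closed)."""
    if not auth_token or not signature:
        return False
    expected = compute_twilio_signature(auth_token, url, params)
    # Constant-time comparison to avoid leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


def build_dial_twiml(caller_id: str, destination: str) -> str:
    """Build a <Response><Dial callerId=...>destination</Dial></Response> document."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial", {"callerId": caller_id})
    dial.text = destination
    return ET.tostring(response, encoding="unicode")
```

Building the TwiML through an XML library rather than string formatting means the caller ID and destination are escaped automatically.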
### 4. Recording & Transcription
#### Audio Capture
- Native SDK captures 2-channel PCM16 @ 48kHz:
- Channel 0x01: User microphone
- Channel 0x02: Remote party audio
- Each audio chunk prefixed with channel byte: `[0x01|0x02][audio_data]`
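
The framing is simple enough to show directly. The sketch below assumes exactly the layout described above (one channel byte followed by PCM16 data); the helper names and the extra sanity checks are illustrative, not taken from the PR.

```python
from typing import Tuple

CHANNEL_USER = 0x01    # user microphone
CHANNEL_REMOTE = 0x02  # remote party audio


def build_frame(channel: int, pcm: bytes) -> bytes:
    """Prefix a PCM16 chunk with its channel byte."""
    return bytes([channel]) + pcm


def parse_frame(frame: bytes) -> Tuple[int, bytes]:
    """Split a frame into (channel, pcm) and reject malformed input."""
    if not frame:
        raise ValueError("empty frame")
    channel = frame[0]
    if channel not in (CHANNEL_USER, CHANNEL_REMOTE):
        raise ValueError(f"unknown channel byte: {channel:#04x}")
    pcm = frame[1:]
    if len(pcm) % 2 != 0:
        # PCM16 samples are 2 bytes each, so an odd length means a torn sample.
        raise ValueError("PCM16 payload must contain whole 2-byte samples")
    return channel, pcm
```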
#### Streaming
- WebSocket to `wss://api/v4/listen?source=phone_call&channels=2`
- Multi-channel support integrated directly into the existing
`/v4/listen` route
- Firebase auth via `get_current_user_uid` dependency
- Auto-reconnect with exponential backoff (1s → 16s, max 5 attempts)
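
The reconnect schedule implied by "1s → 16s, max 5 attempts" works out to delays of 1, 2, 4, 8, and 16 seconds. The client in this PR is a Flutter app, so the Python below is only an illustration of the schedule; the function name and parameters are assumptions.

```python
from typing import Iterator


def backoff_delays(base: float = 1.0, cap: float = 16.0, max_attempts: int = 5) -> Iterator[float]:
    """Yield the wait before each reconnect attempt, doubling up to the cap."""
    for attempt in range(max_attempts):
        yield min(cap, base * (2 ** attempt))


# The full schedule for the defaults: [1.0, 2.0, 4.0, 8.0, 16.0]
schedule = list(backoff_delays())
```

After the last delay is consumed the caller would give up and surface an error rather than retry indefinitely.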
#### Backend Processing
- When `channels >= 2`, activates multi-channel mode within the existing
stream handler
- Routes each channel to its own STT connection
(Deepgram/Soniox/Speechmatics)
- Real-time transcription with speaker labels:
- Channel 1 → `SPEAKER_00` (is_user: true)
- Channel 2 → `SPEAKER_01` (is_user: false)
- Uses standard segment array format (same as all other sources)
- One conversation per call session (no silence-based splitting)
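
One way to picture the channel-to-speaker mapping is shown below. The speaker labels and `is_user` flags come from the list above; the segment field names (`text`, `start`, `end`) and both helper functions are assumptions made for illustration and may not match the backend's actual segment schema.

```python
from typing import Any, Dict, Iterable, List

# Channel number -> (speaker label, is_user), as listed in the description.
CHANNEL_SPEAKERS = {
    1: ("SPEAKER_00", True),
    2: ("SPEAKER_01", False),
}


def label_segment(channel: int, text: str, start: float, end: float) -> Dict[str, Any]:
    """Attach the fixed speaker identity for a channel to one transcript segment."""
    speaker, is_user = CHANNEL_SPEAKERS[channel]
    return {"speaker": speaker, "is_user": is_user, "text": text, "start": start, "end": end}


def merge_segments(*per_channel: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Interleave per-channel segments into a single array ordered by start time."""
    merged = [segment for channel_segments in per_channel for segment in channel_segments]
    return sorted(merged, key=lambda segment: segment["start"])
```

Because each channel carries exactly one party, the speaker label is known from the channel alone and no acoustic diarization is needed to tell the two sides apart.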
#### Storage
- Raw PCM16 uploaded to GCS every 5s
- Encrypted with user-specific AES-GCM keys
- Channels saved separately for diarization
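
As a sketch of the 5-second upload cadence, the buffer below cuts a PCM16 stream into fixed-duration chunks by byte count. The class name, the byte-count approach, and the 16 kHz default are assumptions; the real implementation may flush on a timer instead, and encryption and the GCS upload are deliberately left out.

```python
from typing import List


class ChunkBuffer:
    """Accumulate PCM16 bytes for one channel and emit fixed-duration chunks."""

    def __init__(self, sample_rate: int = 16_000, seconds: float = 5.0) -> None:
        # PCM16 uses 2 bytes per sample, so a chunk is samples * 2 bytes.
        self.chunk_bytes = int(sample_rate * seconds) * 2
        self._buffer = bytearray()

    def add(self, pcm: bytes) -> List[bytes]:
        """Append audio and return every complete chunk that is now available."""
        self._buffer.extend(pcm)
        chunks = []
        while len(self._buffer) >= self.chunk_bytes:
            chunks.append(bytes(self._buffer[: self.chunk_bytes]))
            del self._buffer[: self.chunk_bytes]
        return chunks

    def flush(self) -> bytes:
        """Return whatever is left, for the final partial chunk when the call ends."""
        remainder = bytes(self._buffer)
        self._buffer.clear()
        return remainder
```

One buffer per channel keeps the two streams separate, so each returned chunk can be encrypted and uploaded independently.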
### 5. Call Termination
- User/remote ends call → Native SDK disconnects
- WebSocket closes → Backend finalizes conversation
- Structured output: title, overview, category, action items
- Saved with `source: phone_call`, standard UUID conversation ID
## Required Environment Variables
The following env vars must be set on the backend for phone call
functionality:
| Variable | Description |
|----------|-------------|
| `TWILIO_ACCOUNT_SID` | Twilio account SID |
| `TWILIO_AUTH_TOKEN` | Twilio auth token (used for request signature validation) |
| `TWILIO_API_KEY_SID` | Twilio API key SID (used for generating access tokens) |
| `TWILIO_API_KEY_SECRET` | Twilio API key secret |
| `TWILIO_TWIML_APP_SID` | Twilio TwiML app SID (routes calls to the `/v1/phone/twiml` webhook) |
| `ENCRYPTION_SECRET` | Secret for AES-GCM encryption of stored phone numbers |
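
A small startup check can catch a missing variable before the first call is attempted. The variable names below are the ones in the table; the helper itself is a hypothetical sketch and is not part of this PR.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_ENV_VARS = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_API_KEY_SID",
    "TWILIO_API_KEY_SECRET",
    "TWILIO_TWIML_APP_SID",
    "ENCRYPTION_SECRET",
)


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

Treating an empty string as missing matters here, since an empty `TWILIO_AUTH_TOKEN` would otherwise disable signature validation without any visible error.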
https://github.com/user-attachments/assets/853ddf22-32ed-4c53-aeb4-7f2ce950f6ac