Add MIP-06: Group Voice and Video Calling#34
RogueFantasy wants to merge 1 commit into marmot-protocol:master from
Specifies privacy-preserving real-time voice and video calling within Marmot groups. Media is E2EE using SFrame, with keys derived from the MLS epoch's exporter secret. Call signaling travels as MLS application messages. A tiered media topology scales from a p2p mesh to cascaded SFU federation. New event kinds: 450-454 (call signaling) and 10052 (SFU advertisement). Depends on MIPs 00, 01, 03, and 05. Includes test vectors and wire format constants.
Walkthrough: Adds a comprehensive design document specifying a privacy-preserving real-time voice and video calling system for Marmot groups, covering MLS-derived key derivation, SFrame end-to-end encryption, call signaling via MLS Application Messages, media topology negotiation, security threat models, and implementation guidelines.
Estimated code review effort: 🎯 4 (Complex), ⏱️ ~60 minutes. 🚥 Pre-merge checks: ✅ 4 passed.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `06.md`:

- Around lines 632-656: Insert a reference to the global threat model at the start of the "Security Considerations" section. Immediately after the "## Security Considerations" header, add a sentence like "This section covers residual risks specific to MIP-06. For broader Marmot security context and threat model, see [threat_model.md](threat_model.md)." so that the section references threat_model.md before the existing line "The Privacy Properties table in the Architecture section is the authoritative reference for information flow."
🧹 Nitpick comments (2)
06.md (2)
35-46: Consider adding security symbols to the Privacy Properties section. The coding guidelines specify: "Include security notes with ✅ (protections) and ⚠️ (limitations) symbols in documentation." The Privacy Properties table effectively documents information flow, but adding explicit symbols would improve scannability:

- ✅ for positive security properties (e.g., "Cannot Learn" column entries)
- ⚠️ for limitations/risks (e.g., metadata leakage, timing information)
850-943: Consider adding language hints to test vector code blocks. Markdownlint warns that fenced code blocks should have a language specified (MD040). While test vectors and wire format examples are somewhat language-agnostic, adding a hint improves syntax highlighting and readability. For example, change the opening `` ``` `` of the block that begins `call_id (32 bytes): 0102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20` to `` ```text ``. Apply the `text` language hint to the blocks at lines 850, 863, 874, 883, 905, 919, and 937. This is a low-priority style improvement, but it helps with Markdown rendering in various viewers.
## Security Considerations
This section covers residual risks, attack trees, and trust assumptions. The Privacy Properties table in the Architecture section is the authoritative reference for information flow.
### Trust Assumptions
1. **MLS group members**: All members can derive SFrame keys. The call's encryption boundary is the MLS group (see Design Rationale under Architecture).
2. **SFU operator** (if used): Routes packets honestly. Compromise does not break content confidentiality (SFrame E2EE), but it enables metadata surveillance per the Privacy Properties table.
3. **TURN operator** (if used): Relays packets. Sees IP addresses per the Privacy Properties table, but not content.
4. **Apple/Google** (for push): Per MIP-05 properties.
### Attack Trees
**Compromised SFU**: Cannot decrypt media (no MLS keys). Can: selectively drop packets (detectable by client quality metrics and kind 454 `"quality_report"` events), replay frames (rejected, because SFrame CTR must be monotonic and replayed CTR values are discarded), observe metadata per Privacy Properties, and refuse service. Cannot: inject fabricated media (GCM tag verification fails), access past or future media, or determine Nostr identities. Detection: clients can cross-reference the expected participant count (from kind 451 signaling) with the SFU's `participant_update` messages to detect phantom participants.
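The replay rule above can be sketched as a per-sender high-water mark on CTR. This is a minimal sketch, assuming each sender's frames carry a distinct SFrame KID and that `accept()` is called only after the GCM tag has verified (otherwise a forged packet with a large CTR could lock out genuine frames). `ReplayGuard` is a hypothetical name, not part of the MIP.

```python
from dataclasses import dataclass, field


@dataclass
class ReplayGuard:
    """Strict per-sender CTR check for one MLS epoch (hypothetical helper).

    A frame is accepted only if its CTR is greater than every CTR
    previously accepted from the same KID in the current epoch.
    """

    highest_ctr: dict[int, int] = field(default_factory=dict)

    def accept(self, kid: int, ctr: int) -> bool:
        # Call this only for frames whose AEAD tag has already verified.
        last = self.highest_ctr.get(kid)
        if last is not None and ctr <= last:
            return False  # replayed (or reordered) CTR: discard the frame
        self.highest_ctr[kid] = ctr
        return True

    def reset_for_new_epoch(self) -> None:
        # CTR restarts at zero when the MLS epoch, and thus the key, changes.
        self.highest_ctr.clear()
```

The strict "greater than the last accepted CTR" check mirrors the text; a receiver that must tolerate network reordering would need a sliding replay window instead, a trade-off the quoted section does not address.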
**Compromised TURN**: Cannot decrypt content (SFrame inside SRTP). In enhanced mode, forced TURN concentrates IP exposure at the TURN operator but hides participants from each other and from the SFU.
**Compromised relay**: Cannot read signaling (MLS-encrypted). Can delay or drop events (mitigated by using multiple relays).
**Malicious group member not on call**: Possesses the SFrame base key. To eavesdrop, the attacker would need to connect independently to the SFU with a derived auth token and silently receive media, which is equivalent to joining the call. Detectable: the SFU's `participant_update` message will include the attacker's leaf_index, which other participants can compare against the set of participants who sent kind 451 `"accept"`. If a leaf_index appears in the SFU's participant list but did not send an `"accept"` event, clients SHOULD alert the user. Mitigated by creating a dedicated MLS subgroup if needed.
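The detection check described above reduces to a set difference. A minimal sketch with hypothetical function and parameter names, assuming the client has already collected leaf indices from the SFU's `participant_update` message and from received kind 451 `"accept"` events. The `also_expected` parameter is an assumption for participants who are legitimately present without an `"accept"` (for example the call initiator, if the initiator does not send one).

```python
from typing import Iterable


def find_phantom_participants(
    sfu_leaf_indices: Iterable[int],
    accepted_leaf_indices: Iterable[int],
    also_expected: Iterable[int] = (),
) -> list[int]:
    """Return leaf indices the SFU reports that have no matching signaling.

    sfu_leaf_indices: leaf indices listed in the SFU's participant_update.
    accepted_leaf_indices: senders of kind 451 "accept" events for this call.
    also_expected: other leaves legitimately present (hypothetical, see text).
    """
    expected = set(accepted_leaf_indices) | set(also_expected)
    # Sorted so the result is stable for display in an alert.
    return sorted(set(sfu_leaf_indices) - expected)
```

A non-empty result corresponds to the case in which clients SHOULD alert the user.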
**Traffic analysis**: VBR audio leaks speech patterns. Video bitrate reveals resolution and motion. Keyframes (5–10× larger than P-frames) are detectable. Enhanced privacy mode (CBR, padding, forced TURN) mitigates this at a bandwidth cost. Note: even with CBR audio, packet timing patterns may still leak information about call structure (number of active speakers, conversation turn-taking patterns). A formal traffic analysis of enhanced mode against state-of-the-art classifiers is recommended before deploying in high-risk environments.
**Forced rapid epoch transitions**: If an attacker can trigger rapid MLS epoch changes (e.g., by repeatedly joining and leaving the group), the CTR reset-to-zero on each epoch change creates predictable nonce patterns across different epoch keys. While this is not a nonce-reuse vulnerability (keys change with each epoch), the predictability could be relevant for multi-key attacks on AES-GCM in adversarial settings. Mitigation: group admins SHOULD rate-limit join/leave operations during active calls (e.g., reject commits that would cause more than 4 epoch transitions per minute during an active call). This is a client-level policy, not a protocol-level enforcement.
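The example policy above (no more than 4 epoch transitions per minute during an active call) can be sketched as a sliding-window counter. `EpochTransitionLimiter` and `allow_commit` are hypothetical names; timestamps are plain seconds supplied by the caller, and what a client does with a commit it declines at the MLS layer is deliberately left out, consistent with this being a client-level policy.

```python
from collections import deque


class EpochTransitionLimiter:
    """Hypothetical client policy: during an active call, allow at most
    `max_transitions` epoch transitions in any `window_seconds` window."""

    def __init__(self, max_transitions: int = 4, window_seconds: float = 60.0):
        self.max_transitions = max_transitions
        self.window_seconds = window_seconds
        self._times: deque[float] = deque()  # times of allowed in-call transitions

    def allow_commit(self, now: float, call_active: bool) -> bool:
        if not call_active:
            # Outside a call the policy does not apply and nothing is recorded.
            return True
        # Drop transitions that have fallen out of the sliding window.
        while self._times and now - self._times[0] >= self.window_seconds:
            self._times.popleft()
        if len(self._times) >= self.max_transitions:
            return False  # would exceed the per-window budget
        self._times.append(now)
        return True
```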
🧩 Analysis chain

Scripts executed against repository marmot-protocol/marmot:

- `fd -t f 'threat_model\.md' -E node_modules -E .git` (output length: 81)
- `cat threat_model.md` (output length: 50378)
- `sed -n '632,656p' 06.md` (output length: 3513)
- `rg -n '\[MIP-' 06.md | head -20` (output length: 240)
- `grep -r 'threat_model\.md' *.md | head -10` (output length: 712)
Add a threat_model.md reference to the Security Considerations section, per the documentation guidelines.

The coding guidelines require that Security Considerations sections "Always reference threat_model.md for security context." Add text like the following at the beginning of the Security Considerations section (after line 632):

> This section covers residual risks specific to MIP-06. For broader Marmot security context and threat model, see [threat_model.md](threat_model.md). The Privacy Properties table in the Architecture section is the authoritative reference for information flow.

This gives readers the broader threat context and keeps the section consistent with other Marmot documentation requirements.
@RogueFantasy thanks for this! We're in the final week of a sprint to get big new versions of MDK and White Noise out the door, so I probably won't have time to really look into this before early next week. That said, one big question jumped to mind while I was skimming the doc: why WebRTC and not something newer like Media over QUIC (MoQ)? I think I know the answer, but I'm curious to hear your PoV.
dannym-arx left a comment
Amazing! Thank you for this contribution 🦫
`ciphertext = AES-128-GCM.Encrypt(sframe_key, nonce, sframe_header, plaintext_frame)`

Where `KID_8BE` is the SFrame Key ID as 8 bytes big-endian and `CS` is the ciphersuite identifier `0x0001` (AES_128_GCM_SHA256_128) as 2 bytes big-endian.
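The key schedule that consumes `KID_8BE` and `CS` is not shown in the quoted lines. A minimal sketch, assuming MIP-06 follows the RFC 9605 SFrame key schedule (HKDF-SHA256 with the labels "SFrame 1.0 Secret key " and "SFrame 1.0 Secret salt ") and that `base_key` is the MLS-derived base key; all function names are hypothetical. It uses only the Python standard library, so it stops at the per-frame nonce and leaves AES-128-GCM itself to a crypto library. Note that the RFC 9605 cipher suite registry lists AES_128_GCM_SHA256_128 as `0x0004` and assigns `0x0001` to AES_128_CTR_HMAC_SHA256_80, so the sketch takes the suite value as a parameter instead of hard-coding it.

```python
import hashlib
import hmac
import struct


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # RFC 5869 HKDF-Extract with SHA-256; an empty salt means 32 zero bytes.
    return hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    # RFC 5869 HKDF-Expand with SHA-256: T(i) = HMAC(PRK, T(i-1) || info || i).
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]


def derive_sframe_key_salt(base_key: bytes, kid: int, cipher_suite: int,
                           nk: int = 16, nn: int = 12) -> tuple[bytes, bytes]:
    """RFC 9605-style key/salt derivation (Nk=16, Nn=12 for AES-128-GCM)."""
    suffix = struct.pack(">QH", kid, cipher_suite)  # KID_8BE || CS
    secret = hkdf_extract(b"", base_key)
    key = hkdf_expand(secret, b"SFrame 1.0 Secret key " + suffix, nk)
    salt = hkdf_expand(secret, b"SFrame 1.0 Secret salt " + suffix, nn)
    return key, salt


def sframe_nonce(salt: bytes, ctr: int) -> bytes:
    # Nonce = salt XOR CTR, with CTR encoded big-endian and left-padded to Nn bytes.
    ctr_bytes = ctr.to_bytes(len(salt), "big")
    return bytes(a ^ b for a, b in zip(salt, ctr_bytes))
```

Per frame, the sender would compute `sframe_nonce(salt, ctr)` with its current CTR and pass that nonce, with the SFrame header as associated data, to AES-128-GCM, matching the `ciphertext = ...` line above.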
Do we need a different ciphersuite identifier? Won't the one in the group data suffice?
It's not possible to pass the MLS ciphersuite identifier into SFrame's key schedule, because the formats are incompatible: they live in different registries and serve different purposes. I suppose the spec could be more explicit about requiring the SFrame ciphersuite to use the crypto primitives the MLS group is already using?
Well, I didn't mean passing it directly.
Yeah, I think being explicit about requiring the same crypto primitives makes sense to me.
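One way to make the "same crypto primitives" requirement explicit is a fixed table from the group's MLS cipher suite (RFC 9420) to the SFrame cipher suite (RFC 9605) that uses the same AEAD and hash. The table and `sframe_suite_for_group` below are a hypothetical sketch, not part of the MIP as quoted, and the suite values should be checked against both IANA registries. Refusing unmatched suites (the ChaCha20-Poly1305 and SHA-384 MLS suites have no SFrame counterpart) is one possible policy, not the only one.

```python
# Hypothetical mapping: MLS cipher suite (RFC 9420) -> SFrame cipher suite
# (RFC 9605) with the same AEAD and hash. None means no SFrame suite matches.
MLS_TO_SFRAME_SUITE = {
    0x0001: 0x0004,  # MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519 -> AES_128_GCM_SHA256_128
    0x0002: 0x0004,  # MLS_128_DHKEMP256_AES128GCM_SHA256_P256 -> AES_128_GCM_SHA256_128
    0x0003: None,    # MLS_128_DHKEMX25519_CHACHA20POLY1305_SHA256_Ed25519: no ChaCha20 SFrame suite
    0x0004: 0x0005,  # MLS_256_DHKEMX448_AES256GCM_SHA512_Ed448 -> AES_256_GCM_SHA512_128
    0x0005: 0x0005,  # MLS_256_DHKEMP521_AES256GCM_SHA512_P521 -> AES_256_GCM_SHA512_128
    0x0006: None,    # MLS_256_DHKEMX448_CHACHA20POLY1305_SHA512_Ed448: no ChaCha20 SFrame suite
    0x0007: None,    # MLS_256_DHKEMP384_AES256GCM_SHA384_P384: no SHA-384 SFrame suite
}


def sframe_suite_for_group(mls_cipher_suite: int) -> int:
    """Return the SFrame suite matching the group's MLS primitives, or raise."""
    suite = MLS_TO_SFRAME_SUITE.get(mls_cipher_suite)
    if suite is None:
        raise ValueError(
            f"no SFrame cipher suite matches MLS cipher suite 0x{mls_cipher_suite:04x}"
        )
    return suite
```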
### Enhanced Mode
Specified via `privacy` tag `"enhanced"` in kind 450:
Shouldn't this be the default?
I'm inclined to agree. I'm not sure how much of an effect this would have on call quality, though.
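Reading the privacy mode from a kind 450 event can be sketched as follows, assuming the signaling payload is a Nostr-style event whose `tags` field is a list of string arrays such as `[["privacy", "enhanced"]]`. `is_enhanced_privacy` and its `default_enhanced` flag are hypothetical; the flag simply turns the question raised in this thread (whether enhanced mode should be the default) into a single switch.

```python
from typing import Sequence


def is_enhanced_privacy(tags: Sequence[Sequence[str]], default_enhanced: bool = False) -> bool:
    """Return True if a kind 450 event's tags request enhanced privacy mode.

    The first "privacy" tag wins; if none is present, fall back to
    `default_enhanced` (a hypothetical client or protocol default).
    """
    for tag in tags:
        if len(tag) >= 2 and tag[0] == "privacy":
            return tag[1] == "enhanced"
    return default_enhanced
```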
My 2 sats: I have E2EE audio calls working in https://github.com/sledtools/pika using MoQ + MDK. IMO this repo should eventually include standards for both MoQ and WebRTC. But at least for MoQ, it will be best to have a period of experimentation in apps before finalizing a specification. The benefit of MoQ over WebRTC is that it's far simpler and more flexible. The downside is that it's new, changing rapidly, has less tooling, and requires more work from app devs because of the flexibility and tooling situation. IMO there will be good reasons to use both, and app developers will choose one or the other.
Sorry for my delayed response. This was my first stab at going deep in a project. I read through the protocol's recommended MLS RFCs, the Marmot MIPs, core parts of the codebase, and general WebRTC coverage. Hopefully we can close this gap in the protocol suite so I can drop Signal asap. I went with WebRTC because it's battle-tested. Before Justin mentioned it, I had never considered MoQ. From what I gather, however, the total complexity budget is comparable: MoQ appears to trade SDP quirks for the burden of assembling a media pipeline from scratch. And wasn't MoQ designed for one-to-many scenarios like live streaming? If it does become the better solution, my agent makes a strong case that migrating the encryption and signaling layers from WebRTC to MoQ would be easy. FYI, test vectors for the MLS-derived Media Encryption and SFrame Encryption sections are included. My agent generated two independent reference implementations, one in Rust and one in TypeScript. Both produced byte-identical ciphertext, and the TypeScript implementation decrypted the Rust-produced ciphertext. I had my agent walk me through it in plain English and it seemed to make sense. Of course, this still leaves wiring the spec-proven crypto into existing infra and writing the call-specific coordination logic around it. Looking forward to working on that once this passes muster! Thanks