Add MIP-06: Group Voice and Video Calling #34

Open
RogueFantasy wants to merge 1 commit into marmot-protocol:master from RogueFantasy:RogueFantasy-06.md

Conversation

@RogueFantasy RogueFantasy commented Feb 15, 2026

Specifies privacy-preserving real-time voice and video calling within Marmot groups. Media is E2EE using SFrame with keys derived from MLS epoch exporter secret. Call signaling travels as MLS app messages. Tiered media topology scales from p2p mesh to cascaded SFU federation. New event kinds 450-454 (call signaling), 10052 (SFU advertisement). Depends on MIPs 00, 01, 03, 05. Includes test vectors and wire format constants.

Summary by CodeRabbit

Release Notes

  • Documentation
    • Added comprehensive design specification for privacy-preserving voice and video calling in Marmot groups, including end-to-end encryption with per-sender keys, flexible media topologies (Mesh, Single SFU, Cascaded SFU), real-time signaling, offline notifications, SFU discovery and authentication, network recovery, and security threat analysis.

coderabbitai Bot commented Feb 15, 2026

Walkthrough

Adds a comprehensive design document specifying a privacy-preserving real-time voice and video calling system for Marmot groups, covering MLS-derived key derivation, SFrame end-to-end encryption, call signaling via MLS Application Messages, media topology negotiation, security threat models, and implementation guidelines.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Call System Design Specification: `06.md` | New 961-line design document detailing MLS-derived key architecture (`call_base_key`, `sframe_base_key`), SFrame encryption per RFC 9605, call signaling as MLS Application Messages (kinds 450–454), media topology tiers (Mesh, Single SFU, Cascaded SFU), privacy properties, threat models, operational considerations, SDP sanitization, ICE behavior, and implementation checklists for clients and SFUs. |

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add MIP-06: Group Voice and Video Calling' accurately and concisely summarizes the main change: adding a comprehensive design document (MIP-06) for privacy-preserving group voice and video calling to the Marmot protocol. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Merge Conflict Detection | ✅ Passed | No merge conflicts detected when merging into master. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@06.md`:
- Around line 632-656: Insert a reference to the global threat model at the
start of the "Security Considerations" section: immediately after the "##
Security Considerations" header add a sentence like "This section covers
residual risks specific to MIP-06. For broader Marmot security context and
threat model, see [threat_model.md](threat_model.md)." so the section references
threat_model.md before the existing line "The Privacy Properties table in the
Architecture section is the authoritative reference for information flow."
🧹 Nitpick comments (2)
06.md (2)

35-46: Consider adding security symbols to Privacy Properties section.

The coding guidelines specify: "Include security notes with ✅ (protections) and ⚠️ (limitations) symbols in documentation." The Privacy Properties table effectively documents information flow, but adding explicit symbols would improve scannability:

  • ✅ for positive security properties (e.g., "Cannot Learn" column entries)
  • ⚠️ for limitations/risks (e.g., metadata leakage, timing information)

As per coding guidelines, documentation should "Include security notes with ✅ (protections) and ⚠️ (limitations) symbols in documentation."


850-943: Consider adding language hints to test vector code blocks.

Markdownlint warns that fenced code blocks should have a language specified (MD040). While test vectors and wire format examples are somewhat language-agnostic, adding a hint improves syntax highlighting and readability:

````diff
-```
+```text
 call_id (32 bytes):
   0102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20
````

Apply the text language hint to blocks at lines 850, 863, 874, 883, 905, 919, and 937. This is a low-priority style improvement but helps with Markdown rendering in various viewers.

Comment thread 06.md
Comment on lines +632 to +656
## Security Considerations

This section covers residual risks, attack trees, and trust assumptions. The Privacy Properties table in the Architecture section is the authoritative reference for information flow.

### Trust Assumptions

1. **MLS group members**: All members can derive SFrame keys. The call's encryption boundary is the MLS group (see Design Rationale under Architecture)
2. **SFU operator** (if used): routes packets honestly. Compromise does not break content confidentiality (SFrame E2EE), but enables metadata surveillance per Privacy Properties table
3. **TURN operator** (if used): relays packets. Sees IP addresses per Privacy Properties table, not content
4. **Apple/Google** (for push): per MIP-05 properties

### Attack Trees

**Compromised SFU**: Cannot decrypt media (no MLS keys). Can: selectively drop packets (detectable by client quality metrics and kind 454 `"quality_report"` events), replay frames (rejected — SFrame CTR must be monotonic; replayed CTR values are discarded), observe metadata per Privacy Properties, refuse service. Cannot: inject fabricated media (GCM tag verification fails), access past/future media, determine Nostr identities. Detection: clients can cross-reference expected participant count (from kind 451 signaling) with the SFU's `participant_update` messages to detect phantom participants.
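
The replay rejection described for a compromised SFU can be sketched as a per-sender counter check. This is a minimal illustration in Python, not the MIP-06 reference logic: `ReplayGuard` and its method names are hypothetical, and it assumes a strictly increasing CTR per Key ID. A strict rule like this would also drop legitimately reordered frames, so a real receiver might keep a small acceptance window instead.

```python
from typing import Dict


class ReplayGuard:
    """Tracks the highest SFrame CTR accepted for each sender Key ID (KID)."""

    def __init__(self) -> None:
        # Hypothetical state: KID -> highest CTR accepted so far.
        self._highest: Dict[int, int] = {}

    def accept(self, kid: int, ctr: int) -> bool:
        """Return True if the frame is fresh, False if it looks replayed."""
        last = self._highest.get(kid)
        if last is not None and ctr <= last:
            # CTR did not advance: treat as a replay and discard the frame.
            return False
        self._highest[kid] = ctr
        return True

    def reset(self, kid: int) -> None:
        """Forget state for a KID, e.g. after an epoch change installs new keys."""
        self._highest.pop(kid, None)
```

The guard should only be updated after the AEAD tag has verified; otherwise a forged frame with a large CTR could block genuine traffic.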

**Compromised TURN**: Cannot decrypt content (SFrame inside SRTP). In enhanced mode, concentrates IP exposure at TURN operator but hides participants from each other and from SFU.

**Compromised relay**: Cannot read signaling (MLS-encrypted). Can delay/drop events (mitigated by multiple relays).

**Malicious group member not on call**: Possesses SFrame base key. Would need to independently connect to SFU with derived auth token and silently receive media — equivalent to joining the call. Detectable: the SFU's `participant_update` message will include the attacker's leaf_index, which other participants can compare against the set of participants who sent kind 451 `"accept"`. If a leaf_index appears in the SFU's participant list but did not send an `"accept"` event, clients SHOULD alert the user. Mitigated by creating a dedicated MLS subgroup if needed.
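
The participant cross-check above amounts to a set difference. A minimal sketch, assuming the client has already extracted leaf indices from the SFU's `participant_update` message and from the kind 451 `"accept"` events; the parsing is not shown, and the function name is hypothetical. The call initiator would presumably also need to be in the expected set, since the initiator does not send an `"accept"`.

```python
from typing import Iterable, List


def find_phantom_participants(
    sfu_leaf_indices: Iterable[int],
    expected_leaf_indices: Iterable[int],
) -> List[int]:
    """Return leaf indices the SFU reports that signaling never announced.

    sfu_leaf_indices: leaf indices listed in the SFU's participant_update.
    expected_leaf_indices: leaf indices that sent a kind 451 "accept"
        (plus, by assumption, the call initiator).
    """
    return sorted(set(sfu_leaf_indices) - set(expected_leaf_indices))


# Example: leaf 5 is connected to the SFU but never accepted the call.
phantoms = find_phantom_participants([0, 2, 5], [0, 2])
if phantoms:
    print(f"Alert user: unexpected participants at leaf indices {phantoms}")
```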

**Traffic analysis**: VBR audio leaks speech patterns. Video bitrate reveals resolution/motion. Keyframes (5–10× larger than P-frames) are detectable. Enhanced privacy mode (CBR, padding, forced TURN) mitigates at bandwidth cost. Note: even with CBR audio, packet timing patterns may still leak information about call structure (number of active speakers, conversation turn-taking patterns). A formal traffic analysis of enhanced mode against state-of-the-art classifiers is recommended before deploying in high-risk environments.

**Forced rapid epoch transitions**: If an attacker can trigger rapid MLS epoch changes (e.g., by repeatedly joining and leaving the group), the CTR reset-to-zero on each epoch change creates predictable nonce patterns across different epoch keys. While this is not a nonce-reuse vulnerability (keys change with each epoch), the predictability could be relevant for multi-key attacks on AES-GCM in adversarial settings. Mitigation: group admins SHOULD rate-limit join/leave operations during active calls (e.g., reject commits that would cause more than 4 epoch transitions per minute during an active call). This is a client-level policy, not a protocol-level enforcement.
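
A client-side policy of this kind could be implemented as a sliding-window counter. The sketch below assumes the example threshold from the text (4 transitions per 60 seconds). `EpochRateLimiter` is a hypothetical name, and timestamps are passed in explicitly so the logic stays easy to test.

```python
from collections import deque


class EpochRateLimiter:
    """Sliding-window limit on epoch transitions during an active call."""

    def __init__(self, max_transitions: int = 4, window_seconds: float = 60.0) -> None:
        self.max_transitions = max_transitions
        self.window_seconds = window_seconds
        self._times = deque()  # timestamps of transitions that were allowed

    def allow(self, now: float) -> bool:
        """Return True if a commit arriving at time `now` may be accepted."""
        # Drop transitions that have aged out of the window.
        while self._times and now - self._times[0] >= self.window_seconds:
            self._times.popleft()
        if len(self._times) >= self.max_transitions:
            # Accepting this commit would exceed the per-window limit.
            return False
        self._times.append(now)
        return True
```

A caller would pass a monotonic clock reading, for example `limiter.allow(time.monotonic())`, each time a commit arrives while a call is active.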

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

fd -t f 'threat_model\.md' -E node_modules -E .git

Repository: marmot-protocol/marmot

Length of output: 81


🏁 Script executed:

cat threat_model.md

Repository: marmot-protocol/marmot

Length of output: 50378


🏁 Script executed:

sed -n '632,656p' 06.md

Repository: marmot-protocol/marmot

Length of output: 3513


🏁 Script executed:

rg -n '\[MIP-' 06.md | head -20

Repository: marmot-protocol/marmot

Length of output: 240


🏁 Script executed:

grep -r 'threat_model\.md' *.md | head -10

Repository: marmot-protocol/marmot

Length of output: 712


Add threat_model.md reference to Security Considerations section per documentation guidelines.

The coding guidelines require that Security Considerations sections "Always reference threat_model.md for security context." Add the following at the beginning of the Security Considerations section (after line 632):

For example:

This section covers residual risks specific to MIP-06. For broader Marmot security context and threat model, see [threat_model.md](threat_model.md). The Privacy Properties table in the Architecture section is the authoritative reference for information flow.

This provides readers with the broader threat context and ensures consistency with other Marmot documentation requirements.

@erskingardner
Member

@RogueFantasy thanks for this! We're in the final week of a sprint to get big new versions of MDK and White Noise out the door, so I probably won't have time to really look into this before early next week. That said, one big question jumped to mind while I was skimming the doc: why WebRTC and not something newer like Media over QUIC (MoQ)? I think I know the answer, but I'm curious about your point of view.

Contributor

@dannym-arx dannym-arx left a comment


Amazing! Thank you for this contribution 🦫

Comment thread 06.md
```
ciphertext = AES-128-GCM.Encrypt(sframe_key, nonce, sframe_header, plaintext_frame)
```

Where `KID_8BE` is the SFrame Key ID as 8 bytes big-endian and `CS` is the ciphersuite identifier `0x0001` (AES_128_GCM_SHA256_128) as 2 bytes big-endian.
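
To make the byte encodings concrete, here is a minimal Python sketch of how `KID_8BE` and `CS` would feed an RFC 9605-style key derivation. It assumes the MIP uses the RFC 9605 label strings and HKDF-SHA256 unchanged, which the quoted excerpt does not confirm; the function names are hypothetical. The cipher suite identifier is left as a parameter, and the value quoted above should be checked against the RFC 9605 cipher suite registry before relying on it.

```python
import hashlib
import hmac
from typing import Tuple


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869) with SHA-256; an empty salt means HashLen zeros."""
    if not salt:
        salt = b"\x00" * hashlib.sha256().digest_size
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869) with SHA-256."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def label_suffix(kid: int, cipher_suite: int) -> bytes:
    """KID as 8 bytes big-endian followed by the cipher suite as 2 bytes big-endian."""
    return kid.to_bytes(8, "big") + cipher_suite.to_bytes(2, "big")


def derive_sframe_key_and_salt(
    base_key: bytes, kid: int, cipher_suite: int
) -> Tuple[bytes, bytes]:
    """Derive a 16-byte AES-128 key and 12-byte salt (labels assumed from RFC 9605)."""
    suffix = label_suffix(kid, cipher_suite)
    secret = hkdf_extract(b"", base_key)
    key = hkdf_expand(secret, b"SFrame 1.0 Secret key " + suffix, 16)
    salt = hkdf_expand(secret, b"SFrame 1.0 Secret salt " + suffix, 12)
    return key, salt
```

Because the cipher suite is part of the label, two receivers that disagree on the identifier derive different keys and fail to decrypt, which is one reason to pin the value explicitly in the spec.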
Contributor

do we need a different ciphersuite identifier? won't the one in the group data suffice?

Author

It's not possible to pass the MLS ciphersuite identifier into SFrame's key schedule as the formats are incompatible. They live in different registries and serve different purposes. I suppose it could be more explicit in requiring the SFrame ciphersuite to use the crypto primitives the MLS group is already using?

Contributor

well I didn't mean to directly pass it.

yeah I think that being explicit in requiring the same crypto primitives makes sense to me

Comment thread 06.md

### Enhanced Mode

Specified via `privacy` tag `"enhanced"` in kind 450:
Contributor

shouldn't this be a default?

Author

I'm inclined to agree. Not sure how much of an effect this would have on call quality though

justinmoon commented Feb 16, 2026

My 2 sats:

I have e2ee audio calls working in https://github.com/sledtools/pika using MOQ + MDK. IMO this repo should eventually include standards for both MOQ and WebRTC. But at least for MOQ it will be best to have a period of experimentation in apps before finalizing a specification.

The benefit of MOQ over WebRTC is that it's far simpler and more flexible. The downside is that it's new, changing rapidly, has less tooling, and requires more work from app devs because of the flexibility and tooling situation. IMO there will be good reasons to use both of them and app developers will choose one or the other.

@RogueFantasy
Author

Sorry for my delayed response. This was my first stab at going deep in a project. I read through the protocol's recommended MLS RFCs, Marmot MIPs, core parts of the codebase, and general WebRTC coverage. Hopefully we can shore up the gap in the protocol suite so I can drop Signal asap.

I went with WebRTC because it's battle-tested. Before Justin mentioned it, I had never considered MoQ. From what I gather, however, the total complexity budget is comparable. MoQ appears to trade SDP quirks for the burden of assembling a media pipeline from scratch. And wasn't MoQ designed for one-to-many scenarios like live streaming? If it does indeed become a better solution, my agent makes a strong case that it would be easy to migrate the encryption and signaling layer from WebRTC to MoQ.

FYI, the test vectors for the MLS-derived Media Encryption and SFrame Encryption sections are included. My agent generated two independent reference implementations: one in Rust and one in TypeScript. Both implementations produced byte-identical ciphertext, and the TypeScript implementation decrypted the Rust-produced ciphertext. I had my agent walk me through it in plain English and it seemed to make sense.

Of course, this leaves wiring the spec-proven crypto into existing infra and writing the call-specific coordination logic around it. Looking forward to working on that once this passes muster! Thanks
