fix: add provenance header and speaker IDs to Slack transcript imports by Kesshite · Pull Request #815 · MemPalace/mempalace

Kesshite · 2026-04-13T17:37:42Z

Summary

Slack exports are multi-party chats where no speaker is inherently the "user" or "assistant". The parser previously assigned these roles purely by position, allowing a crafted export to place attacker text in the "user" role — making it appear as the memory owner's own words in all future retrieval (data poisoning via stored memory).

This PR adds two mitigations while preserving compatibility with the existing exchange-pair chunking pipeline:

Provenance header at the top of every Slack transcript: [source: slack-export | multi-party chat — speaker roles are positional, not verified]
Speaker ID prefix on every message: [U1] Hello instead of just Hello

Role alternation (user/assistant) is preserved so convo_miner.py's > marker-based chunking continues to work — but now every message carries its original speaker identity and the transcript is clearly marked as multi-party with unverified roles.

Refs: #809 (Finding 6)

What changed

mempalace/normalize.py

_try_slack_json() — provenance header prepended to output; each message prefixed with [speaker_id]

tests/test_normalize.py

test_slack_json_has_provenance_header — verifies header presence and keywords
test_slack_json_preserves_speaker_id — verifies [U1], [U2] in output
test_slack_json_attacker_first_message_attributed — verifies attacker's ID is visible even when placed first

Test plan

pytest tests/test_normalize.py -v — 91/91 passed
pytest tests/ -v --ignore=tests/benchmarks — 689 passed, 2 failed (pre-existing version mismatch)
ruff check — all checks passed
ruff format --check — all files formatted
No new dependencies added
Exchange-pair chunking preserved (user/assistant alternation unchanged)

Slack exports are multi-party chats where no speaker is inherently the "user" or "assistant". The parser previously assigned these roles purely by position, allowing a crafted export to place attacker text in the "user" role — making it appear as the memory owner's words in all future retrieval (data poisoning via stored memory). Changes: - Add provenance header marking Slack transcripts as multi-party with positional (unverified) role assignment - Prefix each message with the original speaker ID ([U1], [U2], etc.) so downstream consumers can distinguish authors - Keep user/assistant role alternation for exchange-pair chunking compatibility with convo_miner.py Tests: - Provenance header presence and content - Speaker ID preservation in output - Attacker-first-message attribution verification Refs: MemPalace#809

…onstant - Move provenance notice from header to footer to prevent it becoming a standalone ChromaDB drawer via paragraph chunking on exports with fewer than 3 exchange pairs (violates verbatim-always principle) - Sanitize speaker user_id/username: strip brackets, newlines, and control characters to prevent chunk-boundary injection via crafted Slack exports - Extract header string to _SLACK_PROVENANCE_FOOTER module constant, consistent with _TOOL_RESULT_* constants pattern; tests import it instead of duplicating the literal Refs: MemPalace#809

bensig

Code reviewed — no issues found. CLAUDE.md compliance verified.

Advisor caught: initial boundary (962776c..develop) skipped PRs that landed on develop after v3.3.0 tag but before the sync-back merge. Adds entries for #871 MEMPAL_VERBOSE, #811 research() local-only default, #866 init .gitignore, #864 MCP stdout redirect, #863 precompact hook, #865 searcher empty results, #831 cold-start palace, #862 init help, #815 Slack provenance, #840 save hook auto-mine. Also drops the awkward caveat on #846 created_at — it's post-v3.3.0.

Advisor caught: initial boundary (962776c..develop) skipped PRs that landed on develop after v3.3.0 tag but before the sync-back merge. Adds entries for MemPalace#871 MEMPAL_VERBOSE, MemPalace#811 research() local-only default, MemPalace#866 init .gitignore, MemPalace#864 MCP stdout redirect, MemPalace#863 precompact hook, MemPalace#865 searcher empty results, MemPalace#831 cold-start palace, MemPalace#862 init help, MemPalace#815 Slack provenance, MemPalace#840 save hook auto-mine. Also drops the awkward caveat on MemPalace#846 created_at — it's post-v3.3.0.

Kesshite requested review from bensig, igorls and milla-jovovich as code owners April 13, 2026 17:37

Kesshite mentioned this pull request Apr 13, 2026

feat: mempalace_explain — entity-aware search with KG enrichment (decision archaeology) #822

Open

igorls added area/mining File and conversation mining bug Something isn't working labels Apr 14, 2026

Kesshite added 2 commits April 14, 2026 08:04

Kesshite force-pushed the fix/security-normalize-roles branch from a60839f to 2704b15 Compare April 14, 2026 11:05

bensig approved these changes Apr 15, 2026

View reviewed changes

bensig merged commit e61dc2a into MemPalace:develop Apr 15, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: add provenance header and speaker IDs to Slack transcript imports#815

fix: add provenance header and speaker IDs to Slack transcript imports#815
bensig merged 2 commits intoMemPalace:developfrom
Kesshite:fix/security-normalize-roles

Kesshite commented Apr 13, 2026

Uh oh!

bensig left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Kesshite commented Apr 13, 2026

Summary

What changed

Test plan

Uh oh!

bensig left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants