fix: add provenance header and speaker IDs to Slack transcript imports #815
Merged
bensig merged 2 commits into MemPalace:develop on Apr 15, 2026
Slack exports are multi-party chats where no speaker is inherently the "user" or "assistant". The parser previously assigned these roles purely by position, allowing a crafted export to place attacker text in the "user" role, making it appear as the memory owner's words in all future retrieval (data poisoning via stored memory).

Changes:
- Add provenance header marking Slack transcripts as multi-party with positional (unverified) role assignment
- Prefix each message with the original speaker ID ([U1], [U2], etc.) so downstream consumers can distinguish authors
- Keep user/assistant role alternation for exchange-pair chunking compatibility with convo_miner.py

Tests:
- Provenance header presence and content
- Speaker ID preservation in output
- Attacker-first-message attribution verification

Refs: MemPalace#809
…onstant

- Move provenance notice from header to footer to prevent it from becoming a standalone ChromaDB drawer via paragraph chunking on exports with fewer than 3 exchange pairs (violates verbatim-always principle)
- Sanitize speaker user_id/username: strip brackets, newlines, and control characters to prevent chunk-boundary injection via crafted Slack exports
- Extract header string to _SLACK_PROVENANCE_FOOTER module constant, consistent with the _TOOL_RESULT_* constants pattern; tests import it instead of duplicating the literal

Refs: MemPalace#809
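The sanitization step described in this commit could look roughly like the sketch below. It is an illustration only, not the code in mempalace/normalize.py: the function name `sanitize_speaker_id`, the `"unknown"` fallback, and the choice to also drop Unicode line and paragraph separators are assumptions made for the example.

```python
import unicodedata

# Unicode categories dropped by this sketch:
#   Cc = control characters (includes \n, \r, \t, \x00)
#   Zl / Zp = line and paragraph separators (an assumption: treated as newlines here)
_DROPPED_CATEGORIES = {"Cc", "Zl", "Zp"}


def sanitize_speaker_id(raw: str, fallback: str = "unknown") -> str:
    """Hypothetical helper: make a Slack user_id/username safe inside "[...]"."""
    kept = []
    for ch in raw:
        # Brackets are removed so a crafted name cannot close the
        # "[speaker]" prefix early or open a fake one.
        if ch in "[]":
            continue
        # Newlines and other control characters are removed so the name
        # cannot start a new line or paragraph in the transcript.
        if unicodedata.category(ch) in _DROPPED_CATEGORIES:
            continue
        kept.append(ch)
    cleaned = "".join(kept).strip()
    # Fall back to a placeholder if nothing printable survives.
    return cleaned or fallback
```

With this sketch, a crafted ID such as `"U1]\n\n> injected [U2"` collapses onto a single line with no brackets, so it can no longer open a new paragraph or imitate another speaker's prefix.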
a60839f to 2704b15
bensig (Collaborator) approved these changes on Apr 15, 2026 and left a comment:
Code reviewed — no issues found. CLAUDE.md compliance verified.
igorls added a commit that referenced this pull request on Apr 16, 2026
Advisor caught: initial boundary (962776c..develop) skipped PRs that landed on develop after v3.3.0 tag but before the sync-back merge. Adds entries for #871 MEMPAL_VERBOSE, #811 research() local-only default, #866 init .gitignore, #864 MCP stdout redirect, #863 precompact hook, #865 searcher empty results, #831 cold-start palace, #862 init help, #815 Slack provenance, #840 save hook auto-mine. Also drops the awkward caveat on #846 created_at — it's post-v3.3.0.
shafdev pushed a commit to shafdev/mempalace that referenced this pull request on Apr 17, 2026
Advisor caught: initial boundary (962776c..develop) skipped PRs that landed on develop after v3.3.0 tag but before the sync-back merge. Adds entries for MemPalace#871 MEMPAL_VERBOSE, MemPalace#811 research() local-only default, MemPalace#866 init .gitignore, MemPalace#864 MCP stdout redirect, MemPalace#863 precompact hook, MemPalace#865 searcher empty results, MemPalace#831 cold-start palace, MemPalace#862 init help, MemPalace#815 Slack provenance, MemPalace#840 save hook auto-mine. Also drops the awkward caveat on MemPalace#846 created_at — it's post-v3.3.0.
Summary
Slack exports are multi-party chats where no speaker is inherently the "user" or "assistant". The parser previously assigned these roles purely by position, allowing a crafted export to place attacker text in the "user" role — making it appear as the memory owner's own words in all future retrieval (data poisoning via stored memory).
This PR adds two mitigations while preserving compatibility with the existing exchange-pair chunking pipeline:
- Provenance header: [source: slack-export | multi-party chat — speaker roles are positional, not verified]
- Speaker ID prefix: [U1] Hello instead of just Hello

Role alternation (user/assistant) is preserved so that convo_miner.py's ">" marker-based chunking continues to work, but now every message carries its original speaker identity and the transcript is clearly marked as multi-party with unverified roles.

Refs: #809 (Finding 6)
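The transcript shape described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual `_try_slack_json()` implementation: the function name `format_slack_transcript`, the `"user"`/`"text"` message keys, the even/odd turn rule, and the use of a leading `"> "` for user turns are all assumptions made for the example.

```python
PROVENANCE = (
    "[source: slack-export | multi-party chat — "
    "speaker roles are positional, not verified]"
)


def format_slack_transcript(messages: list[dict]) -> str:
    """Hypothetical sketch: render Slack messages as alternating turns."""
    # The provenance line goes first, as in this PR description.
    lines = [PROVENANCE, ""]
    turn = 0
    for msg in messages:
        text = str(msg.get("text", "")).strip()
        if not text:
            continue  # empty messages do not consume a turn
        speaker = msg.get("user", "unknown")
        # Every message keeps its original speaker ID, whatever role it gets.
        body = f"[{speaker}] {text}"
        # Assumption: even turns are "user" turns marked with "> ",
        # odd turns are "assistant" turns written as plain lines.
        lines.append(f"> {body}" if turn % 2 == 0 else body)
        lines.append("")
        turn += 1
    return "\n".join(lines).rstrip() + "\n"
```

Even when an attacker's message lands in the first (positional "user") slot, the output line reads `> [U_ATTACKER] ...`, so a downstream consumer can still see who wrote it.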
What changed
mempalace/normalize.py
- _try_slack_json(): provenance header prepended to output; each message prefixed with [speaker_id]

tests/test_normalize.py
- test_slack_json_has_provenance_header: verifies header presence and keywords
- test_slack_json_preserves_speaker_id: verifies [U1], [U2] in output
- test_slack_json_attacker_first_message_attributed: verifies attacker's ID is visible even when placed first

Test plan
- pytest tests/test_normalize.py -v: 91/91 passed
- pytest tests/ -v --ignore=tests/benchmarks: 689 passed, 2 failed (pre-existing version mismatch)
- ruff check: all checks passed
- ruff format --check: all files formatted