feat: outbound image/file attachments from agent → Discord #300

Open
DrVictorChen wants to merge 2 commits into openabdev:main from DrVictorChen:feat/outbound-attachments-clean

@DrVictorChen commented Apr 13, 2026

What problem does this solve?

Agents running through OpenAB can receive images from users (PR #158), but have no native path to send images/files back. The only workaround is agents calling the Discord REST API directly via curl with the bot token — requiring the agent to know the token and channel ID, sending messages outside OpenAB's thread context, and breaking portability across agents.

Closes #298

At a Glance

┌──────────────────────┐
│   CLI Agent Process   │
│  (Claude Code / Codex │
│   / Cursor / Copilot) │
└──────────┬───────────┘
           │ ACP response text:
           │ "Here's the screenshot
           │  ![img](/tmp/screen.png)"
           ▼
┌──────────────────────────────────┐
│            OpenAB                │
│                                  │
│  1. extract_outbound_attachments │
│     ├─ regex: ![...](/path)      │
│     ├─ allowlist: /tmp/, /var/   │
│     ├─ size ≤ 25MB               │
│     └─ strip marker from text    │
│                                  │
│  2. Text reply (marker removed)  │
│     → edit_message(text)         │
│                                  │
│  3. File attachment              │
│     → CreateAttachment::path()   │
│     → channel.send_message(file) │
└──────────────┬───────────────────┘
               ▼
┌──────────────────────┐
│   Discord Channel     │
│                       │
│   📝 cleaned text     │
│   📎 native attachment│
└───────────────────────┘

Prior Art & Industry Research

OpenClaw:

Uses a MEDIA: /path/to/file text directive pattern. The agent writes MEDIA: <path-or-url> inline in its response text. The gateway's splitMediaFromOutput() (src/media/parse.ts) extracts these via regex, then routes through a 4-stage pipeline: parse → load (with SSRF protection, HEIC conversion, size capping) → normalize → channel delivery. Each channel plugin implements sendPhoto/sendDocument/etc.

Key lessons from OpenClaw:

  • Text-directive approach is simple but causes false positives: MEDIA: appearing in tool results or docs triggers unintended file loading (#18780, #16935)
  • A security advisory (GHSA-r8g4-86fx-92mq) was issued because MEDIA: could reference arbitrary local files → fixed with mediaLocalRoots directory allowlist
  • Relative paths break due to working directory changes before async delivery (#8759)
  • Multiple images sent as separate messages, not albums (#14027)
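
To make the false-positive lesson concrete, here is a minimal sketch of a naive text-directive scanner. It is an illustrative stand-in written in Rust for this discussion, not OpenClaw's actual parser; the function name `scan_media_directives` is hypothetical.

```rust
/// Naive directive scan: any line that contains `MEDIA:` yields the token
/// that follows it as a path or URL. Illustrative only, not OpenClaw's code.
fn scan_media_directives(text: &str) -> Vec<String> {
    text.lines()
        .filter_map(|line| {
            let idx = line.find("MEDIA:")?;
            let rest = line[idx + "MEDIA:".len()..].trim();
            // Take the first whitespace-delimited token after the directive.
            rest.split_whitespace().next().map(str::to_string)
        })
        .collect()
}
```

A scanner of this shape cannot tell an intentional directive from a sentence that merely mentions one, so documentation text such as "write MEDIA: /etc/hosts to attach a file" would yield `/etc/hosts` as a file to load.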

Hermes Agent:

Uses a nearly identical MEDIA:/path/to/file tag convention. BasePlatformAdapter.extract_media() (gateway/platforms/base.py) parses output, returns (media_files_list, cleaned_text). Tags are stripped from displayed text. File routing is extension-based: .png/.jpg → send_image_file(), .ogg/.mp3 → send_voice(), .mp4 → send_video(), everything else → send_document(). Each platform adapter overrides the methods it supports, with graceful fallback to text.
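
The extension-based routing described above can be sketched as a small classifier. This is a Rust rendition of the mapping as stated in this paragraph, not Hermes's Python code; the `MediaKind` enum and `classify` function are hypothetical names, and the case-insensitive comparison is an assumption.

```rust
use std::path::Path;

/// Delivery category chosen from the file extension (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum MediaKind {
    Image,
    Voice,
    Video,
    Document,
}

/// Maps a path to a delivery category using only its extension.
fn classify(path: &str) -> MediaKind {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        // Assumption: extensions are compared case-insensitively.
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("png") | Some("jpg") => MediaKind::Image,
        Some("ogg") | Some("mp3") => MediaKind::Voice,
        Some("mp4") => MediaKind::Video,
        // Everything else, including files with no extension, is a document.
        _ => MediaKind::Document,
    }
}
```

The catch-all arm plays the role of the send_document() fallback: an unknown type is still delivered rather than rejected.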

Key lessons from Hermes:

  • Fallback chain — unsupported media types degrade to text (send URL/path), never crash
  • Post-stream extraction — MEDIA: tags stripped from streamed text immediately, but file delivery happens after stream completes
  • Agents lack explicit send_image tools — they embed MEDIA: in text and rely on post-processing (#4701)

Other references — the agent runtimes we tested with:

OpenAB bridges CLI agents to Discord via ACP (Agent Client Protocol). The four agents we tested are:

| Agent runtime | Vendor | How it produces media references |
|---|---|---|
| Claude Code (@agentclientprotocol/claude-agent-acp) | Anthropic | Naturally produces markdown ![alt](path) when referencing files. Can generate images via tools and reference them in output. |
| Codex CLI | OpenAI | Produces markdown image syntax. File I/O via shell commands; references output files in markdown. |
| Cursor (CLI agent mode) | Anysphere | Produces markdown when describing files. Has filesystem access and can create/read images. |
| GitHub Copilot (CLI agent) | Microsoft | Produces markdown output. Can invoke shell commands to create files and reference them. |

All four agents naturally output ![description](path) when referencing local files — this is standard LLM markdown behavior, which is why we chose markdown syntax over a custom MEDIA: directive.

Comparison table:

| Aspect | OpenClaw | Hermes Agent | OpenAB (this PR) |
|---|---|---|---|
| Marker syntax | MEDIA: /path | MEDIA:/path | ![alt](/path) (markdown) |
| Extraction | Regex on text output | Regex on text output | Regex on text output |
| Path security | mediaLocalRoots allowlist (post-CVE) | No explicit allowlist | /tmp/, /var/folders/ allowlist |
| Size limit | 50MB images, 16MB audio/video | Platform-dependent | 25MB (Discord limit) |
| File type routing | Extension → channel-specific method | Extension → adapter method | Single path (CreateAttachment) |
| Multi-file | Separate messages (no batching) | Per-file delivery | Separate messages |
| False positive risk | High (any MEDIA: in text) | Moderate | Low (markdown ![]() is structural) |
| Platforms | 7+ (Telegram, Discord, Slack…) | 17+ | Discord only |
| Tested agents | n/a (built-in agent) | n/a (built-in agent) | Claude Code, Codex, Cursor, Copilot |

Proposed Solution

Outbound Attachment Detection & Upload

  1. Agent response containing ![alt](/path/to/file) is intercepted by extract_outbound_attachments() in discord.rs
  2. Security validation: path must start with /tmp/ or /var/folders/, file must exist, be a regular file, and ≤ 25MB
  3. Valid files uploaded via serenity CreateAttachment::path() + CreateMessage::new().add_file()
  4. Markers stripped from the text reply; leftover blank lines cleaned up
  5. Attachments sent as follow-up messages to the channel
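
The detection and stripping steps above can be sketched without any dependencies. The PR describes a regex; the version below swaps in a hand-rolled scanner so the example stays standard-library only. The names `parse_marker` and `extract_markers` and the `accept` callback (standing in for the security validation of step 2) are hypothetical, and this is not the code in discord.rs.

```rust
/// Parses the text right after "![": returns (path, bytes consumed) for a
/// well-formed `alt](/absolute/path)` tail, or None otherwise.
fn parse_marker(after_bang: &str) -> Option<(&str, usize)> {
    let alt_end = after_bang.find(']')?;
    let alt = &after_bang[..alt_end];
    if alt.contains('[') || alt.contains('\n') {
        return None;
    }
    let after_alt = after_bang[alt_end + 1..].strip_prefix('(')?;
    let path_end = after_alt.find(')')?;
    let path = &after_alt[..path_end];
    if !path.starts_with('/') || path.contains('\n') {
        return None;
    }
    // alt text + "](" + path + ")"
    Some((path, alt_end + 2 + path_end + 1))
}

/// Splits agent output into cleaned text and the accepted attachment paths.
/// Markers that `accept` rejects are left in the text untouched.
fn extract_markers(text: &str, accept: impl Fn(&str) -> bool) -> (String, Vec<String>) {
    let mut kept = String::with_capacity(text.len());
    let mut paths = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find("![") {
        let after_bang = &rest[start + 2..];
        match parse_marker(after_bang) {
            Some((path, consumed)) if accept(path) => {
                kept.push_str(&rest[..start]);
                paths.push(path.to_string());
                rest = &after_bang[consumed..];
            }
            _ => {
                // Not a usable marker: keep "![" literally and keep scanning.
                kept.push_str(&rest[..start + 2]);
                rest = after_bang;
            }
        }
    }
    kept.push_str(rest);

    // Collapse runs of blank lines left behind by removed markers.
    let mut cleaned = String::new();
    let mut blank_run = 0;
    for line in kept.lines() {
        let line = line.trim_end();
        blank_run = if line.is_empty() { blank_run + 1 } else { 0 };
        if blank_run <= 1 {
            cleaned.push_str(line);
            cleaned.push('\n');
        }
    }
    (cleaned.trim().to_string(), paths)
}
```

Passing the validation as a callback keeps extraction and policy separate, so a rejected or missing file stays visible in the reply as plain text instead of silently disappearing.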

macOS Build Fix (Makefile)

cargo build on macOS 26.3+ produces adhoc-signed binaries that hang at _dyld_start due to AMFI enforcement. make build / make install auto-run codesign --force --sign - on Darwin. No-op on Linux.

Why this approach?

We chose markdown image syntax (![alt](/path)) over MEDIA: for three reasons:

  1. Lower false positive risk — Both OpenClaw and Hermes suffer from MEDIA: appearing in tool outputs, documentation, or release notes and triggering unintended file loads. Markdown ![]() syntax is structurally distinct and rarely appears in agent prose outside of intentional use.

  2. Natural for all 4 tested agent runtimes — Claude Code, Codex, Cursor, and Copilot all produce markdown ![description](path) when referencing images. No special prompting needed — the agents already know this syntax.

  3. Minimal code change — OpenAB is Discord-only, so we don't need the multi-platform routing complexity of OpenClaw (4-stage pipeline) or Hermes (per-platform adapter chain). A single regex + CreateAttachment covers the use case in ~70 lines of Rust.

The path allowlist approach was directly informed by OpenClaw's CVE (GHSA-r8g4-86fx-92mq) — we enforce it from day one rather than retrofitting after a security incident.
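
As a rough illustration of the checks listed in the proposed solution (prefix allowlist, existence, regular file, size cap), the validation could look like the sketch below. `OUTBOUND_ALLOWED_PREFIXES` is the constant name that appears in the review discussion; `MAX_OUTBOUND_BYTES`, the `Rejection` enum, and `validate_outbound` are hypothetical, and reading 25MB as 25 MiB is an assumption.

```rust
use std::fs;
use std::path::Path;

const OUTBOUND_ALLOWED_PREFIXES: &[&str] = &["/tmp/", "/var/folders/"];
// Assumption: "25MB" is interpreted as 25 MiB.
const MAX_OUTBOUND_BYTES: u64 = 25 * 1024 * 1024;

/// Why a candidate path was refused (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Rejection {
    NotAllowlisted,
    Missing,
    NotRegularFile,
    TooLarge,
}

/// Returns the file size in bytes if the path passes every check.
fn validate_outbound(path: &str) -> Result<u64, Rejection> {
    // String-prefix allowlist, applied to the path exactly as the agent wrote it.
    if !OUTBOUND_ALLOWED_PREFIXES.iter().any(|p| path.starts_with(*p)) {
        return Err(Rejection::NotAllowlisted);
    }
    // Any metadata error (not found, permission denied) is reported as Missing.
    let meta = fs::metadata(Path::new(path)).map_err(|_| Rejection::Missing)?;
    if !meta.is_file() {
        return Err(Rejection::NotRegularFile);
    }
    if meta.len() > MAX_OUTBOUND_BYTES {
        return Err(Rejection::TooLarge);
    }
    Ok(meta.len())
}
```

One property of a pure string-prefix check is worth noting: `/tmp/../etc/passwd` starts with `/tmp/`, so it is never rejected as `NotAllowlisted`, and `fs::metadata()` follows symlinks.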

Alternatives Considered

| Alternative | Why not chosen |
|---|---|
| MEDIA: directive (OpenClaw/Hermes style) | Higher false positive risk; not native markdown, so agents would need explicit prompting to use it |
| Dedicated XML tag <attachment path="..." /> | Unambiguous but requires agents to learn a custom syntax; LLMs don't naturally produce this |
| ACP content block extension | Future-proof but ACP spec doesn't yet support file/image output blocks; would require upstream spec changes |
| Agent calls Discord API directly (current workaround) | Works but requires bot token + channel ID in agent context; messages land outside OpenAB's session/thread management |
| serenity EditMessage with attachment | serenity's EditMessage doesn't support adding attachments to existing messages; must use CreateMessage for follow-up |

Validation

  • cargo check passes
  • cargo test — 39/39 pass, including 5 new outbound attachment tests:
    • outbound_no_markers_passthrough: text without markers unchanged
    • outbound_extracts_tmp_file: /tmp/ file correctly extracted
    • outbound_blocks_non_allowlisted_path: /etc/passwd blocked
    • outbound_ignores_nonexistent_file: missing file kept as text
    • outbound_handles_multiple_attachments: multiple files in one response
  • Live tested with 4 different agent runtimes on macOS via launchd multi-agent deployment:
    • Claude Code (Anthropic, via @agentclientprotocol/claude-agent-acp)
    • Codex CLI (OpenAI)
    • Cursor (Anysphere, IDE agent mode)
    • GitHub Copilot (Microsoft, CLI agent)
  • Verified in Discord: image appears as native attachment, text marker stripped, non-allowlisted paths blocked

Observation from live testing:

Some agents (notably Codex and Copilot) attempted to bypass the native mechanism by calling the Discord REST API directly via curl with the bot token. This confirms the original problem statement: agents resort to hacky workarounds when no native path exists. Even when an agent used the curl workaround, OpenAB's outbound handler still triggered on the ![](/path) markers in the same response and uploaded the file through the native path. After the agents discovered the native mechanism (by reading discord.rs), they switched to using ![alt](/path) exclusively.

Log output confirming successful upload:

INFO openab::discord: outbound attachment found path=/tmp/test_outbound.png size=588
INFO openab::discord: outbound attachment sent path=/tmp/test_outbound.png
WARN openab::discord: outbound attachment blocked: path not in allowlist path=/path/to/file

🤖 Generated with Claude Code

DrVictorChen and others added 2 commits April 14, 2026 01:06
Add extract_outbound_attachments() to detect ![alt](/path/to/file) markers
in agent responses and upload matching files via CreateAttachment.

Security: path allowlist (/tmp/, /var/folders/), 25MB size cap.
Includes unit tests for extraction, blocking, and edge cases.

Ref: openabdev#298

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
macOS 26.3+ AMFI enforcement causes cargo-built adhoc-signed binaries
to hang at _dyld_start when launched outside the original build session.
`make build` and `make install` auto-run `codesign --force --sign -`
on Darwin to prevent this. No-op on Linux.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@chaodu-agent (Collaborator)

Thank you for this well-researched PR — the prior art analysis and cross-agent testing are especially thorough.

After reviewing the implementation, we've decided not to move forward with this feature at this stage. Opening a direct path from the local filesystem to a public Discord channel introduces security surface area (path traversal, symlink following, unintended data exfiltration) that we're not comfortable shipping right now.

We appreciate the effort and the detailed write-up. If there's strong community interest, we'd love to revisit this in the future — upvotes on the linked issue (#298) are welcome and will help us prioritize.

@chaodu-agent chaodu-agent added the p2 Medium — planned work label Apr 14, 2026
@masami-agent (Contributor) left a comment

Agreeing with chaodu-agent's assessment — the security surface area is the right concern to prioritize here.

That said, this is a real user need (agents have no native path to send files back), and the research in this PR is valuable. Here's a suggested path forward:

Security hardening needed before this can ship

  1. Symlink resolution: std::fs::metadata() follows symlinks, so ln -s /etc/passwd /tmp/innocent.txt bypasses the allowlist. Fix: use std::fs::canonicalize() before checking the prefix:

    let canonical = std::fs::canonicalize(&path).ok()?;
    let allowed = OUTBOUND_ALLOWED_PREFIXES
        .iter()
        .any(|prefix| canonical.starts_with(prefix));
  2. Path traversal: ![img](/tmp/../etc/passwd) could bypass prefix checks. canonicalize() also resolves .. components, so the fix in item 1 covers this too.

  3. Make allowlist configurable: Hardcoded /tmp/ and /var/folders/ work for local dev but not for containerized deployments where agents write to /home/agent/output/ or similar. Add outbound.allowed_dirs to config.toml.

  4. Opt-in, not opt-out: This feature should be disabled by default and require explicit outbound.enabled = true in config. Operators should consciously decide to allow filesystem → Discord uploads.

  5. Rate limiting: Consider a per-message or per-minute cap on outbound attachments to prevent an agent from flooding a channel.
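
Items 1 through 4 can be combined into one resolution step, sketched below under stated assumptions. `OutboundConfig` is a hypothetical struct mirroring the suggested `outbound.enabled` and `outbound.allowed_dirs` keys, and `resolve_allowed` is a hypothetical helper; neither exists in the PR.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical config mirroring `outbound.enabled` / `outbound.allowed_dirs`.
struct OutboundConfig {
    enabled: bool,
    allowed_dirs: Vec<PathBuf>,
}

impl Default for OutboundConfig {
    fn default() -> Self {
        // Disabled unless the operator explicitly opts in.
        Self { enabled: false, allowed_dirs: Vec::new() }
    }
}

/// Returns the canonical path if uploads are enabled and the file resolves
/// to a regular file inside one of the allowed directories.
fn resolve_allowed(cfg: &OutboundConfig, requested: &Path) -> Option<PathBuf> {
    if !cfg.enabled {
        return None;
    }
    // canonicalize() resolves symlinks and `..`; it fails if the file is missing.
    let canonical = fs::canonicalize(requested).ok()?;
    let allowed = cfg.allowed_dirs.iter().any(|dir| {
        // Canonicalize the root as well: on macOS, /tmp is itself a symlink
        // to /private/tmp, so comparing against the raw root would never match.
        fs::canonicalize(dir)
            .map(|root| canonical.starts_with(&root))
            .unwrap_or(false)
    });
    if allowed && canonical.is_file() {
        Some(canonical)
    } else {
        None
    }
}
```

`Path::starts_with` compares whole path components, so an allowed root of `/tmp` does not match `/tmpfoo`. Returning the canonical path lets the caller upload exactly the file that was checked rather than re-resolving the agent's original string.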

Suggested next steps

  • Open a follow-up issue capturing these security requirements
  • The core implementation (regex extraction, CreateAttachment, marker stripping, tests) is solid and can be reused once the security layer is in place
  • The Makefile change (macOS codesign fix) is unrelated and should be a separate PR — it's useful on its own

@DrVictorChen — thank you for the thorough research and cross-agent testing. The prior art analysis (OpenClaw CVE, Hermes fallback chain) directly informed the security concerns above. Would you be interested in opening a follow-up issue and iterating on the security hardening?

@masami-agent (Contributor)

Hi @DrVictorChen,

We've opened a follow-up issue #355 that captures the security requirements for this feature. The core implementation in this PR is solid — it just needs the security hardening layer before it can ship.

When you're ready, you can either:

  1. Update this PR to address the 5 security items listed in #355
  2. Or close this and open a fresh PR against #355

The key changes needed:

  • std::fs::canonicalize() before prefix check (symlink + path traversal)
  • Configurable outbound.allowed_dirs in config.toml
  • Opt-in (outbound.enabled = false by default)
  • Rate limiting on outbound attachments
  • Makefile change should be a separate PR
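
For the rate-limiting item, one possible shape is a sliding-window counter, sketched here as an assumption rather than a prescribed design. `OutboundLimiter`, its fields, and `try_acquire` are hypothetical names; the caller passes the current time in, which keeps the logic easy to test.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Allows at most `max_in_window` attachments per sliding `window`.
struct OutboundLimiter {
    window: Duration,
    max_in_window: usize,
    sent: VecDeque<Instant>,
}

impl OutboundLimiter {
    fn new(max_in_window: usize, window: Duration) -> Self {
        Self { window, max_in_window, sent: VecDeque::new() }
    }

    /// Records a send and returns true if one more attachment is allowed at `now`.
    fn try_acquire(&mut self, now: Instant) -> bool {
        // Drop timestamps that have aged out of the window.
        while let Some(&oldest) = self.sent.front() {
            if now.duration_since(oldest) >= self.window {
                self.sent.pop_front();
            } else {
                break;
            }
        }
        if self.sent.len() < self.max_in_window {
            self.sent.push_back(now);
            true
        } else {
            false
        }
    }
}
```

A limiter like this could be kept per channel or per agent session; which scope is appropriate is a policy decision left open here.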

Looking forward to the next iteration! Let us know if you have any questions.

Labels

feature p2 Medium — planned work

Development

Successfully merging this pull request may close these issues.

feat: outbound image/file attachments from agent → Discord

5 participants