feat: allow image + PDF document uploads through validator#199

Open
americodias wants to merge 1 commit into RichardAtCT:main from americodias:pr/validator-image-pdf-allowlist

@americodias

Summary

SecurityValidator.ALLOWED_EXTENSIONS currently permits only source-code file extensions (.py, .js, .md, etc.), so any photo or PDF sent as a Telegram document (rather than via the native photo handler) is rejected at validation time — before FileHandler can inspect it. This PR adds the common attachment formats so document uploads work end-to-end.

Why

Telegram delivers media in two ways:

  1. Native photo: passes through agentic_photo → image_handler.py → SDK content blocks (multimodal). Already works.
  2. As a document: passes through agentic_document → validate_filename → FileHandler.handle_document_upload. Currently fails at the validator step for binary formats because .pdf, .png, .jpg, etc. aren't in the allowlist.

Path 2 is hit often: users send PDFs, image attachments from email or the web, screenshots saved as files instead of pasted, and scanned documents. Today each of these is rejected with an error such as File type not allowed: .pdf.

This is also a prerequisite for any feature that expects to consume PDFs/images via the document pathway (e.g. PR #193's unified FileHandler branch).

What

Adds image + PDF formats to ALLOWED_EXTENSIONS:

# Image formats (document uploads; native photos go through image_handler.py)
".png", ".jpg", ".jpeg", ".gif", ".webp",
".heic", ".heif", ".bmp", ".tiff", ".tif",
# Document formats
".pdf",

Compatibility

  • Pure addition. Existing allowlist entries unchanged.
  • DANGEROUS_FILE_PATTERNS (.exe, .key, .pem, .dll, .so, .dylib, etc.) remains the deny-list and is checked before the allowlist, so dangerous extensions stay blocked.
  • 10MB upload size cap (max_size in agentic_document) still applies.
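
The precedence described above (deny-list first, then allowlist) can be sketched roughly as follows. This is a standalone illustration, not the project's SecurityValidator: the function name check_extension, the constant DANGEROUS_EXTENSIONS, and the wording of the deny-list message are assumptions, and the real DANGEROUS_FILE_PATTERNS may match filenames differently (for example with regexes rather than a plain suffix lookup).

```python
from pathlib import Path
from typing import Tuple

# Hypothetical stand-ins for the project's constants (abbreviated).
DANGEROUS_EXTENSIONS = {".exe", ".key", ".pem", ".dll", ".so", ".dylib"}
ALLOWED_EXTENSIONS = {
    ".py", ".js", ".md",                        # existing source-code entries (subset)
    ".png", ".jpg", ".jpeg", ".gif", ".webp",   # image formats added by this PR
    ".heic", ".heif", ".bmp", ".tiff", ".tif",
    ".pdf",                                     # document format added by this PR
}


def check_extension(filename: str) -> Tuple[bool, str]:
    """Return (ok, reason); the deny-list is consulted before the allowlist."""
    ext = Path(filename).suffix.lower()
    if ext in DANGEROUS_EXTENSIONS:
        # Assumed message; the real validator's wording may differ.
        return False, f"Dangerous file type: {ext}"
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"File type not allowed: {ext}"
    return True, ""
```

Because the deny-list check runs first, adding an extension to the allowlist cannot unblock a dangerous one even if the two sets were ever to overlap.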

Test plan

  • Send a .pdf document — bot accepts (passes validator), routes to FileHandler
  • Send a .png as a document — bot accepts
  • Send a .exe — bot still rejects (dangerous pattern takes precedence)
  • Send a .key — bot still rejects
  • Send a .heic (iPhone photos sent as documents) — bot accepts
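
The manual checks above could also be captured as a small table-driven test. The sketch below is self-contained only so it can run in isolation: accepts, DENY, and ALLOW are hypothetical stand-ins, and a real test would call the project's validator instead. The filenames are made-up examples.

```python
from pathlib import Path
from typing import List

# Stand-in checker (assumption); both sets are abbreviated.
DENY = {".exe", ".key", ".pem", ".dll", ".so", ".dylib"}
ALLOW = {".py", ".js", ".md", ".png", ".heic", ".pdf"}


def accepts(filename: str) -> bool:
    """True if the extension is not deny-listed and is allowlisted."""
    ext = Path(filename).suffix.lower()
    return ext not in DENY and ext in ALLOW


# (filename, expected outcome) pairs mirroring the test plan.
CASES = [
    ("scan.pdf", True),
    ("screenshot.png", True),
    ("setup.exe", False),
    ("id_rsa.key", False),
    ("IMG_0001.heic", True),
]


def run_cases() -> List[str]:
    """Return the filenames whose outcome differs from the expectation."""
    return [name for name, expected in CASES if accepts(name) != expected]
```

An empty list from run_cases() means every case in the plan behaved as expected.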

Notes

This is the minimal fix to unblock binary document uploads. What FileHandler actually does with a binary file is a separate concern — that's where #193 picks up. This PR just makes the validator stop being the wall.

Upstream's SecurityValidator.ALLOWED_EXTENSIONS only permits source-code
file extensions, so any photo or PDF sent as a Telegram document
(rather than via the native photo handler) is rejected at validation
time — before the bot can even archive or inspect it.

Adds the common attachment formats: .png .jpg .jpeg .gif .webp .heic
.heif .bmp .tiff .tif .pdf. Photos sent as Telegram photos still flow
through the dedicated image_handler.py path, which uses native
multimodal SDK content blocks; this allowlist covers the case where
the user attaches an image as a generic document.

Dangerous patterns (.exe / .key / .pem / etc.) remain blocked via
DANGEROUS_FILE_PATTERNS — they take precedence over the allowlist.