Validate MIME type / extension on material upload #2295

@yontank

Problem

backend/app/api/routes/material.py::create_material accepts any UploadFile and writes it to storage without validating the MIME type or extension. A user can upload arbitrary files (executables, scripts, images, etc.) and have them stored as "course material".

Relevant snippet (backend/app/api/routes/material.py:83-113):

@router.post("/", response_model=FilePublic)
def create_material(
    *,
    ...
    file_type: FileTypes = Form(...),
    file: UploadFile = FileParam(...),
) -> Any:
    if file_type not in MATERIAL_FILE_TYPES:
        raise HTTPException(422, "file_type must be SUMMARY or BOOK")

    _get_course_for_user(session, course_id, current_user)

    status, stored_path, basename = storage.save(file, owner_id=current_user.id)
    ...

There is no check that the uploaded file is actually a PDF / supported document type for SUMMARY / BOOK.

Why it matters

  • Downstream RAG/ingest pipeline will likely choke on unexpected formats.
  • Storage gets polluted with unintended file types.
  • Mild abuse vector (using the bucket as generic file hosting).

Suggested approach

  • Decide the allowed set per file_type (e.g. PDF only for now).
  • Validate using python-magic / filetype (sniff the bytes) rather than trusting the client-supplied Content-Type or extension.
  • Return 415 Unsupported Media Type on rejection.
  • Add tests in backend/tests/api/routes/test_materials.py for both accepted and rejected types.
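
A minimal sketch of the byte-sniffing step, assuming PDF is the only allowed type for both SUMMARY and BOOK. It uses a hand-rolled signature check from the standard library rather than python-magic / filetype, so it runs without extra dependencies; either library could replace the check. The names `ALLOWED_SIGNATURES`, `UnsupportedMediaType` and `ensure_supported` are hypothetical and do not exist in the codebase, and plain strings stand in for the `FileTypes` values.

```python
import io
from typing import BinaryIO

# Assumption: "PDF only for now" applies to both material types.
PDF_MAGIC = b"%PDF-"
ALLOWED_SIGNATURES: dict[str, tuple[bytes, ...]] = {
    "SUMMARY": (PDF_MAGIC,),
    "BOOK": (PDF_MAGIC,),
}


class UnsupportedMediaType(Exception):
    """Hypothetical error; the route would translate it into a 415 response."""


def ensure_supported(stream: BinaryIO, file_type: str, probe_size: int = 8) -> None:
    """Raise UnsupportedMediaType unless the stream starts with an allowed signature.

    Only the leading bytes are inspected; the client-supplied Content-Type
    and filename are deliberately ignored.
    """
    start = stream.tell()
    head = stream.read(probe_size)
    stream.seek(start)  # rewind so the later save still sees the whole file

    signatures = ALLOWED_SIGNATURES.get(file_type, ())
    if not any(head.startswith(sig) for sig in signatures):
        raise UnsupportedMediaType(f"{file_type} uploads must be PDF")


# Usage: a PDF header passes, an ELF binary is rejected.
ensure_supported(io.BytesIO(b"%PDF-1.7\n"), "SUMMARY")
try:
    ensure_supported(io.BytesIO(b"\x7fELF\x02\x01\x01"), "BOOK")
except UnsupportedMediaType as exc:
    print(f"rejected: {exc}")
```

In create_material this would presumably run on the upload's underlying file object (`file.file`) before `storage.save`, with the exception mapped to `HTTPException(415, ...)`. The check is strict: it requires the signature at offset 0, so a PDF with leading junk bytes would be rejected. The same two cases (accepted PDF, rejected binary) are natural starting points for the tests in test_materials.py.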

Not urgent; tracking for later.
