
Handle ImageContent in MCP tool results (pass images to vision-capable LLMs) #262

Open
faisalishfaq2005 wants to merge 1 commit into huggingface:main from faisalishfaq2005:handle-mcp-image-content


@faisalishfaq2005

Problem

When an MCP tool returned an ImageContent block, the agent discarded the
actual image data and forwarded only a text placeholder to the LLM:

# agent/core/tools.py (before)
elif isinstance(item, ImageContent):
    parts.append(f"[Image: {item.mimeType}]")  # data thrown away

As a result, MCP tools returning screenshots, charts, diagrams, or other
visual outputs gave the model no usable visual information, even when a
vision-capable model such as Claude or GPT-4o was in use.

Solution

Renamed convert_mcp_content_to_string to convert_mcp_content_to_llm_content
and changed its return type from str to str | list[dict].

Behavior

  • Text-only results still return a plain str
    (fully backward-compatible).
  • Results containing at least one ImageContent now return a list of
    OpenAI-style content blocks (text + image_url).

LiteLLM already supports this format and automatically translates it into
provider-native formats (Anthropic image source blocks, etc.).
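A minimal sketch of the single-pass collector, for illustration only. It defines small stand-in dataclasses named after the MCP SDK's TextContent and ImageContent (assumed to expose the same text, data and mimeType fields) so it runs without the mcp package. Joining text parts with a newline and rendering unknown items with str() are assumptions, not necessarily what agent/core/tools.py does; embedded resources are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


# Stand-ins for mcp.types.TextContent / ImageContent (assumption: same field names).
@dataclass
class TextContent:
    text: str
    type: str = "text"


@dataclass
class ImageContent:
    data: str  # base64-encoded payload, per the MCP spec
    mimeType: str
    type: str = "image"


def convert_mcp_content_to_llm_content(content: list[Any]) -> str | list[dict]:
    """Collect OpenAI-style content blocks in one pass over an MCP tool result."""
    blocks: list[dict] = []
    has_image = False
    for item in content:
        if isinstance(item, TextContent):
            blocks.append({"type": "text", "text": item.text})
        elif isinstance(item, ImageContent):
            has_image = True
            blocks.append({
                "type": "image_url",
                "image_url": {"url": f"data:{item.mimeType};base64,{item.data}"},
            })
        else:
            # Fallback for unknown content types (hypothetical rendering).
            blocks.append({"type": "text", "text": str(item)})

    if has_image:
        # At least one image: return the block list for vision-capable models.
        return blocks
    # Text-only: keep the old plain-string contract (newline join is an assumption).
    return "\n".join(block["text"] for block in blocks)
```

Because the image check happens after the loop, text and image blocks keep their original order, and text-only results (including an empty result) still come back as a plain str.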

Image block format

{
    "type": "image_url",
    "image_url": {
        "url": f"data:{mimeType};base64,{data}"
    }
}

ImageContent.data is already base64-encoded per the MCP spec, so no
additional encoding step is required.
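To see why no encoding step is needed, here is a hypothetical helper (image_url_block is not a name from this PR) that places the base64 string directly into the data URL, plus a round-trip check using the standard base64 module. The PNG signature bytes stand in for real image data.

```python
import base64


def image_url_block(mime_type: str, data_b64: str) -> dict:
    """Build an OpenAI-style image_url block from already-base64 MCP image data."""
    # No re-encoding: MCP ImageContent.data is already a base64 string.
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{data_b64}"},
    }


raw = b"\x89PNG\r\n\x1a\n"                        # first bytes of a PNG file
data_b64 = base64.b64encode(raw).decode("ascii")  # what an MCP server would send

block = image_url_block("image/png", data_b64)
payload = block["image_url"]["url"].split(",", 1)[1]

# Decoding the payload once recovers the original bytes...
assert base64.b64decode(payload) == raw
# ...whereas encoding it again would produce a different, double-encoded string.
assert base64.b64encode(data_b64.encode("ascii")).decode("ascii") != data_b64
```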

Since LiteLLM's Message(content=...) already accepts both str and
list, no changes were required to the core message-construction flow in
agent_loop.py.

Additional fix

agent_loop.py also prepended a was_edited note to the tool output with an
f-string, which silently converted list-based content into its string
representation (the Python repr of the list), so the content blocks were no
longer passed as structured data.

Updated the prepend logic to branch on the output type before modifying the
content.
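A sketch of the type-branching prepend, under stated assumptions: prepend_note, the "[edited]" note text and the blank-line separator are hypothetical, and inserting the note as a leading text block is one plausible way to handle list content; the PR description does not show the exact code.

```python
from __future__ import annotations


def prepend_note(content: str | list[dict], note: str) -> str | list[dict]:
    """Prepend a note without flattening list-based (multimodal) content."""
    if isinstance(content, str):
        # Plain string output: the original f-string prepend is still fine.
        return f"{note}\n\n{content}"
    # List output: add the note as a leading text block, keep other blocks intact.
    return [{"type": "text", "text": note}, *content]


blocks = [{"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo="}}]

# The bug: an f-string turns the list into its repr, a plain str.
flattened = f"[edited]\n\n{blocks}"
assert isinstance(flattened, str)

# The branching version keeps the blocks as structured data.
fixed = prepend_note(blocks, "[edited]")
assert fixed[0] == {"type": "text", "text": "[edited]"}
assert fixed[1] is blocks[0]
```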

Files Changed

agent/core/tools.py

  • Renamed conversion function
  • Updated return type
  • Reworked conversion logic into a single-pass content block collector
  • Added support for OpenAI-style image content blocks

agent/core/agent_loop.py

  • Fixed was_edited prepend handling for list-based content

tests/unit/test_mcp_content_conversion.py

Added 18 unit tests covering:

  • empty input
  • text-only output (str)
  • single image output (list)
  • mixed text + image ordering
  • multiple images
  • embedded resources
  • unknown content type fallback
  • return type contracts

Testing

uv run --extra dev pytest tests/unit/test_mcp_content_conversion.py -v
# 18 passed

Pre-existing test failures (unrelated)

Two tests fail on Windows before and after this change:

  • test_session_uploader — imports fcntl (Linux/macOS only)
  • test_prioritize_backlog — hardcodes a Unix path separator

Both also fail on an unmodified main branch on Windows, so they are out of scope for this PR.
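The path-separator failure can be reproduced on any OS with pathlib's pure path classes. The tasks/backlog.md path below is hypothetical, since the failing test's actual path is not shown here.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical backlog path; the real test's path is not shown in this PR.
parts = ("tasks", "backlog.md")

posix = str(PurePosixPath(*parts))
windows = str(PureWindowsPath(*parts))

print(posix)    # tasks/backlog.md
print(windows)  # tasks\backlog.md

# A hardcoded "tasks/backlog.md" only matches the POSIX rendering...
assert posix == "tasks/backlog.md"
assert windows != "tasks/backlog.md"
# ...while as_posix() normalizes the separator on either flavor.
assert PureWindowsPath(*parts).as_posix() == "tasks/backlog.md"
```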

