Handle ImageContent in MCP tool results (pass images to vision-capable LLMs) by faisalishfaq2005 · Pull Request #262 · huggingface/ml-intern

faisalishfaq2005 · 2026-05-16T11:25:04Z

Problem

When an MCP tool returned an ImageContent block, the agent discarded the
actual image data and forwarded only a text placeholder to the LLM:

# agent/core/tools.py (before)
elif isinstance(item, ImageContent):
    parts.append(f"[Image: {item.mimeType}]")  # data thrown away

As a result, MCP tools returning screenshots, charts, diagrams, or other
visual outputs provided no usable visual information to the model — even
when using vision-capable models such as Claude or GPT-4o.

Solution

Renamed convert_mcp_content_to_string to convert_mcp_content_to_llm_content
and updated its return type from str to str | list[dict].

Behavior

- Text-only results still return a plain str (fully backward-compatible).
- Results containing at least one ImageContent now return a list of
  OpenAI-style content blocks (text + image_url).

LiteLLM already supports this format and automatically translates it into
provider-native formats (Anthropic image source blocks, etc.).

Image block format

{
    "type": "image_url",
    "image_url": {
        "url": f"data:{mimeType};base64,{data}"
    }
}

ImageContent.data is already base64-encoded per the MCP spec, so no
additional encoding step is required.
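
As a quick check of the point about encoding, the sketch below builds an image block from already-encoded data and decodes the URL payload back to the original bytes. The helper name image_block and the sample payload are illustrative assumptions, not part of the PR.

```python
import base64


def image_block(mime_type: str, data_b64: str) -> dict:
    """Build an OpenAI-style image_url block from already-base64 data."""
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{data_b64}"},
    }


# Illustrative payload: the 8-byte PNG signature, encoded the way an MCP
# server would before placing it in ImageContent.data.
raw = b"\x89PNG\r\n\x1a\n"
data_b64 = base64.b64encode(raw).decode("ascii")

block = image_block("image/png", data_b64)

# Splitting the URL at the first comma recovers the original bytes,
# confirming the data was embedded without being encoded a second time.
header, payload = block["image_url"]["url"].split(",", 1)
assert header == "data:image/png;base64"
assert base64.b64decode(payload) == raw
```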

Since LiteLLM's Message(content=...) already accepts both str and
list, no changes were required to the core message-construction flow in
agent_loop.py.

Additional fix

agent_loop.py also contained a was_edited note-prepend implemented via
an f-string, which implicitly converted list-based content into a string
representation.

Updated the prepend logic to branch on the output type before modifying the
content.

Files Changed

`agent/core/tools.py`

- Renamed conversion function
- Updated return type
- Reworked conversion logic into a single-pass content block collector
- Added support for OpenAI-style image content blocks

`agent/core/agent_loop.py`

Fixed was_edited prepend handling for list-based content

`tests/unit/test_mcp_content_conversion.py`

Added 18 unit tests covering:

- empty input
- text-only output (str)
- single image output (list)
- mixed text + image ordering
- multiple images
- embedded resources
- unknown content type fallback
- return type contracts

Testing

uv run --extra dev pytest tests/unit/test_mcp_content_conversion.py -v
# 18 passed

Pre-existing test failures (unrelated)

Two tests fail on Windows before and after this change:

- test_session_uploader: imports fcntl, a Unix-only module that is unavailable on Windows
- test_prioritize_backlog: hardcodes a Unix path separator

Both fail on an unmodified main branch on Windows. Out of scope for this PR.

Handle ImageContent in convert_mcp_content_to_llm_content

72d7710

faisalishfaq2005 wants to merge 1 commit into huggingface:main from faisalishfaq2005:handle-mcp-image-content
