Handle ImageContent in MCP tool results (pass images to vision-capable LLMs) #262
Open
faisalishfaq2005 wants to merge 1 commit into
Problem
When an MCP tool returned an ImageContent block, the agent discarded the actual image data and forwarded only a text placeholder to the LLM.
As a result, MCP tools returning screenshots, charts, diagrams, or other
visual outputs provided no usable visual information to the model, even
when using vision-capable models such as Claude or GPT-4o.
Solution
Renamed convert_mcp_content_to_string to convert_mcp_content_to_llm_content and updated its return type from str to str | list[dict].
Behavior
- Text-only results still return str (fully backward-compatible).
- Results containing ImageContent now return a list of OpenAI-style content blocks (text + image_url).
LiteLLM already supports this format and automatically translates it into
provider-native formats (Anthropic image source blocks, etc.).
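The two return paths described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the TextContent and ImageContent dataclasses are hypothetical stand-ins for the MCP SDK types, the function name to_llm_content_sketch is made up, and joining text blocks with a newline is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical stand-ins for the MCP SDK content types (assumption:
# only the fields used below matter for this sketch).
@dataclass
class TextContent:
    text: str


@dataclass
class ImageContent:
    data: str      # base64-encoded payload
    mimeType: str  # e.g. "image/png"


def to_llm_content_sketch(blocks: list) -> str | list[dict]:
    """Return a plain str for text-only results, else OpenAI-style blocks."""
    if not any(isinstance(b, ImageContent) for b in blocks):
        # Text-only path: keep the old str return type.
        # Joining with "\n" is an assumption of this sketch.
        return "\n".join(b.text for b in blocks if isinstance(b, TextContent))

    parts: list[dict] = []
    for b in blocks:
        if isinstance(b, TextContent):
            parts.append({"type": "text", "text": b.text})
        elif isinstance(b, ImageContent):
            url = f"data:{b.mimeType};base64,{b.data}"
            parts.append({"type": "image_url", "image_url": {"url": url}})
    return parts


text_only = to_llm_content_sketch([TextContent("done")])
mixed = to_llm_content_sketch(
    [TextContent("screenshot"), ImageContent(data="AAAA", mimeType="image/png")]
)
```

Keeping the str path untouched means callers that only ever see text results do not need to change.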
Image block format
{ "type": "image_url", "image_url": { "url": f"data:{mimeType};base64,{data}" } }
ImageContent.data is already base64-encoded per the MCP spec, so no additional encoding step is required.
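As a small illustration of why no re-encoding is needed, the sketch below builds the data URL from an already-encoded payload and checks that it decodes back to the original bytes. The helper name image_block is hypothetical, and the validation call is an extra sanity check added for this example, not something the PR is stated to do.

```python
import base64


def image_block(mime_type: str, data_b64: str) -> dict:
    """Build an OpenAI-style image_url block from an already-encoded payload."""
    # Sanity check (an assumption of this sketch): raises binascii.Error
    # if the payload is not valid base64.
    base64.b64decode(data_b64, validate=True)
    # The payload is inserted verbatim; encoding it again would corrupt it.
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{data_b64}"},
    }


raw = b"\x89PNG\r\n\x1a\n"  # first bytes of a PNG file, used as sample data
encoded = base64.b64encode(raw).decode("ascii")  # what an MCP server would send
block = image_block("image/png", encoded)

# Recover the payload from the data URL and confirm it round-trips.
payload = block["image_url"]["url"].split(",", 1)[1]
decoded = base64.b64decode(payload)
```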
Since LiteLLM's Message(content=...) already accepts both str and list, no changes were required to the core message-construction flow in agent_loop.py.
Additional fix
agent_loop.py also contained a was_edited note-prepend implemented via
an f-string, which implicitly converted list-based content into a string
representation.
Updated the prepend logic to branch on the output type before modifying the
content.
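The failure mode and the branching fix can be illustrated with a short sketch. The names prepend_note and EDIT_NOTE, the note text, and the blank-line separator are all hypothetical; the actual code in agent_loop.py may look different.

```python
from __future__ import annotations

# Hypothetical note text; the real wording in agent_loop.py is not shown here.
EDIT_NOTE = "[note: tool arguments were edited]"


def prepend_note(note: str, output: str | list[dict]) -> str | list[dict]:
    """Prepend a note without flattening list-based content into a string."""
    if isinstance(output, str):
        # String output: plain concatenation is safe.
        return f"{note}\n\n{output}"
    # List output: add the note as its own text block so image blocks survive.
    return [{"type": "text", "text": note}, *output]


image_output = [
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}}
]

# What an unconditional f-string does: the list becomes its Python repr.
flattened = f"{EDIT_NOTE}\n\n{image_output}"

# Branching on the type keeps the structure intact.
fixed = prepend_note(EDIT_NOTE, image_output)
```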
Files Changed
- agent/core/tools.py
- agent/core/agent_loop.py: was_edited prepend handling for list-based content
- tests/unit/test_mcp_content_conversion.py
Added 18 unit tests covering both return types (str and list).
Testing
uv run --extra dev pytest tests/unit/test_mcp_content_conversion.py -v # 18 passed
Pre-existing test failures (unrelated)
Two tests fail on Windows before and after this change:
- test_session_uploader: imports fcntl (Linux/macOS only)
- test_prioritize_backlog: hardcodes a Unix path separator
Both fail on an unmodified main branch on Windows. Out of scope for this PR.