Add built-in read_image tool for LLM image access with image resizing #296818

Closed
Copilot wants to merge 3 commits into main from copilot/add-read-image-tool


Copilot AI commented Feb 22, 2026

LLMs currently cannot see images on disk — users must attach them manually. This adds a vscode_readImage_internal built-in tool that reads image files and returns their contents as IToolResultDataPart for model consumption. Large images are automatically scaled down using the same OpenAI vision algorithm used by the screenshot tool.

Changes

  • New readImageTool.ts in common/tools/builtinTools/ — tool data, parameter types, and getSupportedImageMimeType() helper. Supports PNG, JPEG, GIF, WEBP, BMP.
  • New readImageTool.ts in browser/tools/ — tool implementation that reads images via IFileService, validates format, resizes using resizeImage() from chatImageUtils.ts (max 2048px, then 768px min dimension), and returns binary data with MIME type.
  • Registration via ReadImageToolContribution in browser/chat.contribution.ts
  • Tests covering getSupportedImageMimeType — MIME type detection for supported/unsupported extensions, case insensitivity
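
The MIME type helper described above could look roughly like the following. This is a minimal sketch, not the PR's actual code: only the name getSupportedImageMimeType and the five formats come from the description. The signature (a path string in, undefined out for unsupported extensions) and the extension table, including treating .jpg and .jpeg alike, are assumptions.

```typescript
// Hypothetical extension-to-MIME table; the PR lists PNG, JPEG, GIF, WEBP and BMP.
const SUPPORTED_IMAGE_MIME_TYPES = new Map<string, string>([
    ['png', 'image/png'],
    ['jpg', 'image/jpeg'],
    ['jpeg', 'image/jpeg'],
    ['gif', 'image/gif'],
    ['webp', 'image/webp'],
    ['bmp', 'image/bmp'],
]);

// Assumed signature: returns the MIME type for a supported image path,
// or undefined when the extension is missing or unsupported.
function getSupportedImageMimeType(path: string): string | undefined {
    // Only look at the last path segment so dots in folder names are ignored.
    const fileName = path.slice(path.lastIndexOf('/') + 1);
    const dot = fileName.lastIndexOf('.');
    if (dot < 0) {
        return undefined;
    }
    // Lower-case the extension so "PNG" and "png" are treated the same.
    const extension = fileName.slice(dot + 1).toLowerCase();
    return SUPPORTED_IMAGE_MIME_TYPES.get(extension);
}
```

Keeping this lookup free of file-system and DOM access is what would let it sit in the common layer and be unit-tested on its own.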

Architecture

The tool is split across two layers:

  • Common layer (common/tools/builtinTools/readImageTool.ts): Tool data, params interface, and MIME type helper (no browser dependencies)
  • Browser layer (browser/tools/readImageTool.ts): Tool implementation with image resizing (requires DOM APIs via resizeImage())

Tool schema

{
  "filePaths": ["file:///path/to/screenshot.png", "file:///path/to/diagram.jpg"]
}

Returns IToolResultDataPart per image (or text error per failure), plus toolResultDetails with successfully read URIs. Images are resized before being returned to avoid wasting tokens on oversized images.
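
The per-image result handling described above can be sketched as follows. The types here are simplified stand-ins, not the real VS Code interfaces: DataPart, TextPart, ReadImageResult, ImageReader and readImages are hypothetical names, and the reader callback is an assumed placeholder for the IFileService read plus resizing.

```typescript
// Simplified stand-ins for the real result types; these shapes are assumptions.
type DataPart = { kind: 'data'; mimeType: string; data: Uint8Array };
type TextPart = { kind: 'text'; value: string };

interface ReadImageResult {
    content: Array<DataPart | TextPart>;
    toolResultDetails: string[]; // URIs that were read successfully
}

// Hypothetical callback standing in for the file read and resize steps.
type ImageReader = (uri: string) => Promise<{ mimeType: string; data: Uint8Array }>;

async function readImages(filePaths: string[], read: ImageReader): Promise<ReadImageResult> {
    const result: ReadImageResult = { content: [], toolResultDetails: [] };
    for (const uri of filePaths) {
        try {
            const { mimeType, data } = await read(uri);
            result.content.push({ kind: 'data', mimeType, data });
            result.toolResultDetails.push(uri);
        } catch (err) {
            // A failing image produces a text error part instead of aborting the call.
            const message = err instanceof Error ? err.message : String(err);
            result.content.push({ kind: 'text', value: `Failed to read ${uri}: ${message}` });
        }
    }
    return result;
}
```

Handling each path independently means one unreadable or unsupported file does not prevent the model from receiving the other images.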

Original prompt

This section details the original issue you should resolve

<issue_title>Add tool "read_image" to allow LLMs to see images on disk</issue_title>
<issue_description>Currently images have to be added manually.</issue_description>

Comments on the Issue (you are @copilot in this section)



Implements a new built-in chat tool 'vscode_readImage_internal' that reads
image files from disk and returns their contents as data parts that LLMs
can process. Supports PNG, JPEG, GIF, WEBP, and BMP formats.

Co-authored-by: hediet <2931520+hediet@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add tool 'read_image' for image access by LLMs" to "Add built-in read_image tool for LLM image access" on Feb 22, 2026

hediet commented Feb 23, 2026

@copilot check how the screenshot tool does it - is there any kind of scaling down if the image is too big?

Moved ReadImageTool implementation to browser layer to use resizeImage()
from chatImageUtils.ts. Large images are now scaled down (max 2048px,
then 768px min dimension) before being sent to the LLM, matching the
same algorithm used by the screenshot tool.

Architecture:
- Tool data/types/helpers remain in common layer
- Tool implementation with resizing in browser/tools/readImageTool.ts
- Registration via ReadImageToolContribution in browser chat.contribution

Co-authored-by: hediet <2931520+hediet@users.noreply.github.com>

Copilot AI commented Feb 23, 2026

> @copilot check how the screenshot tool does it - is there any kind of scaling down if the image is too big?

The screenshot tool uses resizeImage() from chatImageUtils.ts which implements OpenAI's vision algorithm — scales images with max dimension > 2048px down to 2048px, then scales to 768px on the shortest side. I moved the ReadImageTool implementation to the browser layer and added the same resizeImage() call before returning image data. See 714bdb6.
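
The two-step scaling described in this comment can be illustrated with a small dimension calculation. This is a sketch under stated assumptions, not the body of resizeImage(): the helper name computeResizedDimensions and the Size type are hypothetical, it only computes target dimensions (the real function also has to decode and re-encode pixels via DOM APIs), and it assumes images are never scaled up.

```typescript
interface Size {
    width: number;
    height: number;
}

// Limits quoted in the PR discussion.
const MAX_LONG_SIDE = 2048;
const TARGET_SHORT_SIDE = 768;

// Hypothetical pure helper: returns the dimensions an image would be scaled to.
function computeResizedDimensions({ width, height }: Size): Size {
    // Step 1: fit the longest side within 2048px, keeping the aspect ratio.
    const longSide = Math.max(width, height);
    if (longSide > MAX_LONG_SIDE) {
        const scale = MAX_LONG_SIDE / longSide;
        width = Math.round(width * scale);
        height = Math.round(height * scale);
    }
    // Step 2: shrink so the shortest side is 768px.
    // Assumption: images whose shortest side is already <= 768px are left alone.
    const shortSide = Math.min(width, height);
    if (shortSide > TARGET_SHORT_SIDE) {
        const scale = TARGET_SHORT_SIDE / shortSide;
        width = Math.round(width * scale);
        height = Math.round(height * scale);
    }
    return { width, height };
}
```

Under these assumptions, a 4096x3072 image would first become 2048x1536 and then 1024x768, while a 1000x500 image would be returned unchanged.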

Copilot AI changed the title from "Add built-in read_image tool for LLM image access" to "Add built-in read_image tool for LLM image access with image resizing" on Feb 23, 2026
shams-developer commented

Hey team! 👋

I was looking at this PR and noticed something interesting that I'd love to understand better from an architectural perspective.

I see that vscode_readImage_internal is being added as a built-in tool at the platform level (src/vs/workbench/contrib/chat/common/tools/builtinTools/). However, when I look at the existing tool landscape, I notice that similar file operation tools live in the Copilot Chat extension rather than in VS Code core:

Extension-level tools (vscode-copilot-chat repo):

  • copilot_readFile - reads text files
  • copilot_findTextInFiles - grep search
  • copilot_createFile, copilot_replaceString - file modifications
  • Many others

Platform-level tools (VS Code core):

  • askQuestionsTool - user interaction
  • runSubagentTool - orchestration
  • renameTool, usagesTool - language service features
  • Confirmation/permission tools, etc.

The pattern I'm seeing is that platform tools focus on infrastructure/orchestration/language services, while content/file operation tools live in the extension. Is there a strategic shift happening to move file content tools to the platform level? Or does readImage have special requirements (like the image resizing for vision models) that make it more appropriate as a platform tool?

I'm genuinely curious about the design thinking here, as I'm working on similar functionality and want to make sure I understand the right architectural boundaries! 🙂

Thanks for any insights you can share!

