Add built-in read_image tool for LLM image access with image resizing#296818
Implements a new built-in chat tool 'vscode_readImage_internal' that reads image files from disk and returns their contents as data parts that LLMs can process. Supports PNG, JPEG, GIF, WEBP, and BMP formats.

Co-authored-by: hediet <2931520+hediet@users.noreply.github.com>
@copilot check how the screenshot tool does it - is there any kind of scaling down if the image is too big?
Moved ReadImageTool implementation to browser layer to use resizeImage() from chatImageUtils.ts. Large images are now scaled down (max 2048px, then 768px min dimension) before being sent to the LLM, matching the same algorithm used by the screenshot tool.

Architecture:
- Tool data/types/helpers remain in common layer
- Tool implementation with resizing in browser/tools/readImageTool.ts
- Registration via ReadImageToolContribution in browser chat.contribution

Co-authored-by: hediet <2931520+hediet@users.noreply.github.com>
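For readers following along, the scaling rule in the commit above ("max 2048px, then 768px min dimension") can be sketched as a pure dimension calculation. This is only an illustration under assumptions: the function name `computeResizedDimensions` and its parameters are hypothetical, and the actual `resizeImage()` in chatImageUtils.ts may round differently or apply extra checks before drawing to a canvas.

```typescript
interface Size {
	width: number;
	height: number;
}

// Hypothetical helper, not the real resizeImage(): it only computes the
// target size and never touches pixel data.
function computeResizedDimensions(
	width: number,
	height: number,
	maxSide: number = 2048,      // assumed cap on the longest side
	shortSideTarget: number = 768 // assumed cap on the shortest side
): Size {
	let w = width;
	let h = height;

	// Step 1: shrink so the image fits within maxSide x maxSide.
	const longest = Math.max(w, h);
	if (longest > maxSide) {
		const scale = maxSide / longest;
		w *= scale;
		h *= scale;
	}

	// Step 2: shrink further so the shortest side is at most shortSideTarget.
	const shortest = Math.min(w, h);
	if (shortest > shortSideTarget) {
		const scale = shortSideTarget / shortest;
		w *= scale;
		h *= scale;
	}

	// Images already within both limits are returned unchanged (never upscaled).
	return { width: Math.round(w), height: Math.round(h) };
}

const resized = computeResizedDimensions(4096, 2048);
console.log(`${resized.width}x${resized.height}`);
```

Under these assumptions a 4096x2048 image is first halved to 2048x1024 and then scaled by 0.75 to 1536x768, while an 800x600 image passes through untouched.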
The screenshot tool uses |
Hey team! 👋 I was looking at this PR and noticed something interesting that I'd love to understand better from an architectural perspective.

I see that vscode_readImage_internal is being added as a built-in tool at the platform level (

Extension-level tools (
- copilot_readFile - reads text files

Platform-level tools (VS Code core:
- askQuestionsTool - user interaction

The pattern I'm seeing is that platform tools focus on infrastructure/orchestration/language services, while content/file operation tools live in the extension.

Is there a strategic shift happening to move file content tools to the platform level? Or does readImage have special requirements (like the image resizing for vision models) that make it more appropriate as a platform tool?

I'm genuinely curious about the design thinking here, as I'm working on similar functionality and want to make sure I understand the right architectural boundaries! 🙂

Thanks for any insights you can share!
LLMs currently cannot see images on disk; users must attach them manually. This adds a vscode_readImage_internal built-in tool that reads image files and returns their contents as IToolResultDataPart for model consumption. Large images are automatically scaled down using the same OpenAI vision algorithm used by the screenshot tool.

Changes
- readImageTool.ts in common/tools/builtinTools/: tool data, parameter types, and getSupportedImageMimeType() helper. Supports PNG, JPEG, GIF, WEBP, BMP.
- readImageTool.ts in browser/tools/: tool implementation that reads images via IFileService, validates format, resizes using resizeImage() from chatImageUtils.ts (max 2048px, then 768px min dimension), and returns binary data with MIME type.
- ReadImageToolContribution in browser/chat.contribution.ts
- getSupportedImageMimeType: MIME type detection for supported/unsupported extensions, case insensitivity

Architecture
The tool is split across two layers:
- Common layer (common/tools/builtinTools/readImageTool.ts): Tool data, params interface, and MIME type helper (no browser dependencies)
- Browser layer (browser/tools/readImageTool.ts): Tool implementation with image resizing (requires DOM APIs via resizeImage())

Tool schema
{ "filePaths": ["file:///path/to/screenshot.png", "file:///path/to/diagram.jpg"] }

Returns IToolResultDataPart per image (or text error per failure), plus toolResultDetails with successfully read URIs. Images are resized before being returned to avoid wasting tokens on oversized images.
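As a rough illustration of the MIME type helper described above, the sketch below maps a file path or URI string to a MIME type for the five supported formats, ignoring extension case. The name `getSupportedImageMimeTypeSketch`, its string-based signature, and the extension table are assumptions made for this example; the real getSupportedImageMimeType() in the PR may accept a URI object and be organized differently.

```typescript
// Assumed extension table covering the formats named in the description
// (PNG, JPEG, GIF, WEBP, BMP). The exact set of extensions is a guess.
const SUPPORTED_IMAGE_MIME_TYPES: Readonly<Record<string, string>> = {
	'.png': 'image/png',
	'.jpg': 'image/jpeg',
	'.jpeg': 'image/jpeg',
	'.gif': 'image/gif',
	'.webp': 'image/webp',
	'.bmp': 'image/bmp',
};

// Hypothetical stand-in for the PR's helper: returns the MIME type for a
// supported image extension, or undefined for anything else.
function getSupportedImageMimeTypeSketch(pathOrUri: string): string | undefined {
	// Look only at the final path segment so dots in directory names are ignored.
	const lastSeparator = Math.max(pathOrUri.lastIndexOf('/'), pathOrUri.lastIndexOf('\\'));
	const fileName = pathOrUri.slice(lastSeparator + 1);

	const dotIndex = fileName.lastIndexOf('.');
	if (dotIndex <= 0) {
		// No extension, or a dotfile such as ".png" with no base name.
		return undefined;
	}

	// Lowercase the extension so "PNG" and "png" are treated the same.
	const extension = fileName.slice(dotIndex).toLowerCase();
	return SUPPORTED_IMAGE_MIME_TYPES[extension];
}

console.log(getSupportedImageMimeTypeSketch('file:///path/to/screenshot.PNG'));
console.log(getSupportedImageMimeTypeSketch('file:///path/to/notes.txt'));
```

A tool implementation could call such a helper before reading the file and return a text error for paths where it yields undefined, which matches the "text error per failure" behavior described above.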