Add ai-translate extension #27675
Congratulations on your new Raycast extension! 🚀 We're currently experiencing a high volume of incoming requests. As a result, the initial review may take up to 10-15 business days. Once the PR is approved and merged, the extension will be available on our Store.
Greptile Summary
This PR introduces a new AI Translate extension with four commands: screenshot OCR translation, text extraction from screenshots, clipboard copy of OCR text, and direct text translation. All four are backed by configurable BYOK providers (DeepSeek, MiMo, MiniMax, Kimi, Gemini, OpenAI) with Anthropic-compatible and OpenAI-compatible routing.
Confidence Score: 3/5
The extension has a user-visible bug in its core translate command where a slow or absent text selection can silently wipe input the user has already typed.
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
extensions/ai-translate/src/translate.tsx:60-65
**`getSelectedText` result can silently clear user-typed input**
If the user opens the command and immediately starts typing (before the `getSelectedText` promise settles), the catch block calls `setInputText("")` which overwrites whatever they have typed. Even a ~100 ms resolution time — normal when nothing is selected — is enough to observe this: the user types a few characters, the catch fires, and the search bar is cleared. This is a silent destructive state mutation with no feedback.
### Issue 2 of 2
extensions/ai-translate/src/translate.tsx:28-30
**Manually defined command argument interface**
`TranslateArguments` is a hand-written type for the `text` argument. Raycast auto-generates command argument types in `raycast-env.d.ts` at build time, so this manual definition can silently drift out of sync if the argument name or type changes in `package.json`. The same pattern appears in `extract-text-from-screenshot.tsx` with `ExtractArguments` and `ExtractLaunchContext`. Both files should use the generated types instead.
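A minimal sketch of what relying on the generated types could look like. It assumes the command is named `translate` and declares a single optional `text` argument. The `declare namespace` block and `LaunchArguments` are local stand-ins so the snippet compiles on its own; in the extension the namespace would come from `raycast-env.d.ts` and the props type from `LaunchProps` in `@raycast/api`. `initialText` is a hypothetical helper name.

```typescript
// Stand-in for the declaration Raycast generates in raycast-env.d.ts.
// Assumption: command name "translate" with one optional "text" argument.
declare namespace Arguments {
  export type Translate = { text?: string };
}

// Stand-in for the `arguments` part of LaunchProps from @raycast/api.
type LaunchArguments<T> = { arguments: T };

// Hypothetical helper: reads the argument through the generated type, so
// renaming the argument in package.json surfaces as a compile error here
// instead of drifting silently.
function initialText(props: LaunchArguments<Arguments.Translate>): string {
  return (props.arguments.text ?? "").trim();
}
```

In the real command the signature would likely read `props: LaunchProps<{ arguments: Arguments.Translate }>`, with no hand-written `TranslateArguments` interface left in the file.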
Reviews (6): Last reviewed commit: "Address review feedback"
- Add store screenshot
- Fix extracted text result launch
- Add second store screenshot
- Keep OCR result in extract view
Addressed the Greptile feedback in the latest update. Changes made:
Latest extension commit in this PR: 70d1ae0.
    try {
      const selectedText = normalizeInputText(await getSelectedText());
      if (isMounted) setInputText(selectedText);
    } catch {
      if (isMounted) setInputText("");
    }
getSelectedText result can silently clear user-typed input
If the user opens the command and immediately starts typing (before the getSelectedText promise settles), the catch block calls setInputText("") which overwrites whatever they have typed. Even a ~100 ms resolution time — normal when nothing is selected — is enough to observe this: the user types a few characters, the catch fires, and the search bar is cleared. This is a silent destructive state mutation with no feedback.
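One possible way to avoid the overwrite, sketched under assumptions: `mergeSelection` is a hypothetical helper that is not part of the PR, and it treats any non-empty input as "the user has typed". The idea is to decide the next value from the latest input state rather than unconditionally setting it.

```typescript
// Hypothetical helper (not from the PR): decides the next input value once
// the selection lookup settles. `selectedText` is null when the lookup failed.
function mergeSelection(currentInput: string, selectedText: string | null): string {
  // The user has already typed something: never overwrite it.
  if (currentInput !== "") return currentInput;
  // Nothing typed yet: prefill with the selection, or stay empty on failure.
  return selectedText ?? "";
}

// Inside the effect, a functional state update reads the latest input instead
// of a stale value (shown as comments because it needs React and Raycast):
//   try {
//     const selected = normalizeInputText(await getSelectedText());
//     if (isMounted) setInputText((prev) => mergeSelection(prev, selected));
//   } catch {
//     // No selection available: leave the input untouched.
//   }
```

With this shape the catch branch no longer writes to state at all, so a failed or slow lookup cannot clear what the user typed.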
Description
Adds AI Translate, a Raycast extension focused on fast screenshot OCR translation.
Version 1.0.0 is intentionally scoped: take a screenshot quickly, extract visible text reliably, and translate it with as little friction as possible. It is not trying to become a full translation suite. The main use case is the common moment where users can see text on screen but cannot select it.
Why not just use Raycast's built-in translation? Raycast's built-in flow is useful when text can be selected or passed as plain text, but it does not cover OCR translation. In many real workflows, text is locked inside app UI, images, PDFs, slides, videos, remote desktops, protected documents, or web pages with broken selection. In those cases, screenshot capture is the most reliable input surface.
Raycast's built-in AI features also do not provide this level of model routing for translation. AI Translate lets users choose the provider, model ID, base URL, and API key themselves, so translation can run through the exact Token Plan, Coding Plan, or provider-specific subscription they already pay for.
AI Translate layers OCR and AI translation on top of that screenshot-first workflow. It prioritizes cost-effective, high-quality providers such as DeepSeek, Xiaomi MiMo, MiniMax, and Kimi through Anthropic-compatible `/v1/messages` endpoints, while still supporting OpenAI / ChatGPT and Gemini. Kimi now defaults to `https://api.kimi.com/coding/` as its Anthropic-compatible coding base URL.
The default system prompt is sense-for-sense rather than literal: it asks the model to write as a native speaker of the target language would naturally express the same idea, while preserving the source meaning, tone, facts, and level of formality. For Chinese targets, it explicitly asks for natural Chinese expression instead of English syntax rewritten with Chinese words.
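For illustration, a request to an Anthropic-compatible `/v1/messages` endpoint with a separate system prompt could be assembled roughly as follows. This is a sketch, not the extension's code: `ProviderConfig` and `buildMessagesRequest` are hypothetical names, the header set follows the public Anthropic Messages API, and individual compatible providers may expect different authentication headers or path handling.

```typescript
// Hypothetical config shape: the user-supplied base URL, API key, and model ID.
interface ProviderConfig {
  baseUrl: string;
  apiKey: string;
  model: string;
}

interface MessagesRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a Messages API request.
function buildMessagesRequest(
  config: ProviderConfig,
  systemPrompt: string,
  sourceText: string,
  maxTokens = 1024,
): MessagesRequest {
  // Strip trailing slashes so the joined URL has exactly one separator.
  const url = `${config.baseUrl.replace(/\/+$/, "")}/v1/messages`;
  return {
    url,
    headers: {
      "content-type": "application/json",
      "x-api-key": config.apiKey, // assumption: provider accepts Anthropic-style key header
      "anthropic-version": "2023-06-01",
    },
    body: JSON.stringify({
      model: config.model,
      max_tokens: maxTokens,
      system: systemPrompt, // translation instructions go here
      messages: [{ role: "user", content: sourceText }],
    }),
  };
}
```

The result can be passed to `fetch(request.url, { method: "POST", headers: request.headers, body: request.body })`. Keeping construction separate from sending makes the routing logic easy to unit test.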
The extension also supports configurable translation prompts. Users can choose a built-in `Prompt Profile` such as Screenshot OCR, Technical / Developer, Academic Writing, Legal / Policy, Subtitle / Conversation, or Custom Only, then add reusable `Custom Prompt Instructions` for terminology, audience, tone, and formatting preferences. These instructions are included with every translation request while the source text stays in a separate `Text:` block.
The extension provides four commands:
- `Translate Screenshot`: captures a screen region, runs OCR, and opens the recognized text in the translation view.
- `Extract Text from Screenshot`: captures a screen region, runs OCR, and opens an editable result view with copy, compact-copy, translate, and retake actions.
- `Copy Text from Screenshot`: captures a screen region, runs OCR, and copies the recognized text to the clipboard without opening a result view.
- `Translate Selected Text`: translates selected text, typed text, or a fallback argument when no selection is available.
OCR options include macOS Vision, Tesseract, Baidu OCR, and PaddleOCR HTTP services.
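The prompt assembly described above (a built-in profile, optional custom instructions, and the source text in a separate `Text:` block) might be sketched like this. The profile keys, the instruction wording, and the function name `buildTranslationPrompt` are all assumptions for illustration; the extension's actual prompt text is not shown in this PR description.

```typescript
// Hypothetical subset of profiles; the real extension lists more.
type PromptProfile = "screenshot-ocr" | "technical" | "custom-only";

// Illustrative wording only, not the extension's actual prompts.
const PROFILE_INSTRUCTIONS: Record<PromptProfile, string> = {
  "screenshot-ocr": "The text comes from OCR and may contain recognition errors.",
  technical: "Keep code identifiers, commands, and API names unchanged.",
  "custom-only": "",
};

function buildTranslationPrompt(
  profile: PromptProfile,
  customInstructions: string,
  targetLanguage: string,
  sourceText: string,
): string {
  // Collect instruction lines, dropping any that are empty.
  const instructions = [
    `Translate the text into ${targetLanguage}.`,
    PROFILE_INSTRUCTIONS[profile],
    customInstructions.trim(),
  ].filter((line) => line.length > 0);

  // The source text stays in its own block so it is not read as instructions.
  return `${instructions.join("\n")}\n\nText:\n${sourceText}`;
}
```

Keeping the source in a dedicated block makes it less likely that OCR output containing imperative sentences is interpreted as part of the instructions.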
Official API documentation links for each provider are included in the README.
Screencast
Store screenshots are included in:
- `metadata/ai-translate-1.png`
- `metadata/ai-translate-2.png`
- `metadata/ai-translate-3.png`
Checklist
- `npm run build` and tested this distribution build in Raycast
- `assets` folder are used by the extension itself
- `README` are placed outside of the `metadata` folder