Add ai-translate extension #27675

Open
xwzhangSZU wants to merge 16 commits into raycast:main from xwzhangSZU:ext/ai-translate

Conversation

@xwzhangSZU
Contributor

@xwzhangSZU xwzhangSZU commented May 5, 2026

Description

Adds AI Translate, a Raycast extension focused on fast screenshot OCR translation.

Version 1.0.0 is intentionally scoped: take a screenshot quickly, extract visible text reliably, and translate it with as little friction as possible. It is not trying to become a full translation suite. The main use case is the common moment where users can see text on screen but cannot select it.

Why not just use Raycast's built-in translation? Raycast's built-in flow is useful when text can be selected or passed as plain text, but it does not cover OCR translation. In many real workflows, text is locked inside app UI, images, PDFs, slides, videos, remote desktops, protected documents, or web pages with broken selection. In those cases, screenshot capture is the most reliable input surface.

Raycast's built-in AI features also do not provide this level of model routing for translation. AI Translate lets users choose the provider, model ID, base URL, and API key themselves, so translation can run through the exact Token Plan, Coding Plan, or provider-specific subscription they already pay for.

AI Translate layers OCR and AI translation on top of that screenshot-first workflow. It prioritizes cost-effective, high-quality providers such as DeepSeek, Xiaomi MiMo, MiniMax, and Kimi through Anthropic-compatible /v1/messages endpoints, while still supporting OpenAI / ChatGPT and Gemini. Kimi now defaults to https://api.kimi.com/coding/ as its Anthropic-compatible coding base URL.
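
As a rough illustration of what routing through an Anthropic-compatible /v1/messages endpoint can look like, here is a minimal sketch. `ProviderConfig`, `MessagesRequest`, `buildMessagesRequest`, the `max_tokens` value, and the example base URL are all assumptions made for this example and are not taken from the PR. The header names follow Anthropic's public Messages API; a compatible provider may expect different authentication.

```typescript
// Hypothetical shapes for illustration; none of these names come from the PR.
interface ProviderConfig {
  baseUrl: string; // user-supplied Anthropic-compatible base URL
  apiKey: string;
  modelId: string;
}

interface MessagesRequest {
  url: string;
  headers: Record<string, string>;
  body: {
    model: string;
    max_tokens: number;
    system: string;
    messages: { role: "user"; content: string }[];
  };
}

function buildMessagesRequest(
  config: ProviderConfig,
  systemPrompt: string,
  sourceText: string,
): MessagesRequest {
  // Trim trailing slashes so the joined URL has exactly one separator.
  const base = config.baseUrl.replace(/\/+$/, "");
  return {
    url: `${base}/v1/messages`,
    headers: {
      "content-type": "application/json",
      "x-api-key": config.apiKey,
      "anthropic-version": "2023-06-01",
    },
    body: {
      model: config.modelId,
      max_tokens: 1024, // assumed cap, not a value from the PR
      system: systemPrompt,
      messages: [{ role: "user", content: sourceText }],
    },
  };
}
```

The function only describes the request; sending it (for example with `fetch`) and reading the response are left out so the sketch stays independent of any one provider's response format.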

The default system prompt is sense-for-sense rather than literal: it asks the model to write as a native speaker of the target language would naturally express the same idea, while preserving the source meaning, tone, facts, and level of formality. For Chinese targets, it explicitly asks for natural Chinese expression instead of English syntax rewritten with Chinese words.

The extension also supports configurable translation prompts. Users can choose a built-in Prompt Profile such as Screenshot OCR, Technical / Developer, Academic Writing, Legal / Policy, Subtitle / Conversation, or Custom Only, then add reusable Custom Prompt Instructions for terminology, audience, tone, and formatting preferences. These instructions are included with every translation request while the source text stays in a separate Text: block.
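
One way to picture how profile and custom instructions could be combined while the source text stays in its own Text: block is sketched below. `buildUserMessage` and the instruction wording are hypothetical; the actual prompt text used by the extension is not shown in this PR description.

```typescript
// Hypothetical helper; the PR's real prompt wording is not reproduced here.
function buildUserMessage(
  profileInstructions: string,
  customInstructions: string,
  targetLanguage: string,
  sourceText: string,
): string {
  // Drop empty instruction parts so unused preferences add no blank lines.
  const instructions = [profileInstructions, customInstructions]
    .map((part) => part.trim())
    .filter((part) => part.length > 0);

  // Instructions come first; the source text is isolated under "Text:".
  return [
    `Translate the text below into ${targetLanguage}.`,
    ...instructions,
    "",
    "Text:",
    sourceText,
  ].join("\n");
}
```

Keeping the source text after a fixed marker means instructions and content are never interleaved, which makes it harder for OCR output to be mistaken for an instruction.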

The extension provides four commands:

  • Translate Screenshot: captures a screen region, runs OCR, and opens the recognized text in the translation view.
  • Extract Text from Screenshot: captures a screen region, runs OCR, and opens an editable result view with copy, compact-copy, translate, and retake actions.
  • Copy Text from Screenshot: captures a screen region, runs OCR, and copies the recognized text to the clipboard without opening a result view.
  • Translate Selected Text: translates selected text, typed text, or a fallback argument when no selection is available.

OCR options include macOS Vision, Tesseract, Baidu OCR, and PaddleOCR HTTP services.

Official API documentation links are included in the README for each provider:

Screencast

Store screenshots are included in:

  • metadata/ai-translate-1.png
  • metadata/ai-translate-2.png
  • metadata/ai-translate-3.png

Checklist

@raycastbot raycastbot added the new extension label May 5, 2026
@raycastbot
Collaborator

Congratulations on your new Raycast extension! 🚀

We're currently experiencing a high volume of incoming requests. As a result, the initial review may take up to 10-15 business days.

Once the PR is approved and merged, the extension will be available on our Store.

@xwzhangSZU xwzhangSZU marked this pull request as ready for review May 5, 2026 23:37
@greptile-apps
Contributor

greptile-apps Bot commented May 5, 2026

Greptile Summary

This PR introduces a new AI Translate extension with four commands: screenshot OCR translation, text extraction from screenshots, clipboard copy of OCR text, and direct text translation — backed by configurable BYOK providers (DeepSeek, MiMo, MiniMax, Kimi, Gemini, OpenAI) with Anthropic-compatible and OpenAI-compatible routing.

  • The translate command's getSelectedText call can silently clear text the user has already typed: if nothing is selected, the rejection handler calls setInputText(""), overwriting whatever input was typed during the async wait.
  • Multiple files manually define command argument interfaces (TranslateArguments, ExtractArguments, ExtractLaunchContext) that Raycast auto-generates in raycast-env.d.ts, continuing the same drift risk as the manually defined ExtensionPreferences flagged in a previous review thread.

Confidence Score: 3/5

The extension has a user-visible bug in its core translate command where a slow or absent text selection can silently wipe input the user has already typed.

The getSelectedText catch block unconditionally calls setInputText(""), meaning any user who opens the Translate command and starts typing before the selection promise settles will have their input cleared with no error message or recovery path. This is the extension's primary command and the bug is on its main input path.

src/translate.tsx — the getSelectedText setup effect needs a guard to avoid overwriting user input on rejection.

Important Files Changed

| Filename | Overview |
| --- | --- |
| extensions/ai-translate/src/translate.tsx | Main translation view; contains a race condition where getSelectedText rejection can clear user-typed input, and a manually defined argument interface that should be auto-generated. |
| extensions/ai-translate/src/providers.ts | Provider routing and HTTP logic; uses safeParseJson to guard against non-JSON error bodies, handles Gemini, OpenAI-compatible, and Anthropic-compatible paths correctly. |
| extensions/ai-translate/src/ocr-engines.ts | OCR engine dispatch with Baidu token caching, Tesseract, and PaddleOCR support; safeParseJson + isRawResponse guard catches non-JSON responses correctly. |
| extensions/ai-translate/src/types.ts | Manually defines ExtensionPreferences (flagged in earlier review thread) instead of relying on the auto-generated raycast-env.d.ts type. |
| extensions/ai-translate/src/languages.ts | Auto-detect language logic correctly excludes Japanese (Hiragana/Katakana check) and Korean (Hangul check) before falling back to the CJK ideograph range. |
| extensions/ai-translate/src/preferences.ts | Preference reading and provider config resolution; provider ordering and fallback logic is correct but still uses the manually defined ExtensionPreferences generic. |
| extensions/ai-translate/src/extract-text-from-screenshot.tsx | OCR extraction view; manually defines ExtractArguments and ExtractLaunchContext interfaces that should be auto-generated by Raycast tooling. |
| extensions/ai-translate/package.json | Extension manifest includes $schema, metadata screenshots, and macOS-only platform; category list was noted as broader than needed in a prior review thread. |
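
The check order described for languages.ts (kana and Hangul first, CJK ideographs last) can be sketched as follows. `DetectedScript` and `detectCjkScript` are hypothetical names, and the exact Unicode ranges used in the PR are not shown, so the ranges below are an assumption based on the standard Unicode blocks.

```typescript
// Hypothetical sketch; the PR's languages.ts may use different names and ranges.
type DetectedScript = "japanese" | "korean" | "chinese" | "other";

function detectCjkScript(text: string): DetectedScript {
  // Hiragana (U+3040 to U+309F) and Katakana (U+30A0 to U+30FF)
  // strongly indicate Japanese, even when kanji are also present.
  if (/[\u3040-\u30ff]/.test(text)) return "japanese";
  // Hangul Jamo (U+1100 to U+11FF) and Hangul Syllables (U+AC00 to U+D7AF).
  if (/[\u1100-\u11ff\uac00-\ud7af]/.test(text)) return "korean";
  // Only now fall back to CJK Unified Ideographs (U+4E00 to U+9FFF),
  // which Chinese shares with Japanese kanji and Korean hanja.
  if (/[\u4e00-\u9fff]/.test(text)) return "chinese";
  return "other";
}
```

The order matters because Japanese and Korean text often contains ideographs; testing the ideograph range first would classify such text as Chinese.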
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
extensions/ai-translate/src/translate.tsx:60-65
**`getSelectedText` result can silently clear user-typed input**

If the user opens the command and immediately starts typing (before the `getSelectedText` promise settles), the catch block calls `setInputText("")` which overwrites whatever they have typed. Even a ~100 ms resolution time — normal when nothing is selected — is enough to observe this: the user types a few characters, the catch fires, and the search bar is cleared. This is a silent destructive state mutation with no feedback.

### Issue 2 of 2
extensions/ai-translate/src/translate.tsx:28-30
**Manually defined command argument interface**

`TranslateArguments` is a hand-written type for the `text` argument. Raycast auto-generates command argument types in `raycast-env.d.ts` at build time, so this manual definition can silently drift out of sync if the argument name or type changes in `package.json`. The same pattern appears in `extract-text-from-screenshot.tsx` with `ExtractArguments` and `ExtractLaunchContext`. Both files should use the generated types instead.

Reviews (6): Last reviewed commit: "Address review feedback"

Comment thread extensions/ai-translate/src/types.ts
Comment thread extensions/ai-translate/src/ocr-engines.ts Outdated
Comment thread extensions/ai-translate/package.json Outdated
Comment thread extensions/ai-translate/src/providers.ts Outdated
xwzhangSZU added 5 commits May 6, 2026 08:09
- Add store screenshot
- Fix extracted text result launch
- Add second store screenshot
- Keep OCR result in extract view
Comment thread extensions/ai-translate/src/ocr-engines.ts
Comment thread extensions/ai-translate/src/languages.ts Outdated
@xwzhangSZU
Contributor Author

Addressed the Greptile feedback in the latest update.

Changes made:

  • Fixed Auto language detection so Japanese kana and Korean Hangul are excluded before treating CJK ideographs as Chinese.
  • Added safe JSON parsing for OCR API responses so non-JSON error bodies surface readable HTTP / invalid JSON errors instead of raw SyntaxError crashes.
  • Removed unreachable Kimi / MiMo provider workarounds from the OpenAI-compatible path now that those providers use Anthropic-compatible requests.
  • Removed the redundant Raycast List throttle and kept the existing internal debounce to reduce translation startup latency.
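
For the safe JSON parsing item above, a minimal sketch of the idea follows. The name `safeParseJson` appears in the review summary, but the signature, the `ParseResult` type, and the error wording here are assumptions and may differ from the PR's implementation.

```typescript
// Hypothetical signature; the PR's safeParseJson may look different.
type ParseResult =
  | { ok: true; value: unknown }
  | { ok: false; error: string };

function safeParseJson(body: string, httpStatus: number): ParseResult {
  try {
    return { ok: true, value: JSON.parse(body) };
  } catch {
    // Include the status and a short preview so the user sees a readable
    // message instead of an uncaught SyntaxError.
    const preview = body.trim().slice(0, 80);
    return {
      ok: false,
      error: `HTTP ${httpStatus}: response was not valid JSON (${preview || "empty body"})`,
    };
  }
}
```

Returning a tagged result rather than throwing lets the caller decide how to present the failure, for example as a toast with the HTTP status.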

Latest extension commit in this PR: 70d1ae0.

Comment on lines +60 to +65
try {
  const selectedText = normalizeInputText(await getSelectedText());
  if (isMounted) setInputText(selectedText);
} catch {
  if (isMounted) setInputText("");
}

P1 getSelectedText result can silently clear user-typed input

If the user opens the command and immediately starts typing (before the getSelectedText promise settles), the catch block calls setInputText("") which overwrites whatever they have typed. Even a ~100 ms resolution time — normal when nothing is selected — is enough to observe this: the user types a few characters, the catch fires, and the search bar is cleared. This is a silent destructive state mutation with no feedback.
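
One possible shape for a guard, shown as a minimal sketch rather than the fix the author applied: decide the next input value in a pure function that never discards text the user has typed. `SelectionOutcome`, `inputAfterSelection`, and the `userHasTyped` flag are hypothetical; in the component the flag could be, for example, a ref set from the search bar's change handler.

```typescript
// Hypothetical helper, not from the PR: decides what the input should be
// once the getSelectedText() promise settles.
type SelectionOutcome =
  | { status: "resolved"; text: string }
  | { status: "rejected" };

function inputAfterSelection(
  currentInput: string,
  userHasTyped: boolean,
  outcome: SelectionOutcome,
): string {
  // Text the user typed while the promise was pending always wins.
  if (userHasTyped) return currentInput;
  // Nothing typed yet and a selection exists: prefill with it.
  if (outcome.status === "resolved") return outcome.text;
  // Rejection (nothing selected): leave the input as it is instead of clearing it.
  return currentInput;
}
```

In the effect, both the try and the catch branch would then call something like `setInputText((current) => inputAfterSelection(current, userHasTyped, outcome))`, so a late rejection becomes a no-op. Whether a late successful selection should also yield to typed text is a design choice; this sketch assumes it should.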


