
Commit 04b4447

Danny/kernel 742 create yutori n1 computer use cli templates (ts/python) (#89)
## Add Yutori n1 Computer Use CLI Templates

This PR adds new CLI templates for [Yutori's n1 computer use model](https://docs.yutori.com/reference/n1), enabling users to quickly scaffold browser automation projects using Kernel's infrastructure.

### New Templates

- **TypeScript**: `kernel create --template ts-yutori-cua`
- **Python**: `kernel create --template python-yutori-cua`

### Features

Both templates include:

- **Agentic sampling loop** with n1's OpenAI-compatible API
- **Computer tool** mapping n1 actions (`click`, `type`, `scroll`, `drag`, `hover`, `key_press`, `wait`, `refresh`, `go_back`, `goto_url`, `stop`) to Kernel's Computer Controls API
- **Coordinate scaling** from n1's 1000×1000 relative space to actual viewport dimensions
- **Session management** with replay recording support

### Dual Screenshot Modes

| Mode | Description |
|------|-------------|
| `computer_use` (default) | Uses Kernel's Computer Controls screenshot API (stable) |
| `playwright` | Uses CDP WebSocket connection for viewport-only screenshots without browser chrome, optimized for n1's training data per [Yutori's documentation](https://docs.yutori.com/reference/n1#screenshot-requirements) |

### Implementation Details

- **Model**: `n1-preview-2025-11` outputs coordinates in 1000×1000 space
- **Viewport**: 1200×800 at 25Hz (closest to Yutori's recommended 1280×800)

### With Playwright Mode for viewport-only screenshots

`kernel invoke ts-yutori-cua cua-task --payload '{"query": "...", "mode": "playwright"}'`

### Files Changed

- `pkg/templates/typescript/yutori-computer-use/` - TypeScript template
- `pkg/templates/python/yutori-computer-use/` - Python template
- `pkg/create/templates.go` - Template registration

Closes KERNEL-742

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Introduces Yutori n1 computer-use templates with a full agent loop, tools, and session management for both TypeScript and Python.
>
> - New templates: `pkg/templates/typescript/yutori-computer-use` and `pkg/templates/python/yutori-computer-use` with sampling loops, action-to-Computer Controls mappings, Playwright CDP option, coordinate scaling, and replay recording support
> - Registers `yutori-computer-use` in `pkg/create/templates.go` (template catalog, sort priority, deploy/invoke commands)
> - QA guide updates: adds Yutori rows, create/deploy/invoke commands for both modes, expands automated test matrix/count; `.gitignore` now ignores `qa-*`
>
> Usage highlights:
> - Invoke examples for both `computer_use` (default) and `playwright` modes
> - Viewport defaults to `1200×800`; model set to `n1-preview-2025-11`
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit c8f9c27. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
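
The coordinate scaling listed under Features can be illustrated with a short Python sketch. It assumes n1 coordinates run from 0 to 1000 on both axes and that the target is the 1200×800 default viewport named in the description. The helper name `scale_to_viewport` and the clamping step are illustrative assumptions, not code taken from the templates.

```python
from typing import Tuple

# n1 reports positions on a 1000x1000 relative grid (per the PR description).
N1_COORD_SPACE = 1000


def scale_to_viewport(
    x: float, y: float, width: int = 1200, height: int = 800
) -> Tuple[int, int]:
    """Map an n1 coordinate to viewport pixels.

    Hypothetical helper for illustration; the template's own function may differ.
    """
    px = round(x / N1_COORD_SPACE * width)
    py = round(y / N1_COORD_SPACE * height)
    # Assumption: clamp so a coordinate of exactly 1000 still lands on the
    # last addressable pixel instead of one past the edge.
    return min(max(px, 0), width - 1), min(max(py, 0), height - 1)
```

Under these assumptions the centre of n1's grid, `(500, 500)`, maps to pixel `(600, 400)` in the default viewport. Each axis is scaled independently, so the 1000×1000 square is stretched to the viewport's aspect ratio.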
1 parent 755c2b9 commit 04b4447

22 files changed

Lines changed: 2837 additions & 4 deletions


.cursor/commands/qa.md

Lines changed: 41 additions & 4 deletions
@@ -58,13 +58,22 @@ Here are all valid language + template combinations:
 | typescript | openai-computer-use | ts-openai-cua | ts-openai-cua | Yes | OPENAI_API_KEY |
 | typescript | gemini-computer-use | ts-gemini-cua | ts-gemini-cua | Yes | GOOGLE_API_KEY |
 | typescript | claude-agent-sdk | ts-claude-agent-sdk | ts-claude-agent-sdk | Yes | ANTHROPIC_API_KEY |
+| typescript | yutori-computer-use | ts-yutori-cua | ts-yutori-cua | Yes | YUTORI_API_KEY |
+
+> **Note:** The `yutori-computer-use` template supports two modes: `computer_use` (default, full VM screenshots) and `playwright` (viewport-only screenshots via CDP). Both modes should be tested.
+
 | python | sample-app | py-sample-app | python-basic | No | - |
 | python | captcha-solver | py-captcha-solver | python-captcha-solver | No | - |
 | python | browser-use | py-browser-use | python-bu | Yes | OPENAI_API_KEY |
 | python | anthropic-computer-use | py-anthropic-cua | python-anthropic-cua | Yes | ANTHROPIC_API_KEY |
 | python | openai-computer-use | py-openai-cua | python-openai-cua | Yes | OPENAI_API_KEY |
 | python | openagi-computer-use | py-openagi-cua | python-openagi-cua | Yes | OAGI_API_KEY |
 | python | claude-agent-sdk | py-claude-agent-sdk | py-claude-agent-sdk | Yes | ANTHROPIC_API_KEY |
+| python | yutori-computer-use | py-yutori-cua | python-yutori-cua | Yes | YUTORI_API_KEY |
+
+> **Yutori Modes:**
+> - `computer_use` (default): Uses Kernel's Computer Controls API with full VM screenshots
+> - `playwright`: Uses Playwright via CDP WebSocket for viewport-only screenshots (optimized for n1 model)

 ### Create Commands

@@ -80,6 +89,7 @@ Run each of these (they are non-interactive when all flags are provided):
 ../bin/kernel create -n ts-openai-cua -l typescript -t openai-computer-use
 ../bin/kernel create -n ts-gemini-cua -l typescript -t gemini-computer-use
 ../bin/kernel create -n ts-claude-agent-sdk -l typescript -t claude-agent-sdk
+../bin/kernel create -n ts-yutori-cua -l typescript -t yutori-computer-use

 # Python templates
 ../bin/kernel create -n py-sample-app -l python -t sample-app
@@ -89,6 +99,7 @@ Run each of these (they are non-interactive when all flags are provided):
 ../bin/kernel create -n py-openai-cua -l python -t openai-computer-use
 ../bin/kernel create -n py-openagi-cua -l python -t openagi-computer-use
 ../bin/kernel create -n py-claude-agent-sdk -l python -t claude-agent-sdk
+../bin/kernel create -n py-yutori-cua -l python -t yutori-computer-use
 ```

 ## Step 5: Deploy Each Template
@@ -176,6 +187,15 @@ echo "ANTHROPIC_API_KEY=<value from human>" > .env
 cd ..
 ```

+**ts-yutori-cua** (needs YUTORI_API_KEY):
+
+```bash
+cd ts-yutori-cua
+echo "YUTORI_API_KEY=<value from human>" > .env
+../bin/kernel deploy index.ts --env-file .env
+cd ..
+```
+
 **py-browser-use** (needs OPENAI_API_KEY):

 ```bash
@@ -221,6 +241,15 @@ echo "ANTHROPIC_API_KEY=<value from human>" > .env
 cd ..
 ```

+**py-yutori-cua** (needs YUTORI_API_KEY):
+
+```bash
+cd py-yutori-cua
+echo "YUTORI_API_KEY=<value from human>" > .env
+../bin/kernel deploy main.py --env-file .env
+cd ..
+```
+
 ## Step 6: Provide Invoke Commands

 Once all deployments are complete, present the human with these invoke commands to test manually:
@@ -235,6 +264,8 @@ kernel invoke ts-magnitude mag-url-extract --payload '{"url": "https://en.wikipe
 kernel invoke ts-openai-cua cua-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 5 articles"}'
 kernel invoke ts-gemini-cua gemini-cua-task --payload '{"startingUrl": "https://www.magnitasks.com/", "instruction": "Click the Tasks option in the left-side bar, and move the 5 items in the To Do and In Progress items to the Done section of the Kanban board? You are done successfully when the items are moved."}'
 kernel invoke ts-claude-agent-sdk agent-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 3 stories"}'
+kernel invoke ts-yutori-cua cua-task --payload '{"query": "Go to http://magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "computer_use"}'
+kernel invoke ts-yutori-cua cua-task --payload '{"query": "Go to http://magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "playwright"}'

 # Python apps
 kernel invoke python-basic get-page-title --payload '{"url": "https://www.google.com"}'
@@ -244,11 +275,13 @@ kernel invoke python-anthropic-cua cua-task --payload '{"query": "Go to http://m
 kernel invoke python-openai-cua cua-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 5 articles"}'
 kernel invoke python-openagi-cua openagi-default-task -p '{"instruction": "Navigate to https://agiopen.org and click the What is Computer Use? button"}'
 kernel invoke py-claude-agent-sdk agent-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 3 stories"}'
+kernel invoke python-yutori-cua cua-task --payload '{"query": "Go to http://magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "computer_use"}'
+kernel invoke python-yutori-cua cua-task --payload '{"query": "Go to http://magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "playwright"}'
 ```

 ## Step 7: Automated Runtime Testing (Optional)

-**STOP and ask the human:** "Would you like me to automatically invoke all 15 templates and report back on their runtime status?"
+**STOP and ask the human:** "Would you like me to automatically invoke all 19 test cases and report back on their runtime status?"

 If the human agrees, invoke each template use the Kernel CLI and collect results. Present findings in this format:

@@ -268,13 +301,17 @@ If the human agrees, invoke each template use the Kernel CLI and collect results
 | ts-openai-cua | ts-openai-cua | | |
 | ts-gemini-cua | ts-gemini-cua | | |
 | ts-claude-agent-sdk | ts-claude-agent-sdk | | |
+| ts-yutori-cua | ts-yutori-cua | | mode: computer_use |
+| ts-yutori-cua | ts-yutori-cua | | mode: playwright |
 | py-sample-app | python-basic | | |
 | py-captcha-solver | python-captcha-solver | | |
 | py-browser-use | python-bu | | |
 | py-anthropic-cua | python-anthropic-cua | | |
 | py-openai-cua | python-openai-cua | | |
 | py-openagi-cua | python-openagi-cua | | |
 | py-claude-agent-sdk | py-claude-agent-sdk | | |
+| py-yutori-cua | python-yutori-cua | | mode: computer_use |
+| py-yutori-cua | python-yutori-cua | | mode: playwright |

 Status values:
 - **SUCCESS**: App started and returned a result
@@ -287,9 +324,9 @@ Notes should include brief error messages for failures or confirmation of succes
 - [ ] Built CLI with `make build`
 - [ ] Created QA directory
 - [ ] Got KERNEL_API_KEY from human
-- [ ] Created all 15 template variations
-- [ ] Got required API keys from human (OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY, OAGI_API_KEY)
-- [ ] Deployed all 15 apps
+- [ ] Created all 17 template variations
+- [ ] Got required API keys from human (OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY, OAGI_API_KEY, YUTORI_API_KEY)
+- [ ] Deployed all 17 apps
 - [ ] Provided invoke commands to human for manual testing
 - [ ] (Optional) Ran automated runtime testing and reviewed results

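
The Yutori invoke commands in the QA guide differ only in the app name and the `mode` field of the JSON payload. As a rough Python sketch of how such commands could be generated with safe shell quoting: the function names `build_invoke_command` and `yutori_qa_matrix` are hypothetical, while the `kernel invoke <app> cua-task --payload '<json>'` form and the payload keys `query`, `mode`, and `record_replay` are taken from the commands in the diff.

```python
import json
import shlex
from typing import List

# The two modes and two deployed app names that appear in the QA guide.
MODES = ("computer_use", "playwright")
APPS = ("ts-yutori-cua", "python-yutori-cua")


def build_invoke_command(
    app: str, query: str, mode: str = "computer_use", record_replay: bool = False
) -> str:
    """Return a shell-safe `kernel invoke` command line (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    payload = {"query": query, "mode": mode}
    if record_replay:
        payload["record_replay"] = True
    # json.dumps builds the payload; shlex.quote wraps it for the shell.
    return " ".join(
        ["kernel", "invoke", app, "cua-task", "--payload", shlex.quote(json.dumps(payload))]
    )


def yutori_qa_matrix(query: str) -> List[str]:
    """One command per (app, mode) pair, with replay recording enabled."""
    return [build_invoke_command(app, query, mode, True) for app in APPS for mode in MODES]
```

Two apps times two modes gives four commands, matching the four Yutori rows in the automated test matrix.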
.gitignore

Lines changed: 3 additions & 0 deletions
@@ -38,3 +38,6 @@ report.[0-9]_.[0-9]_.[0-9]_.[0-9]_.json
 # Finder (MacOS) folder config
 .DS_Store
 kernel
+
+# QA testing directories
+qa-*

pkg/create/templates.go

Lines changed: 18 additions & 0 deletions
@@ -18,6 +18,7 @@ const (
 	TemplateStagehand = "stagehand"
 	TemplateOpenAGIComputerUse = "openagi-computer-use"
 	TemplateClaudeAgentSDK = "claude-agent-sdk"
+	TemplateYutoriComputerUse = "yutori-computer-use"
 )

 type TemplateInfo struct {
@@ -84,6 +85,11 @@ var Templates = map[string]TemplateInfo{
 		Description: "Implements a Claude Agent SDK browser automation agent",
 		Languages: []string{LanguageTypeScript, LanguagePython},
 	},
+	TemplateYutoriComputerUse: {
+		Name: "Yutori n1 Computer Use",
+		Description: "Implements a Yutori n1 computer use agent",
+		Languages: []string{LanguageTypeScript, LanguagePython},
+	},
 }

 // GetSupportedTemplatesForLanguage returns a list of all supported template names for a given language
@@ -108,6 +114,8 @@ func GetSupportedTemplatesForLanguage(language string) TemplateKeyValues {
 			return 1
 		case TemplateGeminiComputerUse:
 			return 2
+		case TemplateYutoriComputerUse:
+			return 3
 		default:
 			return 10
 		}
@@ -200,6 +208,11 @@ var Commands = map[string]map[string]DeployConfig{
 			NeedsEnvFile: true,
 			InvokeCommand: `kernel invoke ts-claude-agent-sdk agent-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 3 stories"}'`,
 		},
+		TemplateYutoriComputerUse: {
+			EntryPoint: "index.ts",
+			NeedsEnvFile: true,
+			InvokeCommand: `kernel invoke ts-yutori-cua cua-task --payload '{"query": "Navigate to https://example.com and describe the page"}'`,
+		},
 	},
 	LanguagePython: {
 		TemplateSampleApp: {
@@ -237,6 +250,11 @@ var Commands = map[string]map[string]DeployConfig{
 			NeedsEnvFile: true,
 			InvokeCommand: `kernel invoke py-claude-agent-sdk agent-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 3 stories"}'`,
 		},
+		TemplateYutoriComputerUse: {
+			EntryPoint: "main.py",
+			NeedsEnvFile: true,
+			InvokeCommand: `kernel invoke python-yutori-cua cua-task --payload '{"query": "Navigate to https://example.com and describe the page"}'`,
+		},
 	},
 }

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+# Kernel Python Sample App - Yutori n1 Computer Use
+
+This is a Kernel application that implements a prompt loop using Yutori's n1 computer use model with Kernel's Computer Controls API.
+
+[n1](https://yutori.com/blog/introducing-navigator) is Yutori's pixels-to-actions LLM that predicts browser actions from screenshots.
+
+## Setup
+
+1. Get your API keys:
+   - **Kernel**: [dashboard.onkernel.com](https://dashboard.onkernel.com)
+   - **Yutori**: [yutori.com](https://yutori.com)
+
+2. Deploy the app:
+```bash
+kernel login
+cp .env.example .env # Add your YUTORI_API_KEY
+kernel deploy main.py --env-file .env
+```
+
+## Usage
+
+```bash
+kernel invoke python-yutori-cua cua-task --payload '{"query": "Navigate to https://example.com and describe the page"}'
+```
+
+## Recording Replays
+
+> **Note:** Replay recording is only available to Kernel users on paid plans.
+
+Add `"record_replay": true` to your payload to capture a video of the browser session:
+
+```bash
+kernel invoke python-yutori-cua cua-task --payload '{"query": "Navigate to https://example.com", "record_replay": true}'
+```
+
+When enabled, the response will include a `replay_url` field with a link to view the recorded session.
+
+## Viewport Configuration
+
+Yutori n1 recommends a **1280×800 (WXGA, 16:10)** viewport for best grounding accuracy. Kernel's closest supported viewport is **1200×800 at 25Hz**, which this template uses by default.
+
+> **Note:** n1 outputs coordinates in a 1000×1000 relative space, which are automatically scaled to the actual viewport dimensions. The slight width difference (1200 vs 1280) should have minimal impact on accuracy.
+
+See [Kernel Viewport Documentation](https://www.kernel.sh/docs/browsers/viewport) for all supported configurations.
+
+## n1 Supported Actions
+
+| Action | Description |
+|--------|-------------|
+| `click` | Left mouse click at coordinates |
+| `scroll` | Scroll page in a direction |
+| `type` | Type text into focused element |
+| `key_press` | Send keyboard input |
+| `hover` | Move mouse without clicking |
+| `drag` | Click-and-drag operation |
+| `wait` | Pause for UI to update |
+| `refresh` | Reload current page |
+| `go_back` | Navigate back in history |
+| `goto_url` | Navigate to a URL |
+| `stop` | End task with final answer |
+
+## Resources
+
+- [Yutori n1 API Documentation](https://docs.yutori.com/reference/n1)
+- [Kernel Documentation](https://www.kernel.sh/docs/quickstart)
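
The README's action table suggests a dispatch step in which each n1 action name is routed to a handler. The following is a minimal Python sketch of that idea, not the template's actual tool code: the `ActionRouter` class, the `action_type` field name, and the handler signature are all assumptions made for illustration. Only the eleven action names come from the table.

```python
from typing import Any, Callable, Dict, List

# Action names copied from the README table; everything else is hypothetical.
N1_ACTIONS = (
    "click", "scroll", "type", "key_press", "hover", "drag",
    "wait", "refresh", "go_back", "goto_url", "stop",
)

Handler = Callable[[Dict[str, Any]], Any]


class ActionRouter:
    """Routes a decoded n1 action to a registered handler (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        # Reject names outside the documented action set early.
        if name not in N1_ACTIONS:
            raise ValueError(f"not an n1 action: {name!r}")
        self._handlers[name] = handler

    def missing(self) -> List[str]:
        """Documented actions that do not have a handler yet."""
        return [name for name in N1_ACTIONS if name not in self._handlers]

    def dispatch(self, action: Dict[str, Any]) -> Any:
        # Assumption: the action name lives under an "action_type" key.
        name = action.get("action_type")
        if name not in self._handlers:
            raise KeyError(f"no handler registered for action {name!r}")
        return self._handlers[name](action)
```

A table-driven router like this keeps the mapping in one place, and `missing()` gives a quick check that all documented actions are covered before the sampling loop starts.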
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+__pycache__/
+*.py[cod]
+*$py.class
+.env
+*.log
+.venv/
+venv/
