 kernel invoke ts-openai-cua cua-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 5 articles"}'
-kernel invoke ts-gemini-cua gemini-cua-task
+kernel invoke ts-gemini-cua gemini-cua-task --payload '{"startingUrl": "https://www.magnitasks.com/", "instruction": "Click the Tasks option in the left-side bar, and move the 5 items in the To Do and In Progress items to the Done section of the Kanban board? You are done successfully when the items are moved."}'
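
The new invocation passes a JSON payload with two fields, `startingUrl` and `instruction`. A minimal sketch of validating such a payload is shown here. The helper names `parsePayload` and `optionalString` are hypothetical and not part of this PR, and how the app actually receives the payload from Kernel is not shown in this diff, so the sketch simply takes the raw JSON string.

```typescript
// Shape of the payload used in the `kernel invoke` example above.
interface TaskPayload {
  startingUrl: string;
  instruction: string;
}

// Hypothetical helper: accepts a missing field, rejects a non-string one.
function optionalString(value: unknown, field: string): string | undefined {
  if (value === undefined) return undefined;
  if (typeof value !== "string") {
    throw new Error(`${field} must be a string`);
  }
  return value;
}

// Hypothetical helper: parses the raw `--payload` JSON and falls back to
// caller-supplied defaults for any field that is absent.
function parsePayload(raw: string | undefined, defaults: TaskPayload): TaskPayload {
  if (raw === undefined || raw.trim() === "") return defaults;

  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("payload must be a JSON object");
  }

  const fields = data as Record<string, unknown>;
  return {
    startingUrl: optionalString(fields.startingUrl, "startingUrl") ?? defaults.startingUrl,
    instruction: optionalString(fields.instruction, "instruction") ?? defaults.instruction,
  };
}
```

With this shape, invoking without `--payload` (as the removed command did) would fall back to the defaults, which is consistent with `instruction` having a default value in `index.ts`.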
pkg/templates/typescript/gemini-computer-use/README.md: 28 additions & 4 deletions
@@ -4,14 +4,13 @@ A Kernel application that demonstrates Computer Use Agent (CUA) capabilities usi

 ## What It Does

-This app uses [Gemini 2.5's computer use model](https://blog.google/technology/google-deepmind/gemini-computer-use-model/) capabilities to autonomously navigate websites and complete tasks. The example task searches for Kernel's company page on YCombinator and writes a blog post about their product.
+This app uses [Gemini 2.5's computer use model](https://blog.google/technology/google-deepmind/gemini-computer-use-model/) capabilities to autonomously navigate websites and complete tasks. The agent can interact with web pages just like a human would - clicking, typing, scrolling, and extracting information.

 ## Setup

 1. **Add your API keys as environment variables:**
    - `KERNEL_API_KEY` - Get from [Kernel dashboard](https://dashboard.onkernel.com/sign-in)
    - `GOOGLE_API_KEY` - Get from [Google AI Studio](https://aistudio.google.com/apikey)
-   - `OPENAI_API_KEY` - Get from [OpenAI platform](https://platform.openai.com/api-keys)

 ## Running Locally

@@ -25,9 +24,10 @@ This runs the agent without a Kernel invocation context and provides the browser

 ## Deploying to Kernel

-1. **Deploy the application:**
+1. **Copy the example env file, add your API keys, and deploy:**
@@ -37,6 +37,30 @@ This runs the agent without a Kernel invocation context and provides the browser

 The action creates a Kernel-managed browser and associates it with the invocation for tracking and monitoring.

+## Alternative Model Providers
+
+Stagehand's CUA agent supports multiple model providers. You can switch from Gemini to OpenAI or Anthropic by changing the model configuration in `index.ts` and redeploying your Kernel app:
+
+**OpenAI Computer Use:**
+```typescript
+model: {
+  modelName: "openai/computer-use-preview",
+  apiKey: process.env.OPENAI_API_KEY
+}
+```
+
+**Anthropic Claude Sonnet:**
+```typescript
+model: {
+  modelName: "anthropic/claude-sonnet-4-20250514",
+  apiKey: process.env.ANTHROPIC_API_KEY
+}
+```
+
+When using alternative providers, make sure to:
+1. Add the corresponding API key to your environment variables
+2. Update the deploy command to include the new API key (e.g., `--env OPENAI_API_KEY=XXX`)
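
The provider switch described in the README addition could be centralized in one place so that the model name and its API key cannot drift apart. The following is a minimal sketch under stated assumptions: `resolveModelConfig` and `PROVIDERS` are hypothetical names that do not appear in this PR. The OpenAI and Anthropic model names are copied from the README hunk. The Google entry reuses the model id from the removed `index.ts` lines with an assumed `google/` prefix.

```typescript
type Provider = "google" | "openai" | "anthropic";

// Mirrors the `model: { modelName, apiKey }` fragment shown in the README.
interface ModelConfig {
  modelName: string;
  apiKey: string;
}

// Hypothetical lookup table pairing each provider with its model name and
// the environment variable that holds its API key.
const PROVIDERS: Record<Provider, { modelName: string; envVar: string }> = {
  // Assumption: "google/" prefix added to the id seen in index.ts.
  google: { modelName: "google/gemini-2.5-computer-use-preview-10-2025", envVar: "GOOGLE_API_KEY" },
  openai: { modelName: "openai/computer-use-preview", envVar: "OPENAI_API_KEY" },
  anthropic: { modelName: "anthropic/claude-sonnet-4-20250514", envVar: "ANTHROPIC_API_KEY" },
};

// Hypothetical helper: fails early when the key for the chosen provider is
// missing, which is the mistake the README's two-step checklist guards against.
function resolveModelConfig(
  provider: Provider,
  env: Record<string, string | undefined>,
): ModelConfig {
  const { modelName, envVar } = PROVIDERS[provider];
  const apiKey = env[envVar];
  if (!apiKey) {
    throw new Error(`Missing ${envVar}: set it locally and pass it at deploy time`);
  }
  return { modelName, apiKey };
}
```

In a Node environment this would be called as `resolveModelConfig("openai", process.env)` and the result passed as the `model` option. Taking `env` as a parameter keeps the helper testable without touching the real environment.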
+  instruction: string = "Click the Tasks option in the left-side bar, and move the 5 items in the 'To Do' and 'In Progress' items to the 'Done' section of the Kanban board? You are done successfully when the items are moved."
+): Promise<SearchQueryOutput> {
   // Executes a Computer Use Agent (CUA) task using Gemini 2.5 and Stagehand

   const browserOptions = {
@@ -49,11 +52,7 @@ async function runStagehandTask(invocationId?: string): Promise<SearchQueryOutpu
   const stagehand = new Stagehand({
     env: "LOCAL",
     verbose: 1,
-    domSettleTimeoutMs: 30_000,
-    modelName: "gpt-4o",
-    modelClientOptions: {
-      apiKey: OPENAI_API_KEY
-    },
+    domSettleTimeout: 30_000,
     localBrowserLaunchOptions: {
       cdpUrl: kernelBrowser.cdp_ws_url
     }
@@ -64,24 +63,21 @@ async function runStagehandTask(invocationId?: string): Promise<SearchQueryOutpu
   // Your Stagehand implementation here
   /////////////////////////////////////
   try {
-    const page = stagehand.page;
+    const page = stagehand.context.pages()[0];

     const agent = stagehand.agent({
-      provider: "google",
-      model: "gemini-2.5-computer-use-preview-10-2025",
-      instructions: `You are a helpful assistant that can use a web browser.