CloudPhone Plugin for OpenClaw

Chinese README

OpenClaw CloudPhone is a plugin that gives AI agents cloud phone automation capabilities through natural language.

With a single instruction, an agent can submit any cloud phone task to the backend AI Agent, which handles the full execution loop — screen observation, LLM planning, and UI actions — and streams the result back in real time.

Quick Start

1. Install the plugin

openclaw plugins install @suqiai/cloudphone

To update the plugin later, run:

openclaw plugins update @suqiai/cloudphone

2. Configure the plugin

Set apikey in plugins.entries.cloudphone.config. The plugin uses built-in defaults for other optional settings. If you need a default LLM provider for the cloud phone automation agent (backend), add optional llmApiKey and llmBaseUrl as well.

Option A: Configuration file (openclaw.json)

Add the following configuration to openclaw.json:

  • apikey: Obtain your API Key by logging in or signing up at https://ai.suqi.tech, then find it in your account settings.
{
  "plugins": {
    "entries": {
      "cloudphone": {
        "enabled": true,
        "config": {
          "apikey": "your CloudPhone apikey"
        }
      }
    }
  }
}

Optional — default LLM credentials for automation (omit if the backend supplies its own):

{
  "plugins": {
    "entries": {
      "cloudphone": {
        "enabled": true,
        "config": {
          "apikey": "your CloudPhone apikey",
          "llmApiKey": "sk-xxx",
          "llmBaseUrl": "https://open.bigmodel.cn/api/paas/v4"
        }
      }
    }
  }
}

Option B: OpenClaw Console UI

  1. Open the OpenClaw console in your browser.
  2. Go to the Plugins section, find CloudPhone and enable it.
  3. Set apikey (from https://ai.suqi.tech after login or sign-up, in your account/settings).
  4. Optionally set LLM API Key and LLM Base URL if you want plugin-level default LLM settings for automation. For Z.AI usage, you can follow Z.AI API Introduction to create an API key.

Screenshots:

[Screenshot: OpenClaw Console, Plugins]

[Screenshot: OpenClaw Console, CloudPhone config]

3. Restart the Gateway

openclaw gateway restart

How It Works

This plugin exposes the CloudPhone backend AI Agent as three high-level tools:

  1. cloudphone_execute: Submits a natural language instruction to the backend. The backend interprets the instruction with an LLM, runs the cloud phone UI automation loop (observe → plan → act), and dispatches every action itself. The tool returns a task_id immediately.

  2. cloudphone_execute_and_wait: Chained call. Runs cloudphone_execute, then automatically performs one cloudphone_task_result poll and returns the result of the first 10-second window.

  3. cloudphone_task_result: Subscribes to the task's SSE stream. Each call consumes one 10-second window and returns the new thinking lines (delta) from that window. Keep calling it until the task reaches a terminal status.

The agent no longer needs to directly control UI coordinates, manage screenshots, or call individual tap/swipe/input tools. The backend AI Agent handles the full automation loop.

Configuration

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| apikey | string | Yes | - | Authorization credential (ApiKey) |
| llmApiKey | string | No | - | Default LLM provider API key for cloud phone automation (sensitive; omit if not needed). For Z.AI, create it from Z.AI API Introduction. |
| llmBaseUrl | string | No | - | Default LLM provider base URL for cloud phone automation. Example for Z.AI: https://open.bigmodel.cn/api/paas/v4. |

Obtain your API Key by logging in or signing up at https://ai.suqi.tech, then find it in your account/settings.

Optional fields baseUrl, timeout, llmApiKey, and llmBaseUrl are fully described in openclaw.plugin.json. baseUrl and timeout use built-in defaults when omitted; the LLM fields are left unset unless you configure them. The optional maxSteps field (range 1-200) caps the number of Agent steps per task and defaults to 50.

When using Z.AI as the LLM provider, set:

  • llmApiKey: your Z.AI API key
  • llmBaseUrl: https://open.bigmodel.cn/api/paas/v4
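
The configuration rules above can be checked before restarting the Gateway. The following TypeScript sketch is illustrative only: the CloudPhoneConfig interface and the validateConfig helper are hypothetical names and not part of the plugin, and the rule that llmBaseUrl must start with http:// or https:// is an assumption.

```typescript
// Hypothetical shape of plugins.entries.cloudphone.config, based on the table above.
interface CloudPhoneConfig {
  apikey?: string;
  llmApiKey?: string;
  llmBaseUrl?: string;
}

// Returns a list of human-readable problems; an empty list means the config looks usable.
function validateConfig(config: CloudPhoneConfig): string[] {
  const problems: string[] = [];
  // apikey is the only required field.
  if (typeof config.apikey !== "string" || config.apikey.trim() === "") {
    problems.push("apikey is required");
  }
  // Assumption: when present, the base URL should be an http(s) URL.
  if (config.llmBaseUrl !== undefined && !/^https?:\/\//.test(config.llmBaseUrl)) {
    problems.push("llmBaseUrl must start with http:// or https://");
  }
  return problems;
}
```

An empty result means the required apikey is present and any optional LLM base URL has a plausible form.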

Tool Overview

After the plugin is installed, the agent automatically gets the following tools.

User and device management

| Tool | Description |
| --- | --- |
| cloudphone_get_user_profile | Get the current user's basic information |
| cloudphone_list_devices | List cloud phone devices with pagination, keyword search, and status filters |
| cloudphone_get_device_info | Get detailed information for a specific device |
| cloudphone_get_device_screenshot_url | Get the latest screenshot URL by device_id (default-enabled; user-trigger only) |
| cloudphone_create_share_link | Generate a device streaming share link by device_id (default-enabled; user-trigger only) |

AI Agent task execution

| Tool | Description |
| --- | --- |
| cloudphone_execute | Submit a natural language instruction; returns task_id immediately |
| cloudphone_execute_and_wait | Auto-chain execute + first task_result poll |
| cloudphone_task_result | Return 10s-window thinking delta and current task status |

Usage Examples

After installation and configuration, you can control cloud phones through natural language prompts.

Run a UI automation task

Open WeChat on the cloud phone, search for the "OpenClaw" public account, and follow it

The agent will:

  1. Call cloudphone_list_devices to get the device ID
  2. Call cloudphone_execute_and_wait to submit and trigger the first poll automatically
  3. If status is running, continue calling cloudphone_task_result every ~10 seconds until success/done/error

Check device status

Show me my cloud phone devices

The agent will call cloudphone_list_devices and return the device list.

Submit a task and wait for completion

Agent: cloudphone_execute_and_wait
  instruction: "Open Douyin, search for food videos, and like the first one"
  device_id: "a1b2c3d4e5f67890a1b2c3d4e5f67890"
→ returns: { ok: false, task_result: { status: "running", thinking: [...] } }

Agent: cloudphone_task_result
  task_id: 42
→ returns 10s-window delta until terminal: { ok: true, status: "done", result: {...} }

Tool Parameters

cloudphone_execute

instruction    : string  - Natural language task instruction (required)
device_id      : string  - Device unique ID (recommended)
user_device_id : number  - User device ID (compatibility, device_id takes priority)
session_id     : string  - Optional session ID for streaming persistence
lang           : string  - Language hint: "cn" (default) or "en"
api_key        : string  - Optional LLM provider API key; overrides plugin-level llmApiKey when set
base_url       : string  - Optional LLM provider base URL; overrides plugin-level llmBaseUrl when set
max_steps      : integer - Optional maximum number of Agent steps for this task (range 1-200).
                           When omitted, falls back to plugin-level `maxSteps`, then the built-in default 50.

The same parameters apply to cloudphone_execute_and_wait (it uses the same schema).
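
The fallback order for max_steps described above (call parameter, then plugin-level maxSteps, then the built-in default of 50) can be sketched as follows. The resolveMaxSteps function is a hypothetical illustration, not the plugin's code. Rejecting out-of-range values instead of clamping them is an assumption.

```typescript
const DEFAULT_MAX_STEPS = 50; // built-in default stated above
const MIN_STEPS = 1;
const MAX_STEPS = 200;

// Picks the first defined value: call parameter, then plugin-level maxSteps, then the default.
// Assumption: out-of-range or non-integer values are rejected rather than clamped.
function resolveMaxSteps(callValue?: number, pluginValue?: number): number {
  const chosen = callValue ?? pluginValue ?? DEFAULT_MAX_STEPS;
  if (!Number.isInteger(chosen) || chosen < MIN_STEPS || chosen > MAX_STEPS) {
    throw new RangeError(`max_steps must be an integer in ${MIN_STEPS}-${MAX_STEPS}, got ${chosen}`);
  }
  return chosen;
}
```

For example, resolveMaxSteps(10, 100) yields the per-call value, resolveMaxSteps(undefined, 100) falls back to the plugin-level value, and resolveMaxSteps() falls back to the default.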

cloudphone_task_result

task_id    : number - Task ID from cloudphone_execute (required)

Response fields:

ok         : boolean - Whether the operation succeeded
task_id    : number  - Echo of the input task_id
status     : string  - "running" | "done" | "success" | "error" | "timeout"
thinking   : string[] - New thinking lines from the current 10-second polling window (delta)
result     : object  - Final task result from the backend
message    : string  - Error message (when status is "error" or "timeout")
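
Because thinking carries only the lines from the current window, a caller that wants the full trace has to accumulate the deltas itself. The sketch below shows one way to do that. The type names and the applyWindow helper are hypothetical, and the "running" status is taken from the usage examples and FAQ in this README.

```typescript
type TaskStatus = "running" | "done" | "success" | "error" | "timeout";

// Subset of the cloudphone_task_result response fields listed above.
interface TaskResultWindow {
  ok: boolean;
  task_id: number;
  status: TaskStatus;
  thinking?: string[];
  result?: unknown;
  message?: string;
}

// Hypothetical accumulator kept by the caller between polls.
interface TaskProgress {
  thinking: string[]; // all thinking lines seen so far
  status: TaskStatus;
  shouldPollAgain: boolean;
}

// Folds one 10-second polling window into the accumulated progress.
function applyWindow(progress: TaskProgress, win: TaskResultWindow): TaskProgress {
  return {
    thinking: [...progress.thinking, ...(win.thinking ?? [])],
    status: win.status,
    // Per the README, keep polling only while the task is still running.
    shouldPollAgain: win.status === "running",
  };
}
```

The caller feeds each cloudphone_task_result response into applyWindow and stops polling once shouldPollAgain is false.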

cloudphone_list_devices

keyword : string  - Search keyword (device name or device ID)
status  : string  - Status filter: "online" | "offline"
page    : integer - Page number, default 1
size    : integer - Items per page, default 20
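
As a small illustration of the defaults above, the following sketch fills in page and size when the caller omits them. ListDevicesParams and withListDefaults are hypothetical names used only for this example.

```typescript
// Parameters of cloudphone_list_devices as documented above.
interface ListDevicesParams {
  keyword?: string;
  status?: "online" | "offline";
  page?: number;
  size?: number;
}

// Fills in the documented defaults: page 1 and 20 items per page.
function withListDefaults(
  params: ListDevicesParams
): ListDevicesParams & { page: number; size: number } {
  return { ...params, page: params.page ?? 1, size: params.size ?? 20 };
}
```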

cloudphone_get_device_info

device_id : string - Device unique ID (32-char hex opaque identifier, required)
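
A caller can check the device_id format before making the request. This is a minimal sketch, assuming that "32-char hex" means exactly 32 characters from 0-9 and a-f in either letter case; isValidDeviceId is a hypothetical helper, not a plugin export.

```typescript
// Assumption: exactly 32 hexadecimal characters, upper or lower case.
const DEVICE_ID_PATTERN = /^[0-9a-fA-F]{32}$/;

// Type guard: true only for strings that match the 32-char hex format.
function isValidDeviceId(value: unknown): value is string {
  return typeof value === "string" && DEVICE_ID_PATTERN.test(value);
}
```

Keeping the identifier as a string end to end also avoids any numeric conversion of the ID.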

cloudphone_get_device_screenshot_url

device_id : string - Device unique ID (required)

Notes:

  • This tool is available by default after plugin installation (no extra whitelist enablement required).
  • Call this tool only when the user explicitly requests a screenshot URL.
  • The returned screenshot_url is passed through as-is from upstream and should be treated as a sensitive temporary credential URL.

cloudphone_create_share_link

device_id : string - Device unique ID (32-char hex opaque identifier, required).
                     Typically obtained from the `device_id` field of cloudphone_list_devices.

Response fields:

ok         : boolean  - Whether the operation succeeded
device_id  : string   - Echo of the input device_id
share_url  : string   - Device streaming share link (signed, sensitive; present on success)
code       : string   - Error code on failure (INVALID_PARAMS / HTTP_ERROR / INVALID_UPSTREAM_PAYLOAD, etc.)
message    : string   - Error message on failure

Notes:

  • This tool is available by default after plugin installation (no extra whitelist enablement required).
  • Call this tool only when the user explicitly requests a share link.
  • The returned share_url is passed through as-is from upstream (including signed query parameters) and must be treated as a sensitive credential URL: it may expire, must not be re-distributed without the user's consent, and must never be logged in full.
  • device_id is a 32-character hex opaque identifier (not a decimal number), so there is no long-integer precision concern at the LLM / tool-call layer.
  • The request body field name mirrors the input parameter exactly (device_id, snake_case); the plugin performs no field-name or numeric conversion.
  • Backend contract: the backend endpoints /openapi/v1/devices/create/share/link and /openapi/v1/devices/info must accept device_id. A Gateway connected to a backend that has not been upgraded returns a business error.
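
Since the signed query parameters are what make share_url sensitive, one way to honor the "never log in full" rule is to drop everything from the first ? or # before logging. The redactUrlForLog helper below is a hypothetical sketch of that idea, not the plugin's actual logging code, and the example URL is made up.

```typescript
// Removes the query string and fragment so signed parameters never reach the logs.
function redactUrlForLog(url: string): string {
  const cut = url.search(/[?#]/); // index of the first "?" or "#", or -1
  return cut === -1 ? url : url.slice(0, cut);
}

// Hypothetical example URL; real share links come from the backend.
const safeForLog = redactUrlForLog("https://example.com/share/abc?sign=secret&expires=1");
```

The full URL should still be returned to the user in the tool output; only the logged copy is shortened.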

Usage example: generate a device share link

Please generate a share link for device xxxxxx (xxxxxx being the device's 32-char hex device_id)

The agent will:

  1. Recognize the user's explicit share-link request.
  2. Call cloudphone_create_share_link with device_id: "a1b2c3d4e5f67890a1b2c3d4e5f67890".
  3. Echo the returned share_url back in the chat.

If the user does not provide a specific device ID, the agent can first call cloudphone_list_devices or cloudphone_get_device_info (keyed by device_id) to help the user pick the target device before generating the share link.

FAQ

Q: The agent cannot find the CloudPhone tools after installation.

Make sure plugins.entries.cloudphone.enabled is set to true in openclaw.json, then restart the Gateway.

Q: Why does cloudphone_task_result return running?

This is expected when the current 10-second polling window has not reached terminal status. Keep calling cloudphone_task_result every ~10 seconds until success/done/error.

Q: A tool call fails with a request error or authorization failure.

  • Check whether apikey is valid and that you restarted the Gateway after changing config
  • Check network connectivity and whether the CloudPhone service is reachable
  • 401 errors indicate an invalid or expired apikey

Q: How do I get an apikey?

Log in or sign up at https://ai.suqi.tech and get your API Key from your account/settings.

Q: Does cloudphone_execute support concurrent tasks?

No, not for the same agent context. The plugin enforces serial execution per agent key (session_id, then device_id, then user_device_id, otherwise default).
If you call cloudphone_execute before the previous task reaches terminal status in cloudphone_task_result, it returns code: "AGENT_BUSY" with blocking_task_id.

Required call order:

  1. cloudphone_execute_and_wait (auto-runs the first poll)
  2. cloudphone_task_result (if status is running, continue polling until terminal: success/done/error)
  3. Next cloudphone_execute
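
The serial gating described in this answer can be pictured with the sketch below. It is a simplified model, not the plugin's implementation: resolveAgentKey, SerialGate, and StartResult are hypothetical names, the literal "default" fallback key is an assumption, and the task ID is passed in up front for brevity even though the real ID comes back from the backend.

```typescript
interface AgentKeyParams {
  session_id?: string;
  device_id?: string;
  user_device_id?: number;
}

// Resolves the agent key in the documented priority order.
// Assumption: the key is a plain string and "default" is the literal fallback.
function resolveAgentKey(p: AgentKeyParams): string {
  if (p.session_id) return p.session_id;
  if (p.device_id) return p.device_id;
  if (p.user_device_id !== undefined) return String(p.user_device_id);
  return "default";
}

type StartResult =
  | { ok: true }
  | { ok: false; code: "AGENT_BUSY"; blocking_task_id: number };

// Tracks at most one in-flight task per agent key.
class SerialGate {
  private inFlight = new Map<string, number>();

  // Registers a task, or reports the task that is still blocking this key.
  tryStart(key: string, taskId: number): StartResult {
    const blocking = this.inFlight.get(key);
    if (blocking !== undefined) {
      return { ok: false, code: "AGENT_BUSY", blocking_task_id: blocking };
    }
    this.inFlight.set(key, taskId);
    return { ok: true };
  }

  // Call when cloudphone_task_result reports a terminal status.
  finish(key: string): void {
    this.inFlight.delete(key);
  }
}
```

In this model a second tryStart for the same key fails with AGENT_BUSY until finish is called, which mirrors the required call order above.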

Changelog

Current version: v2026.4.24

v2026.4.24

  • Added new agent tool cloudphone_create_share_link to generate a signed streaming share link for a specific device by device_id (default-enabled; explicit user-trigger only)
  • Migrated cloudphone_get_device_info parameter from user_device_id (number) to device_id (32-char hex string) to align with the opaque device identifier used across the plugin
  • Adopted json-bigint as the shared JSON parser for all upstream API responses and SSE events (agent_thinking / task_result / error), preventing precision loss for 19-digit snowflake IDs and other long integers
  • Hardened normalizeTaskId to safely accept both string and number inputs, rejecting oversized numeric strings instead of silently truncating
  • Added a shared, defensive JSON-parse error path for upstream responses: malformed payloads now return a structured error instead of throwing
  • Added json-bigint (and its @types/json-bigint) as a dependency of the plugin
  • Synced package/plugin/doc version references to v2026.4.24

v2026.4.20

  • Added optional plugin config maxSteps (1-200, default 50) to cap the number of steps the cloud phone Agent may execute per task
  • Extended cloudphone_execute / cloudphone_execute_and_wait with an optional max_steps parameter (range 1-200) that overrides the plugin-level maxSteps on a per-call basis
  • Resolved the effective max_steps using the priority: call parameter > plugin config > built-in default (50), and always forwarded it to the backend request body
  • Included the effective max_steps value in cloudphone_execute start logs for easier diagnosis
  • Synced package/plugin/doc version references to v2026.4.20

v2026.4.14001

  • Clarified plugin setup docs to include optional default LLM provider fields: llmApiKey and llmBaseUrl
  • Added OpenClaw UI configuration guidance and Z.AI reference link for optional LLM provider credentials
  • Documented cloudphone_execute_and_wait parameter parity with cloudphone_execute, including api_key and base_url overrides
  • Synced package/plugin/doc version references to v2026.4.14001

v2026.4.14

  • Added optional plugin config llmApiKey and llmBaseUrl for default LLM provider credentials used by cloud phone automation
  • Extended cloudphone_execute with optional api_key and base_url parameters (override plugin-level defaults) forwarded to the backend request body
  • Republished as v2026.4.14 (supersedes erroneous v2026.4.4 tag)
  • Synced package/plugin/doc version references to v2026.4.14

v2026.4.3

  • Added cloudphone_get_device_screenshot_url (default-enabled; call only on explicit user request) backed by POST /openapi/v1/devices/snapshot
  • Stripped query strings from screenshot_url in logs and MCP summaries while preserving the full URL in tool JSON output
  • Documented parameters, privacy notes, and guardrails for the screenshot URL tool
  • Added Node test coverage for screenshot URL success and upstream error paths; refined tsconfig include/exclude for *.test.ts
  • Synced package/plugin/doc version references to v2026.4.3

v2026.4.2

  • Enforced per-device/session serial task execution: cloudphone_execute returns AGENT_BUSY when a task is still in flight until cloudphone_task_result reaches a terminal status
  • Improved SSE parsing for both standard event:/data: framing and backend JSON-embedded event shapes
  • Strengthened tool descriptions with explicit guardrails (no autonomous extra steps, no screenshot-only requests, bounded retries)
  • Renamed npm package scope to @suqiai/cloudphone (commit 3c50f95)
  • Added src/tools.serial-gating.test.ts (Node test runner); excluded *.test.ts from tsc output so dist/ stays publish-clean
  • Updated built-in skill docs and README guidance for the execute → poll workflow
  • Synced package/plugin/doc version references to v2026.4.2

v2026.4.1

  • Added cloudphone_execute_and_wait to auto-chain task submission and the first result polling
  • Clarified tool behavior and call sequence documentation for task execution and polling
  • Updated .gitignore with docs/ and openspec/ entries for cleaner project management
  • Synced package/plugin/doc version references to v2026.4.1

v2026.3.31

  • Enhanced task execution and result handling flow in plugin tools
  • Improved task-related documentation and reference examples in built-in skills
  • Synced package/plugin/doc version references to v2026.3.31

v2026.3.30

  • Replaced 12 fine-grained UI automation tools (tap, swipe, snapshot, etc.) with 2 high-level backend-delegated tools
  • Added cloudphone_execute: submit natural language instructions to the backend AI Agent
  • Added cloudphone_task_result: stream agent thinking and final result via SSE
  • Removed AutoGLM direct integration (backend now handles the full observe → plan → act loop)
  • Simplified plugin config: removed all autoglm* fields, only apikey, baseUrl, timeout remain
  • Updated skills, README, and reference docs to reflect new architecture

v2026.3.27

  • Summarized and aligned release notes based on target commit 1da1031
  • Synced package/plugin/doc version references to v2026.3.27

v1.1.0

  • Enhanced screenshot handling in cloudphone_render_image for improved compatibility
  • Added the cloudphone-snapshot-url skill

v1.0.6

  • Added the built-in basic-skill skill distributed with the plugin
  • Added reference.md as a tool parameter quick reference

License

This plugin follows the license terms of the repository it belongs to.