62 changes: 61 additions & 1 deletion README.md
@@ -11,7 +11,10 @@ plugin that calls this binary, while this adapter owns platform-specific backend
Dedicated OpenCoven computer-use boundary for OpenClaw:

- macOS: shells to [`peekaboo`](https://peekaboo.boo) with `--json --no-remote`
- Linux/Windows: returns a clean unsupported JSON response for now
- Linux (X11): shells to `scrot`/`maim` (capture), `xdotool` (input), `wmctrl` (focus)
- Linux (Wayland): shells to `grim` (capture), `wtype`/`ydotool` (input), `swaymsg` (focus on Sway)
- Windows: returns a clean unsupported JSON response for now
- Session detection uses `XDG_SESSION_TYPE`, with fallback to `WAYLAND_DISPLAY`/`DISPLAY`
- No shell interpolation; uses process argv directly
- Interactive desktop actions require OpenClaw approval and adapter `--confirm`
- Typed text, clipboard text, file-write content, tokens, cookies, and secrets
@@ -35,6 +38,63 @@ All commands print a JSON envelope. The 0.1.0 command names remain as aliases:
`permissions -> doctor`, `see -> inspect`, `capture -> screenshot`,
`type -> type-text`, and `press -> keypress`.

## Linux / Ubuntu onboarding

Linux desktop-use shells to per-session helper tools instead of a single
bundled backend. Run the doctor first to see what's installed:

```bash
coven-desktop-use doctor
```

The JSON response includes the detected session (`x11` or `wayland`), a tool
inventory (each tool's path or `found:false`), and a `setupGuide` with the
exact `apt install` line for missing pieces.

Recommended packages by session:

```bash
# X11 (Ubuntu 22.04 GNOME when "Ubuntu on Xorg" is chosen at the login
# screen, or any other X11 session such as KDE on X11):
sudo apt install scrot xdotool wmctrl

# Wayland (default on Ubuntu 22.04+ GNOME, Sway, Hyprland, KDE Wayland):
sudo apt install grim wtype ydotool
```
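
To check by hand which package set applies, the detection order described above (`XDG_SESSION_TYPE` first, then `WAYLAND_DISPLAY`/`DISPLAY`) can be sketched in a few lines of shell. This is an illustration, not the adapter's actual code: `detect_session` is a hypothetical helper, and checking `WAYLAND_DISPLAY` before `DISPLAY` is an assumption made here because XWayland usually sets `DISPLAY` inside Wayland sessions too.

```bash
# Hypothetical helper, not part of coven-desktop-use.
# Prints "x11", "wayland", or "unknown" (and returns 1 for "unknown").
detect_session() {
  case "${XDG_SESSION_TYPE:-}" in
    x11|wayland)
      # The session manager already told us; trust it.
      printf '%s\n' "$XDG_SESSION_TYPE"
      return 0
      ;;
  esac
  # Fallback. Assumption: a Wayland socket wins over an X display.
  if [ -n "${WAYLAND_DISPLAY:-}" ]; then
    printf 'wayland\n'
  elif [ -n "${DISPLAY:-}" ]; then
    printf 'x11\n'
  else
    printf 'unknown\n'
    return 1
  fi
}

detect_session || true
```

Over SSH or on a TTY, `XDG_SESSION_TYPE` is typically `tty` or unset, so the fallback branch decides the result.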

### Wayland notes

- `ydotool` synthesises mouse and keyboard events through `/dev/uinput`. After
installing it, enable the daemon and make sure your user can reach it:

```bash
sudo systemctl enable --now ydotoold
sudo usermod -aG input "$USER"
# log out and back in so the new group membership takes effect
```

- `wtype` only works on compositors that implement the Wayland
virtual-keyboard protocol, such as Sway, Hyprland, and river. GNOME Mutter
and KDE KWin do not accept `wtype` events; on those compositors the adapter
falls back to `ydotool` for typing.
- Window focus on Wayland is compositor-specific. `focus` works on Sway when
`SWAYSOCK` is set and `swaymsg` is on `PATH`. GNOME Mutter has no public
CLI for window activation.
- `scroll` on Wayland degrades to `Page_Up`/`Page_Down` keystrokes via
`wtype` because there is no portable scroll-wheel injector across
Wayland compositors. Install `wlrctl` if you need real wheel events on
wlroots compositors. The response includes `degraded: ...` when this
fallback is taken.
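
The typing fallback described in the notes above can be pictured as a small selection function. This is a sketch under stated assumptions, not the adapter's implementation: `pick_typing_tool` is a hypothetical helper, and it takes the compositor name as an argument because how the adapter identifies the compositor is not documented here.

```bash
# Hypothetical helper, not part of coven-desktop-use.
# $1 is the compositor name as the caller knows it (e.g. "sway", "gnome").
pick_typing_tool() {
  case "${1:-}" in
    sway|hyprland|river)
      # Compositors this README lists as accepting wtype events.
      if command -v wtype >/dev/null 2>&1; then
        echo wtype
        return 0
      fi
      ;;
  esac
  # GNOME Mutter, KDE KWin, or wtype missing: try ydotool instead.
  if command -v ydotool >/dev/null 2>&1; then
    echo ydotool
    return 0
  fi
  echo "no typing tool found" >&2
  return 1
}
```

For example, `pick_typing_tool sway` prints `wtype` when it is on `PATH`, while `pick_typing_tool gnome` skips `wtype` and goes straight to the `ydotool` check.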

### What is *not* supported on Linux yet

- AT-SPI element annotation. `inspect` captures a screenshot but does not
return `B1`/`T2`-style element ids, so `click --on B1` is unavailable.
Use `click --coords x,y --confirm` instead, after a screenshot.
- "Active window" capture on vanilla Wayland (`grim` has no notion of
a focused window). On X11, `scrot --focused` and `maim -i $(xdotool
getactivewindow)` both work, and the adapter picks one automatically when
`--mode window` is requested.
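
Until element ids are available, the coordinate-based workaround from the first item can be wrapped in a small helper. This is a minimal sketch: `click_at` and the `COVEN_BIN` override are hypothetical names introduced here so the command line can be dry-run with `echo`; only `click --coords x,y --confirm` comes from this README.

```bash
# Hypothetical wrapper, not part of coven-desktop-use.
# Take a screenshot first and read the target coordinates off it.
click_at() {
  for n in "${1:-}" "${2:-}"; do
    case "$n" in
      ''|*[!0-9]*)
        # Reject missing or non-numeric coordinates before calling the adapter.
        echo "usage: click_at X Y (non-negative integers)" >&2
        return 2
        ;;
    esac
  done
  # COVEN_BIN is an assumption of this sketch, used only for dry runs.
  "${COVEN_BIN:-coven-desktop-use}" click --coords "$1,$2" --confirm
}
```

Running `COVEN_BIN=echo click_at 640 360` prints the argument list that would be passed, without touching the desktop.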

## macOS onboarding

Desktop inspection and interaction require two macOS privacy grants because the