92 changes: 92 additions & 0 deletions .github/workflows/docker.yml
@@ -0,0 +1,92 @@
name: AutoControl Docker CI

on:
  push:
    branches: [ "dev", "main" ]
    paths:
      - "docker/**"
      - "je_auto_control/**"
      - "pyproject.toml"
      - ".github/workflows/docker.yml"
  pull_request:
    branches: [ "dev", "main" ]
    paths:
      - "docker/**"
      - "je_auto_control/**"
      - "pyproject.toml"
      - ".github/workflows/docker.yml"

permissions:
  contents: read

jobs:
  build-image:
    name: Build AutoControl container
    runs-on: ubuntu-22.04

    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3 # NOSONAR githubactions:S7637 — project convention pins to vendored major version tags (matches dev.yml / stable.yml / quality.yml)

Check warning on line 31 in .github/workflows/docker.yml (Codacy Production / Codacy Static Code Analysis): An action sourced from a third-party repository on GitHub is not pinned to a full length commit SHA. Pinning an action to a full length commit SHA is currently the only way to use an action as an immutable release.

      - name: Build image (no push)
        uses: docker/build-push-action@v5 # NOSONAR githubactions:S7637 — project convention pins to vendored major version tags

Check warning on line 34 in .github/workflows/docker.yml (Codacy Production / Codacy Static Code Analysis): An action sourced from a third-party repository on GitHub is not pinned to a full length commit SHA. Pinning an action to a full length commit SHA is currently the only way to use an action as an immutable release.

        with:
          context: .
          file: docker/Dockerfile
          tags: autocontrol:ci
          load: true
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Image size
        run: docker image inspect autocontrol:ci --format='size={{.Size}} bytes'

  headless-tests:
    name: Headless pytest inside the image
    needs: build-image
    runs-on: ubuntu-22.04

    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3 # NOSONAR githubactions:S7637 — project convention pins to vendored major version tags (matches dev.yml / stable.yml / quality.yml)

      - name: Rebuild image (cached)
        uses: docker/build-push-action@v5 # NOSONAR githubactions:S7637 — project convention pins to vendored major version tags
        with:
          context: .
          file: docker/Dockerfile
          tags: autocontrol:ci
          load: true
          cache-from: type=gha

      # Mount the repo so pytest can read tests + write the artifact.
      - name: Run headless tests under Xvfb
        run: |
          docker run --rm \
            --user root \
            -v "$PWD:/work" -w /work \
            --entrypoint /bin/sh \
            autocontrol:ci -c "
              pip install --no-cache-dir -r dev_requirements.txt &&
              xvfb-run -a -s '-screen 0 1280x800x24' \
                python -m pytest test/unit_test/headless -q --tb=short
            "

      - name: Smoke test the entrypoint (rest mode)
        run: |
          docker run --rm -d --name ac-rest -p 9939:9939 \
            -e AC_TOKEN=ci-token autocontrol:ci rest
          for attempt in 1 2 3 4 5 6 7 8 9 10; do
            if curl -fsS -H "Authorization: Bearer ci-token" \
                http://127.0.0.1:9939/health; then
              echo "REST API is up"
              break
            fi
            sleep 2
          done
          docker logs ac-rest || true
          docker stop ac-rest
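
Review note on the smoke-test step: as written, the loop falls through to `docker logs` and `docker stop` even when all ten probes fail, so the step can apparently pass while the container is running but `/health` never answers. A minimal Python sketch of a polling helper that raises once the attempts are exhausted is shown below. The names `wait_for_health` and `http_probe` are hypothetical and not part of AutoControl or this workflow; the URL and token simply mirror the values used in the step.

```python
import time
import urllib.request
from typing import Callable


def http_probe(url: str, token: str, timeout: float = 2.0) -> bool:
    """Return True if a GET with a bearer token answers with HTTP 200."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket errors all derive from OSError.
        return False


def wait_for_health(
    probe: Callable[[], bool],
    attempts: int = 10,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return the 1-based attempt that succeeded; raise if none did."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(f"health check failed after {attempts} attempts")


if __name__ == "__main__":
    # Hypothetical usage mirroring the workflow's endpoint and token.
    wait_for_health(
        lambda: http_probe("http://127.0.0.1:9939/health", "ci-token")
    )
    print("REST API is up")
```

In the shell step itself, the same effect could be had by tracking a success flag in the loop and exiting non-zero after `docker stop` when it was never set.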
54 changes: 50 additions & 4 deletions README.md
@@ -13,6 +13,7 @@

## Table of Contents

- [What's new (2026-05)](#whats-new-2026-05)
- [Features](#features)
- [Architecture](#architecture)
- [Installation](#installation)
@@ -54,6 +55,49 @@

---

## What's new (2026-05)

Twenty-three additions covering smarter locators, deeper IDE / ops
tooling, two new platforms, and fresh integrations. Each ships with a
headless API, an `AC_*` executor command, an `ac_*` MCP tool, and
(where it makes sense) a Qt GUI tab. Full reference page:
[`docs/source/Eng/doc/new_features/v2_features_doc.rst`](docs/source/Eng/doc/new_features/v2_features_doc.rst).

**Locator + selector intelligence**
- **Self-healing locator** — `image_template → VLM` fallback with a JSON-lines audit log (`AC_self_heal_locate / _click`).
- **Anchor-based locator** — find element B by spatial relation (`above`, `below`, `left_of`, `right_of`, `near`) to anchor A; anchor and target can use different backends (image / OCR / VLM / a11y).
- **OCR with structured output** — cluster raw OCR matches into rows, tables, and `label:value` form fields (`AC_ocr_read_structure`).
- **Smart waits** — `wait_until_screen_stable`, `wait_until_pixel_changes`, `wait_until_region_idle`: frame-diff replacements for `time.sleep`.
- **A/B locator framework** — race N strategies for the same target; recommend the historically best one from a persisted ledger.
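
The smart-wait bullet above describes frame-diff replacements for `time.sleep`. The idea can be sketched as follows, assuming frames arrive as flat sequences of grayscale pixel values. `wait_until_stable_sketch`, `changed_fraction`, and the `grab` callback are hypothetical names for illustration; this is not the signature of `wait_until_screen_stable`.

```python
import time
from typing import Callable, Sequence

Frame = Sequence[int]  # assumption: flattened grayscale pixels


def changed_fraction(a: Frame, b: Frame, tolerance: int = 0) -> float:
    """Fraction of pixels whose value differs by more than tolerance."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    if not a:
        return 0.0
    changed = sum(1 for x, y in zip(a, b) if abs(x - y) > tolerance)
    return changed / len(a)


def wait_until_stable_sketch(
    grab: Callable[[], Frame],
    stable_frames: int = 3,
    threshold: float = 0.01,
    timeout: float = 5.0,
    interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once stable_frames consecutive diffs stay under threshold."""
    deadline = clock() + timeout
    previous = grab()
    streak = 0
    while clock() < deadline:
        sleep(interval)
        current = grab()
        if changed_fraction(previous, current) <= threshold:
            streak += 1
            if streak >= stable_frames:
                return True
        else:
            streak = 0  # any visible change restarts the count
        previous = current
    return False
```

Compared with a fixed sleep, the wait ends as soon as the screen settles and reports a timeout explicitly instead of continuing on a half-rendered frame.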

**Operations + observability**
- **LLM cost telemetry** — per-call token + USD log with day / model / provider rollup (`record_llm_call`, `summarise_llm_costs`).
- **Trace replay UI** — scrubbable timeline over the existing time-travel recordings with per-step action list.
- **Failure → ticket automation** — fan a failure report out to Jira / Linear / GitHub Issues when a scheduled / triggered / REST run fails.
- **Container CI templates** — GitHub Actions + GitLab CI workflows that build the image, run the headless pytest suite under Xvfb, and smoke-test the REST entrypoint; XFCE+x11vnc Dockerfile variant for flows that need a real WM.
- **Cross-host DAG orchestrator** — parallel execution with skip-on-failure cascade across local + admin-console-registered hosts (`run_dag`, `AC_run_dag`).
- **Multi-viewer presence** — roster + controller/observer roles for the remote desktop, with a thread-safe Python `PresenceRegistry` independent of aiortc.
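
The skip-on-failure cascade mentioned for the DAG orchestrator can be illustrated with a small sequential sketch. It assumes tasks are plain callables keyed by name and runs them one at a time on the local machine, whereas the feature described above runs in parallel across hosts. `run_dag_sketch` is a hypothetical name and not the signature of `run_dag`.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, Mapping


def run_dag_sketch(
    tasks: Mapping[str, Callable[[], None]],
    deps: Mapping[str, Iterable[str]],
) -> Dict[str, str]:
    """Run tasks in dependency order; skip anything downstream of a failure."""
    graph = {name: set(deps.get(name, ())) for name in tasks}
    for name, parents in graph.items():
        unknown = parents - set(tasks)
        if unknown:
            raise ValueError(f"{name} depends on unknown tasks: {sorted(unknown)}")

    status: Dict[str, str] = {}
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        if any(status[parent] != "ok" for parent in graph[name]):
            status[name] = "skipped"  # a failed or skipped parent cascades
            continue
        try:
            tasks[name]()
            status[name] = "ok"
        except Exception:
            status[name] = "failed"
    return status
```

Because a skipped node is itself treated as "not ok", the skip propagates through the whole downstream subtree while unrelated branches still run.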

**Agent + integrations**
- **Computer-use high-level API** — `run_computer_use(goal, ...)` wraps `ComputerUseAgentBackend` + `AgentLoop`; auto-detects display size; bounded by `max_steps` / `wall_seconds`.
- **WebRunner convenience commands** — `web_open` / `web_quit` / `web_screenshot` / `web_current_url` on top of the existing `je_web_runner` bridge; same surface exposed as `AC_web_*` and `ac_web_*`.
- **Chat-ops bot** — transport-agnostic `CommandRouter` + polling Slack adapter. Built-in commands: `/help`, `/scripts`, `/run`, `/screenshot`, `/status`. RBAC via `required_role`.
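
To make the chat-ops bullet concrete, here is a rough sketch of a transport-agnostic router with a role check. The class `RouterSketch`, the role names, and their ordering are assumptions for illustration only; the real `CommandRouter` and its `required_role` handling may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical role ladder: a higher rank includes the lower ones.
ROLE_RANK = {"viewer": 0, "operator": 1, "admin": 2}


@dataclass
class Command:
    handler: Callable[[str], str]
    required_role: str = "viewer"


class RouterSketch:
    """Maps '/command args' text to handlers, independent of any chat transport."""

    def __init__(self) -> None:
        self._commands: Dict[str, Command] = {}

    def register(self, name: str, handler: Callable[[str], str],
                 required_role: str = "viewer") -> None:
        if required_role not in ROLE_RANK:
            raise ValueError(f"unknown role: {required_role}")
        self._commands[name] = Command(handler, required_role)

    def dispatch(self, text: str, role: str) -> str:
        name, _, args = text.strip().partition(" ")
        command = self._commands.get(name)
        if command is None:
            return f"unknown command: {name}"
        # Unknown caller roles rank below every registered role.
        if ROLE_RANK.get(role, -1) < ROLE_RANK[command.required_role]:
            return f"permission denied: {name} requires {command.required_role}"
        return command.handler(args.strip())
```

A Slack (or any other) adapter would then only need to turn incoming messages into `dispatch(text, role)` calls and post the returned string.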

**Platform coverage**
- **Wayland CLI backend** — `wtype` / `ydotool` / `grim` with `XDG_SESSION_TYPE` auto-detect and X11 (XWayland) fallback; override via `JE_AUTOCONTROL_LINUX_DISPLAY_SERVER=x11|wayland|auto`.
- **Wayland libei native** — ctypes binding to `libei.so.*` for microsecond-latency input; opt-in via `JE_AUTOCONTROL_WAYLAND_INPUT_BACKEND=libei|cli|auto`. Defaults to libei when loadable.
- **macOS Accessibility deep-dive** — recursive `dump_accessibility_tree()` plus a polling `AccessibilityRecorder` for focus / bounds events.
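
The display-server selection described in the Wayland bullets could look roughly like the sketch below. Only the two environment variable names come from the text above; the function name and the exact precedence are assumptions, and the real backend may apply further checks (for example whether the CLI tools are installed) before settling on Wayland.

```python
import os
from typing import Mapping


def pick_display_server(env: Mapping[str, str] = os.environ) -> str:
    """Return 'x11' or 'wayland' from the override, else from the session type."""
    override = env.get("JE_AUTOCONTROL_LINUX_DISPLAY_SERVER", "auto").strip().lower()
    if override in ("x11", "wayland"):
        return override  # an explicit override wins
    if override != "auto":
        raise ValueError(f"unsupported display server override: {override!r}")
    session = env.get("XDG_SESSION_TYPE", "").strip().lower()
    # Anything other than an explicit Wayland session falls back to X11.
    return "wayland" if session == "wayland" else "x11"
```

Passing the environment as a parameter keeps the function easy to test without mutating `os.environ`.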

**Developer experience**
- **autocontrol-lsp completion** — the language server now tracks `didOpen` / `didChange` / `didClose`, publishes diagnostics for invalid JSON and unknown `AC_*` commands, and provides signature help generated from the live executor table.
- **`.pyi` stub generator** — `python -m je_auto_control.utils.stubs.generator je_auto_control/actions.pyi` emits an IDE-facing stub so every `AC_*` command autocompletes with parameter hints.
- **VS Code extension** — bundled extension now ships `AutoControl: Run / Screenshot / Preview` commands that hit the local REST API.
- **Browser extension recorder** — Manifest V3 extension under `browser-extension/`: capture clicks, typing, navigation, form submissions in a tab and export them as `AC_web_*` / `WR_*` JSON.
- **pytest plugin + Gherkin BDD** — `pytest11` entry point auto-loads; `@pytest.mark.autocontrol` arms screenshot-on-failure; `bdd_steps.register_pytest_bdd_steps(pytest_bdd)` wires `Given/When/Then` onto every `AC_*` verb.
- **Visual flow editor** — node-based view that round-trips to the same JSON action format the list-based Script Builder uses.
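
The language-server bullet mentions diagnostics for invalid JSON and unknown `AC_*` commands. A minimal sketch of that check follows, assuming an action file is a JSON list whose entries are lists starting with the command name (an assumption about the action format, not confirmed by this diff). `diagnose_actions` and the sample command names are hypothetical.

```python
import json
from typing import Collection, List


def diagnose_actions(text: str, known_commands: Collection[str]) -> List[str]:
    """Return human-readable problems found in an action JSON document."""
    try:
        actions = json.loads(text)
    except json.JSONDecodeError as error:
        return [
            f"invalid JSON at line {error.lineno}, column {error.colno}: {error.msg}"
        ]
    if not isinstance(actions, list):
        return ["top-level value must be a list of actions"]

    problems: List[str] = []
    for index, action in enumerate(actions):
        if not (isinstance(action, list) and action and isinstance(action[0], str)):
            problems.append(f"action {index}: expected [command, params]")
        elif action[0] not in known_commands:
            problems.append(f"action {index}: unknown command {action[0]}")
    return problems
```

A language server would typically rerun such a check on every `didChange` and publish the resulting list as diagnostics.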

---

## Features

- **Mouse Automation** — move, click, press, release, drag, and scroll with precise coordinate control
@@ -71,7 +115,7 @@
- **Action Recording & Playback** — record mouse/keyboard events and replay them
- **JSON-Based Action Scripting** — define and execute automation flows using JSON action files (dry-run + step debug)
- **Scheduler** — run scripts on an interval or cron expression; jobs persist across restarts
-- **Global Hotkey Daemon** — bind OS-level hotkeys to action scripts (Windows today; macOS/Linux stubs in place)
+- **Global Hotkey Daemon** — bind OS-level hotkeys to action scripts on all three desktops: Windows (`RegisterHotKey`), macOS (`CGEventTap`, needs Accessibility permission), and Linux X11 (`XGrabKey` with NumLock / CapsLock variant masking). Wayland hotkeys are still compositor-dependent (each session bus exposes a different shortcut portal); a Wayland session can still drive AutoControl via the new Wayland input backend (see [What's new (2026-05)](#whats-new-2026-05)). Same `bind()` / `start()` API across platforms; the Strategy-pattern dispatch in `backends/` auto-picks the right backend at start time
- **Event Triggers** — fire scripts when an image appears, a window opens, a pixel changes, or a file is modified
- **Run History** — SQLite-backed run log across scheduler / triggers / hotkeys / REST with auto error-screenshot artifacts
- **Report Generation** — export test records as HTML, JSON, or XML reports with success/failure status
@@ -1040,9 +1084,11 @@ Both flavours coexist; `job.is_cron` tells them apart.

### Global Hotkey Daemon

-Bind OS-level hotkeys to action JSON scripts (Windows backend today;
-macOS / Linux raise `NotImplementedError` on `start()` with Strategy-
-pattern seams in place).
+Bind OS-level hotkeys to action JSON scripts. Cross-platform — Windows
+uses `RegisterHotKey`, macOS uses `CGEventTap` (requires Accessibility
+permission), Linux X11 uses `XGrabKey` (Wayland not supported). The
+same call sites work everywhere; the daemon picks the backend at
+`start()` time.
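
On the X11 side, the Features list mentions NumLock / CapsLock variant masking. The reason is that an X11 passive key grab matches the modifier state exactly, so a grab for Ctrl+K alone does not fire while NumLock or CapsLock is active; the usual workaround is to register one grab per lock-state combination. The sketch below only enumerates those masks. The constants are the standard X11 modifier bits, NumLock on `Mod2Mask` is the common default rather than a guarantee, and `grab_modifier_variants` is a hypothetical helper, not the backend's actual code.

```python
from itertools import combinations
from typing import List, Sequence

LOCK_MASK = 1 << 1     # X11 LockMask (CapsLock)
CONTROL_MASK = 1 << 2  # X11 ControlMask
MOD2_MASK = 1 << 4     # X11 Mod2Mask (NumLock on most setups, an assumption)


def grab_modifier_variants(
    base_modifiers: int,
    ignored: Sequence[int] = (LOCK_MASK, MOD2_MASK),
) -> List[int]:
    """Return base_modifiers combined with every subset of the ignored lock bits."""
    variants: List[int] = []
    for size in range(len(ignored) + 1):
        for combo in combinations(ignored, size):
            mask = base_modifiers
            for bit in combo:
                mask |= bit
            if mask not in variants:
                variants.append(mask)
    return variants
```

A backend would then issue one `XGrabKey` call per returned mask for the same keycode, so the hotkey fires regardless of lock-key state.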

```python
from je_auto_control import default_hotkey_daemon
53 changes: 49 additions & 4 deletions README/README_zh-CN.md
@@ -12,6 +12,7 @@

## Table of Contents

- [What's new (2026-05)](#本次更新-2026-05)
- [Features](#功能特性)
- [Architecture](#架构)
- [Installation](#安装)
@@ -53,6 +54,49 @@

---

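## What's new (2026-05)

Adds 23 features covering smarter locators, deeper IDE / ops tooling, two new platform backends,
and several new integrations. Each feature follows the framework's established pattern: a headless Python API,
an `AC_*` executor command, an `ac_*` MCP tool, and (where applicable) a Qt GUI tab.
Full reference page:
[`docs/source/Zh/doc/new_features/v2_features_doc.rst`](../docs/source/Zh/doc/new_features/v2_features_doc.rst).

**Locator and selector intelligence**
- **Self-healing locator**: `image_template → VLM` fallback that writes a JSON-lines audit record (`AC_self_heal_locate / _click`).
- **Anchor-based locator**: find the target by spatial relation (`above` / `below` / `left_of` / `right_of` / `near`); anchor and target can use different backends (image / OCR / VLM / a11y).
- **Structured OCR**: cluster raw OCR matches into rows, tables, and `label:value` form fields (`AC_ocr_read_structure`).
- **Smart waits**: `wait_until_screen_stable`, `wait_until_pixel_changes`, and `wait_until_region_idle` replace `time.sleep` with frame diffing.
- **A/B locator framework**: run N strategies in parallel and recommend the best one from persisted historical results.

**Operations and observability**
- **LLM cost telemetry**: per-call token / USD records, rolled up by day / model / provider (`record_llm_call`, `summarise_llm_costs`).
- **Trace replay UI**: scrub a timeline over the existing time-travel recordings and show the actions step by step.
- **Failure → ticket automation**: automatically fan out to Jira / Linear / GitHub Issues when a scheduler / trigger / REST task fails.
- **Container CI templates**: GitHub Actions + GitLab CI workflows that build the image, run headless pytest (inside the container under Xvfb), and smoke-test the REST entrypoint; an XFCE+x11vnc Dockerfile variant is also included.
- **Cross-host DAG orchestration**: parallel execution across local + admin-console-registered hosts; on failure, downstream nodes cascade to `skipped` (`run_dag`, `AC_run_dag`).
- **Multi-viewer roster**: controller / observer roles for the remote desktop; the pure-Python `PresenceRegistry` is independent of aiortc.

**Agent and integrations**
- **Computer-use high-level API**: `run_computer_use(goal, ...)` wraps `ComputerUseAgentBackend` + `AgentLoop`; auto-detects the screen size; budgeted by `max_steps` / `wall_seconds`.
- **WebRunner convenience commands**: `web_open` / `web_quit` / `web_screenshot` / `web_current_url` on top of the existing `je_web_runner` bridge; also exposed as `AC_web_*` and `ac_web_*`.
- **Chat-ops bot**: transport-agnostic `CommandRouter` + Slack polling adapter. Built-in commands: `/help`, `/scripts`, `/run`, `/screenshot`, `/status`. RBAC via `required_role`.

**Platform coverage**
- **Wayland CLI backend**: `wtype` / `ydotool` / `grim`, auto-detected via `XDG_SESSION_TYPE`, falling back to X11 (XWayland) when the CLI tools are not installed; override with `JE_AUTOCONTROL_LINUX_DISPLAY_SERVER=x11|wayland|auto`.
- **Wayland libei native backend**: ctypes binding to `libei.so.*` that bypasses the CLI shim for microsecond-level latency; enabled via `JE_AUTOCONTROL_WAYLAND_INPUT_BACKEND=libei|cli|auto`, defaulting to libei when it is loadable.
- **macOS Accessibility enhancements**: recursive `dump_accessibility_tree()` plus a polling `AccessibilityRecorder` that captures focus / bounds events.

**Developer experience**
- **autocontrol-lsp completion**: tracks `didOpen` / `didChange` / `didClose`, publishes diagnostics for JSON and unknown `AC_*` commands, and generates signature help from the live executor table.
- **`.pyi` stub generator**: `python -m je_auto_control.utils.stubs.generator je_auto_control/actions.pyi` writes an IDE-facing stub file so every `AC_*` command autocompletes in the IDE with parameter hints.
- **VS Code extension**: the bundled extension adds `AutoControl: Run / Screenshot / Preview` commands that call the local REST API directly.
- **Browser extension recorder**: Manifest V3 extension under `browser-extension/` that captures clicks, typing, navigation, and form submissions in a tab and exports them as `AC_web_*` / `WR_*` JSON.
- **pytest plugin + Gherkin BDD**: the `pytest11` entry point auto-loads; `@pytest.mark.autocontrol` enables screenshot-on-failure; `bdd_steps.register_pytest_bdd_steps(pytest_bdd)` maps `Given/When/Then` onto every `AC_*` verb in one call.
- **Visual flow editor**: the node-based view uses the same JSON format as the existing list-based Script Builder, so the two are interchangeable.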

---

## Features

- **Mouse Automation**: move, click, press, release, drag, and scroll with precise coordinate control
@@ -70,7 +114,7 @@
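- **Action Recording & Playback**: record mouse/keyboard events and replay them
- **JSON Script Execution**: define and execute automation flows with JSON action files (supports dry-run and step-by-step debugging)
- **Scheduler**: run scripts on an interval or cron expression; both kinds of schedule can coexist
-- **Global Hotkeys**: bind OS hotkeys to action scripts (currently Windows; macOS/Linux keep extension interfaces in place)
+- **Global Hotkeys**: bind OS hotkeys to action scripts across platforms: Windows (`RegisterHotKey`), macOS (`CGEventTap`, requires Accessibility permission), and Linux X11 (`XGrabKey`, with NumLock / CapsLock variant masks). Wayland is not supported. All three platforms share the same API; `backends/` picks the backend automatically at `start()`
- **Event Triggers**: run scripts automatically when an image appears, a window appears, a pixel changes, or a file is modified
- **Run History**: records the results of scheduler / triggers / hotkeys / REST runs in SQLite; a screenshot is attached automatically on error
- **Report Generation**: export test records as HTML, JSON, or XML reports with success/failure status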
@@ -949,9 +993,10 @@ ac.default_scheduler.start()

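### Global Hotkeys

-Bind OS hotkeys to action JSON scripts (Windows backend; on macOS / Linux,
-`start()` currently raises `NotImplementedError`, with the interface reserved
-via the Strategy pattern).
+Bind OS hotkeys to action JSON scripts. Cross-platform: Windows uses
+`RegisterHotKey`, macOS uses `CGEventTap` (requires Accessibility permission), and
+Linux X11 uses `XGrabKey` (Wayland is not supported). All three platforms share one API; the daemon
+picks the backend automatically at `start()`.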

```python
from je_auto_control import default_hotkey_daemon