Commit ddf62a5

Document the 23 new features in READMEs + Sphinx docs
* Add "What's new (2026-05)" sections to README.md, README/README_zh-TW.md, README/README_zh-CN.md grouped by Locator / Operations / Agent / Platform / Developer-Experience, with TOC entries.
* New Sphinx page docs/source/Eng/doc/new_features/v2_features_doc.rst documenting each feature with usage examples, executor commands, MCP tool names, and GUI tab references.
* Mirrored at docs/source/Zh/doc/new_features/v2_features_doc.rst.
* Wired both pages into eng_index.rst / zh_index.rst toctrees.
* Updated the stale "Wayland is not supported" line in the Hotkey Daemon bullet to point at the new Wayland input backend.
1 parent caf6514 commit ddf62a5

7 files changed

Lines changed: 829 additions & 1 deletion

README.md

Lines changed: 45 additions & 1 deletion
@@ -13,6 +13,7 @@
 
 ## Table of Contents
 
+- [What's new (2026-05)](#whats-new-2026-05)
 - [Features](#features)
 - [Architecture](#architecture)
 - [Installation](#installation)
@@ -54,6 +55,49 @@
 
 ---
 
+## What's new (2026-05)
+
+Twenty-three additions covering smarter locators, deeper IDE / ops
+tooling, two new platforms, and fresh integrations. Each ships with a
+headless API, an `AC_*` executor command, an `ac_*` MCP tool, and
+(where it makes sense) a Qt GUI tab. Full reference page:
+[`docs/source/Eng/doc/new_features/v2_features_doc.rst`](docs/source/Eng/doc/new_features/v2_features_doc.rst).
+
+**Locator + selector intelligence**
+- **Self-healing locator** — `image_template → VLM` fallback with a JSON-lines audit log (`AC_self_heal_locate / _click`).
+- **Anchor-based locator** — find element B by spatial relation (`above`, `below`, `left_of`, `right_of`, `near`) to anchor A; anchor and target can use different backends (image / OCR / VLM / a11y).
+- **OCR with structured output** — cluster raw OCR matches into rows, tables, and `label:value` form fields (`AC_ocr_read_structure`).
+- **Smart waits** — `wait_until_screen_stable`, `wait_until_pixel_changes`, `wait_until_region_idle`: frame-diff replacements for `time.sleep`.
+- **A/B locator framework** — race N strategies for the same target; recommend the historically best one from a persisted ledger.
+
+**Operations + observability**
+- **LLM cost telemetry** — per-call token + USD log with day / model / provider rollup (`record_llm_call`, `summarise_llm_costs`).
+- **Trace replay UI** — scrubbable timeline over the existing time-travel recordings with per-step action list.
+- **Failure → ticket automation** — fan a failure report out to Jira / Linear / GitHub Issues when a scheduled / triggered / REST run fails.
+- **Container CI templates** — GitHub Actions + GitLab CI workflows that build the image, run the headless pytest suite under Xvfb, and smoke-test the REST entrypoint; XFCE+x11vnc Dockerfile variant for flows that need a real WM.
+- **Cross-host DAG orchestrator** — parallel execution with skip-on-failure cascade across local + admin-console-registered hosts (`run_dag`, `AC_run_dag`).
+- **Multi-viewer presence** — roster + controller/observer roles for the remote desktop, with a thread-safe Python `PresenceRegistry` independent of aiortc.
+
+**Agent + integrations**
+- **Computer-use high-level API** — `run_computer_use(goal, ...)` wraps `ComputerUseAgentBackend` + `AgentLoop`; auto-detects display size; bounded by `max_steps` / `wall_seconds`.
+- **WebRunner convenience commands** — `web_open` / `web_quit` / `web_screenshot` / `web_current_url` on top of the existing `je_web_runner` bridge; same surface exposed as `AC_web_*` and `ac_web_*`.
+- **Chat-ops bot** — transport-agnostic `CommandRouter` + polling Slack adapter. Built-in commands: `/help`, `/scripts`, `/run`, `/screenshot`, `/status`. RBAC via `required_role`.
+
+**Platform coverage**
+- **Wayland CLI backend** — `wtype` / `ydotool` / `grim` with `XDG_SESSION_TYPE` auto-detect and X11 (XWayland) fallback; override via `JE_AUTOCONTROL_LINUX_DISPLAY_SERVER=x11|wayland|auto`.
+- **Wayland libei native** — ctypes binding to `libei.so.*` for microsecond-latency input; opt-in via `JE_AUTOCONTROL_WAYLAND_INPUT_BACKEND=libei|cli|auto`. Defaults to libei when loadable.
+- **macOS Accessibility deep-dive** — recursive `dump_accessibility_tree()` plus a polling `AccessibilityRecorder` for focus / bounds events.
+
+**Developer experience**
+- **autocontrol-lsp completion** — the language server now tracks `didOpen` / `didChange` / `didClose`, publishes diagnostics for invalid JSON and unknown `AC_*` commands, and provides signature help generated from the live executor table.
+- **`.pyi` stub generator** — `python -m je_auto_control.utils.stubs.generator je_auto_control/actions.pyi` emits an IDE-facing stub so every `AC_*` command autocompletes with parameter hints.
+- **VS Code extension** — bundled extension now ships `AutoControl: Run / Screenshot / Preview` commands that hit the local REST API.
+- **Browser extension recorder** — Manifest V3 extension under `browser-extension/`: capture clicks, typing, navigation, form submissions in a tab and export them as `AC_web_*` / `WR_*` JSON.
+- **pytest plugin + Gherkin BDD** — `pytest11` entry point auto-loads; `@pytest.mark.autocontrol` arms screenshot-on-failure; `bdd_steps.register_pytest_bdd_steps(pytest_bdd)` wires `Given/When/Then` onto every `AC_*` verb.
+- **Visual flow editor** — node-based view that round-trips to the same JSON action format the list-based Script Builder uses.
+
+---
+
 ## Features
 
 - **Mouse Automation** — move, click, press, release, drag, and scroll with precise coordinate control
@@ -71,7 +115,7 @@
 - **Action Recording & Playback** — record mouse/keyboard events and replay them
 - **JSON-Based Action Scripting** — define and execute automation flows using JSON action files (dry-run + step debug)
 - **Scheduler** — run scripts on an interval or cron expression; jobs persist across restarts
-- **Global Hotkey Daemon** — bind OS-level hotkeys to action scripts on all three desktops: Windows (`RegisterHotKey`), macOS (`CGEventTap`, needs Accessibility permission), and Linux X11 (`XGrabKey` with NumLock / CapsLock variant masking). Wayland is not supported. Same `bind()` / `start()` API across platforms; the Strategy-pattern dispatch in `backends/` auto-picks the right backend at start time
+- **Global Hotkey Daemon** — bind OS-level hotkeys to action scripts on all three desktops: Windows (`RegisterHotKey`), macOS (`CGEventTap`, needs Accessibility permission), and Linux X11 (`XGrabKey` with NumLock / CapsLock variant masking). Wayland hotkeys are still compositor-dependent (each session bus exposes a different shortcut portal); a Wayland session can still drive AutoControl via the new Wayland input backend (see [What's new (2026-05)](#whats-new-2026-05)). Same `bind()` / `start()` API across platforms; the Strategy-pattern dispatch in `backends/` auto-picks the right backend at start time
 - **Event Triggers** — fire scripts when an image appears, a window opens, a pixel changes, or a file is modified
 - **Run History** — SQLite-backed run log across scheduler / triggers / hotkeys / REST with auto error-screenshot artifacts
 - **Report Generation** — export test records as HTML, JSON, or XML reports with success/failure status
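
The "Smart waits" bullet in the README diff describes frame-diff replacements for `time.sleep`. The sketch below illustrates that general idea only and is not the project's implementation: `wait_until_stable`, `changed_fraction`, and every parameter name are hypothetical, and frames are plain lists of grayscale values standing in for real screenshots.

```python
import time
from typing import Callable, Sequence

# Assumption: a frame is a flat sequence of grayscale pixel values.
Frame = Sequence[int]


def changed_fraction(a: Frame, b: Frame, tolerance: int = 0) -> float:
    """Fraction of pixels whose value differs by more than `tolerance`."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    if not a:
        return 0.0
    changed = sum(1 for x, y in zip(a, b) if abs(x - y) > tolerance)
    return changed / len(a)


def wait_until_stable(
    grab: Callable[[], Frame],
    stable_frames: int = 3,
    max_changed: float = 0.01,
    interval: float = 0.05,
    timeout: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once `stable_frames` consecutive diffs stay under `max_changed`.

    Returns False if the timeout expires first.
    """
    deadline = clock() + timeout
    previous = grab()
    quiet = 0
    while clock() < deadline:
        sleep(interval)
        current = grab()
        if changed_fraction(previous, current) <= max_changed:
            quiet += 1
            if quiet >= stable_frames:
                return True
        else:
            quiet = 0  # any visible change restarts the quiet streak
        previous = current
    return False
```

Compared with a fixed `time.sleep`, a loop like this returns as soon as the screen settles and still gives up after a bounded timeout. Passing `sleep` and `clock` as parameters keeps the sketch testable without real delays.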

README/README_zh-CN.md

Lines changed: 44 additions & 0 deletions
@@ -12,6 +12,7 @@
 
 ## Table of Contents
 
+- [This update (2026-05)](#本次更新-2026-05)
 - [Features](#功能特性)
 - [Architecture](#架构)
 - [Installation](#安装)
@@ -53,6 +54,49 @@
 
 ---
 
+## This update (2026-05)
+
+Adds 23 features, covering smarter locators, deeper IDE / ops tooling, two new platform backends,
+and several new integrations. Each feature follows the framework's existing pattern: a headless Python API,
+an `AC_*` executor command, an `ac_*` MCP tool, and (where applicable) a Qt GUI tab.
+Full reference page:
+[`docs/source/Zh/doc/new_features/v2_features_doc.rst`](../docs/source/Zh/doc/new_features/v2_features_doc.rst)
+
+**Locator and selector intelligence**
+- **Self-healing locator** — `image_template → VLM` fallback that writes a JSON-lines audit record (`AC_self_heal_locate / _click`).
+- **Anchor-based locator** — find the target by spatial relation (`above` / `below` / `left_of` / `right_of` / `near`); anchor and target can use different backends (image / OCR / VLM / a11y).
+- **Structured OCR** — cluster raw OCR matches into rows, tables, and `label:value` form fields (`AC_ocr_read_structure`).
+- **Smart waits** — `wait_until_screen_stable`, `wait_until_pixel_changes`, `wait_until_region_idle`: frame-diff replacements for `time.sleep`.
+- **A/B locator framework** — run N strategies in parallel and recommend the best one based on persisted historical results.
+
+**Operations and observability**
+- **LLM cost telemetry** — per-call token / USD records, rolled up by day / model / provider (`record_llm_call`, `summarise_llm_costs`).
+- **Trace replay UI** — scrub a timeline over the existing time-travel recordings and show actions step by step.
+- **Failure → ticket automation** — automatically dispatch to Jira / Linear / GitHub Issues when a scheduler / trigger / REST job fails.
+- **Containerized CI templates** — GitHub Actions + GitLab CI workflows: build the image, run headless pytest (inside an Xvfb container), and smoke-test the REST entrypoint; also includes an XFCE+x11vnc Dockerfile variant.
+- **Cross-host DAG orchestration** — parallel execution across local + admin-console-registered hosts; on failure, downstream nodes cascade to `skipped` (`run_dag`, `AC_run_dag`).
+- **Multi-viewer roster** — controller / observer roles for the remote desktop, with a pure-Python `PresenceRegistry` independent of aiortc.
+
+**Agents and integrations**
+- **Computer-use high-level API** — `run_computer_use(goal, ...)` wraps `ComputerUseAgentBackend` + `AgentLoop`; auto-detects screen size; budgeted by `max_steps` / `wall_seconds`.
+- **WebRunner convenience commands** — `web_open` / `web_quit` / `web_screenshot` / `web_current_url` on top of the existing `je_web_runner` bridge; also exposed as `AC_web_*` and `ac_web_*`.
+- **Chat-ops bot** — transport-agnostic `CommandRouter` + Slack polling adapter. Built-in commands: `/help`, `/scripts`, `/run`, `/screenshot`, `/status`. RBAC via `required_role`.
+
+**Platform coverage**
+- **Wayland CLI backend** — `wtype` / `ydotool` / `grim`, auto-detected via `XDG_SESSION_TYPE`, falling back to X11 (XWayland) when the CLI tools are not installed; override with `JE_AUTOCONTROL_LINUX_DISPLAY_SERVER=x11|wayland|auto`.
+- **Wayland libei native backend** — ctypes binding to `libei.so.*` that bypasses the CLI shim for microsecond-level latency; enabled via `JE_AUTOCONTROL_WAYLAND_INPUT_BACKEND=libei|cli|auto`, defaulting to libei when it is loadable.
+- **macOS Accessibility enhancements** — recursive `dump_accessibility_tree()` plus a polling `AccessibilityRecorder` that captures focus / bounds events.
+
+**Developer experience**
+- **autocontrol-lsp completion** — tracks `didOpen` / `didChange` / `didClose`, publishes diagnostics for JSON and unknown `AC_*` commands, and generates signature help from the live executor table.
+- **`.pyi` stub generator** — `python -m je_auto_control.utils.stubs.generator je_auto_control/actions.pyi` writes an IDE-facing stub file so every `AC_*` command autocompletes in the IDE with parameter hints.
+- **VS Code extension** — the bundled extension adds `AutoControl: Run / Screenshot / Preview` commands that call the local REST API directly.
+- **Browser extension recorder** — Manifest V3 extension under `browser-extension/`: captures clicks, typing, navigation, and form submissions in a tab and exports them as `AC_web_*` / `WR_*` JSON.
+- **pytest plugin + Gherkin BDD** — `pytest11` entry point auto-loads; `@pytest.mark.autocontrol` enables screenshot-on-failure; `bdd_steps.register_pytest_bdd_steps(pytest_bdd)` maps `Given/When/Then` onto every `AC_*` verb in one call.
+- **Visual flow editor** — node-based view that shares the same JSON format as the existing list-based Script Builder, so the two are interchangeable.
+
+---
+
 ## Features
 
 - **Mouse Automation** — move, click, press, release, drag, and scroll, with precise coordinate control
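
The Wayland CLI backend bullet describes a three-step decision: an explicit `JE_AUTOCONTROL_LINUX_DISPLAY_SERVER` override, then `XDG_SESSION_TYPE` auto-detection, then an X11 (XWayland) fallback when the CLI tools are missing. One way that precedence could look is sketched below. The function name `resolve_display_server`, the choice of which tools count as required, and the handling of unknown override values are all assumptions for illustration, not the project's code.

```python
import os
import shutil
from typing import Callable, Mapping, Optional

VALID_OVERRIDES = {"x11", "wayland", "auto"}
# Assumption: having either input tool installed is enough to pick Wayland.
WAYLAND_INPUT_TOOLS = ("wtype", "ydotool")


def resolve_display_server(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return "x11" or "wayland" for the current Linux session."""
    env = os.environ if env is None else env

    # 1. An explicit override wins over auto-detection.
    override = env.get("JE_AUTOCONTROL_LINUX_DISPLAY_SERVER", "auto").strip().lower()
    if override not in VALID_OVERRIDES:
        override = "auto"  # hypothetical policy: ignore unknown values
    if override != "auto":
        return override

    # 2. Auto-detect from the session type reported by the login manager.
    session = env.get("XDG_SESSION_TYPE", "").strip().lower()
    if session != "wayland":
        return "x11"

    # 3. On Wayland, use the CLI backend only if an input tool is installed;
    #    otherwise fall back to X11 through XWayland.
    if any(which(tool) for tool in WAYLAND_INPUT_TOOLS):
        return "wayland"
    return "x11"
```

Taking `env` and `which` as parameters lets the decision be exercised without touching the real environment, for example `resolve_display_server({"XDG_SESSION_TYPE": "wayland"}, lambda tool: None)`.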

README/README_zh-TW.md

Lines changed: 44 additions & 0 deletions
@@ -12,6 +12,7 @@
 
 ## Table of Contents
 
+- [This update (2026-05)](#本次更新-2026-05)
 - [Features](#功能特色)
 - [Architecture](#架構)
 - [Installation](#安裝)
@@ -53,6 +54,49 @@
 
 ---
 
+## This update (2026-05)
+
+Adds 23 features, covering smarter locators, deeper IDE / ops tooling, two new platform backends,
+and several new integrations. Each feature follows the framework's existing pattern: a headless Python API,
+an `AC_*` executor command, an `ac_*` MCP tool, and (where applicable) a Qt GUI tab.
+Full reference page:
+[`docs/source/Zh/doc/new_features/v2_features_doc.rst`](../docs/source/Zh/doc/new_features/v2_features_doc.rst)
+
+**Locator and selector intelligence**
+- **Self-healing locator** — `image_template → VLM` fallback that writes a JSON-lines audit record (`AC_self_heal_locate / _click`).
+- **Anchor-based locator** — find the target by spatial relation (`above` / `below` / `left_of` / `right_of` / `near`); anchor and target can use different backends (image / OCR / VLM / a11y).
+- **Structured OCR** — cluster raw OCR matches into rows, tables, and `label:value` form fields (`AC_ocr_read_structure`).
+- **Smart waits** — `wait_until_screen_stable`, `wait_until_pixel_changes`, `wait_until_region_idle`: frame-diff replacements for `time.sleep`.
+- **A/B locator framework** — run N strategies in parallel and recommend the best one based on persisted historical results.
+
+**Operations and observability**
+- **LLM cost telemetry** — per-call token / USD records, rolled up by day / model / provider (`record_llm_call`, `summarise_llm_costs`).
+- **Trace replay UI** — scrub a timeline over the existing time-travel recordings and show actions step by step.
+- **Failure → ticket automation** — automatically dispatch to Jira / Linear / GitHub Issues when a scheduler / trigger / REST job fails.
+- **Containerized CI templates** — GitHub Actions + GitLab CI workflows: build the image, run headless pytest (inside an Xvfb container), and smoke-test the REST entrypoint; also includes an XFCE+x11vnc Dockerfile variant.
+- **Cross-host DAG orchestration** — parallel execution across local + admin-console-registered hosts; on failure, downstream nodes cascade to `skipped` (`run_dag`, `AC_run_dag`).
+- **Multi-viewer roster** — controller / observer roles for the remote desktop, with a pure-Python `PresenceRegistry` independent of aiortc.
+
+**Agents and integrations**
+- **Computer-use high-level API** — `run_computer_use(goal, ...)` wraps `ComputerUseAgentBackend` + `AgentLoop`; auto-detects screen size; budgeted by `max_steps` / `wall_seconds`.
+- **WebRunner convenience commands** — `web_open` / `web_quit` / `web_screenshot` / `web_current_url` on top of the existing `je_web_runner` bridge; also exposed as `AC_web_*` and `ac_web_*`.
+- **Chat-ops bot** — transport-agnostic `CommandRouter` + Slack polling adapter. Built-in commands: `/help`, `/scripts`, `/run`, `/screenshot`, `/status`. RBAC via `required_role`.
+
+**Platform coverage**
+- **Wayland CLI backend** — `wtype` / `ydotool` / `grim`, auto-detected via `XDG_SESSION_TYPE`, falling back to X11 (XWayland) when the CLI tools are not installed; override with `JE_AUTOCONTROL_LINUX_DISPLAY_SERVER=x11|wayland|auto`.
+- **Wayland libei native backend** — ctypes binding to `libei.so.*` that bypasses the CLI shim for microsecond-level latency; enabled via `JE_AUTOCONTROL_WAYLAND_INPUT_BACKEND=libei|cli|auto`, defaulting to libei when it is loadable.
+- **macOS Accessibility enhancements** — recursive `dump_accessibility_tree()` plus a polling `AccessibilityRecorder` that captures focus / bounds events.
+
+**Developer experience**
+- **autocontrol-lsp completion** — tracks `didOpen` / `didChange` / `didClose`, publishes diagnostics for JSON and unknown `AC_*` commands, and generates signature help from the live executor table.
+- **`.pyi` stub generator** — `python -m je_auto_control.utils.stubs.generator je_auto_control/actions.pyi` writes an IDE-facing stub file so every `AC_*` command autocompletes in the IDE with parameter hints.
+- **VS Code extension** — the bundled extension adds `AutoControl: Run / Screenshot / Preview` commands that call the local REST API directly.
+- **Browser extension recorder** — Manifest V3 extension under `browser-extension/`: captures clicks, typing, navigation, and form submissions in a tab and exports them as `AC_web_*` / `WR_*` JSON.
+- **pytest plugin + Gherkin BDD** — `pytest11` entry point auto-loads; `@pytest.mark.autocontrol` enables screenshot-on-failure; `bdd_steps.register_pytest_bdd_steps(pytest_bdd)` maps `Given/When/Then` onto every `AC_*` verb in one call.
+- **Visual flow editor** — node-based view that shares the same JSON format as the existing list-based Script Builder, so the two are interchangeable.
+
+---
+
 ## Features
 
 - **Mouse Automation** — move, click, press, release, drag, and scroll, with precise coordinate control
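
The DAG orchestrator bullet mentions parallel execution where a failure cascades to downstream nodes as `skipped`. The sketch below shows that scheduling rule on a single machine with a thread pool. It is an illustration under stated assumptions, not the project's `run_dag`: the name `run_dag_sketch`, its signature, and the status strings `"ok"` / `"failed"` are hypothetical, and cross-host dispatch is left out entirely.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Mapping


def run_dag_sketch(
    tasks: Mapping[str, Callable[[], None]],
    deps: Mapping[str, List[str]],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run tasks wave by wave; return name -> "ok" | "failed" | "skipped"."""
    status: Dict[str, str] = {}
    pending = set(tasks)

    def attempt(name: str) -> str:
        try:
            tasks[name]()
            return "ok"
        except Exception:
            return "failed"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # Cascade: a node with a failed or skipped dependency is skipped.
            blocked = {
                n for n in pending
                if any(status.get(d) in ("failed", "skipped") for d in deps.get(n, []))
            }
            for n in blocked:
                status[n] = "skipped"
            pending -= blocked

            # Ready: every dependency has finished successfully.
            ready = sorted(
                n for n in pending
                if all(status.get(d) == "ok" for d in deps.get(n, []))
            )
            if not ready:
                if blocked:
                    continue  # newly skipped nodes may block further nodes
                if pending:
                    raise ValueError(
                        f"cycle or unknown dependency among {sorted(pending)}"
                    )
                break

            # Nodes in the same wave are independent, so run them in parallel.
            for name, result in zip(ready, pool.map(attempt, ready)):
                status[name] = result
            pending -= set(ready)
    return status
```

With a chain `build -> test -> deploy` where `test` raises, `deploy` is never executed and is reported as `skipped`, while an unrelated branch that depends only on `build` still runs.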
