|
13 | 13 |
|
14 | 14 | ## Table of Contents |
15 | 15 |
|
| 16 | +- [What's new (2026-05)](#whats-new-2026-05) |
16 | 17 | - [Features](#features) |
17 | 18 | - [Architecture](#architecture) |
18 | 19 | - [Installation](#installation) |
|
54 | 55 |
|
55 | 56 | --- |
56 | 57 |
|
| 58 | +## What's new (2026-05) |
| 59 | + |
| 60 | +Twenty-three additions covering smarter locators, deeper IDE / ops |
| 61 | +tooling, two new platforms, and fresh integrations. Each ships with a |
| 62 | +headless API, an `AC_*` executor command, an `ac_*` MCP tool, and |
| 63 | +(where it makes sense) a Qt GUI tab. Full reference page: |
| 64 | +[`docs/source/Eng/doc/new_features/v2_features_doc.rst`](docs/source/Eng/doc/new_features/v2_features_doc.rst). |
| 65 | + |
| 66 | +**Locator + selector intelligence** |
| 67 | +- **Self-healing locator** — `image_template → VLM` fallback with a JSON-lines audit log (`AC_self_heal_locate` / `AC_self_heal_click`).
| 68 | +- **Anchor-based locator** — find element B by spatial relation (`above`, `below`, `left_of`, `right_of`, `near`) to anchor A; anchor and target can use different backends (image / OCR / VLM / a11y). |
| 69 | +- **OCR with structured output** — cluster raw OCR matches into rows, tables, and `label:value` form fields (`AC_ocr_read_structure`). |
| 70 | +- **Smart waits** — `wait_until_screen_stable`, `wait_until_pixel_changes`, `wait_until_region_idle`: frame-diff replacements for `time.sleep`. |
| 71 | +- **A/B locator framework** — race N strategies for the same target; recommend the historically best one from a persisted ledger. |
| 72 | + |
| 73 | +**Operations + observability** |
| 74 | +- **LLM cost telemetry** — per-call token + USD log with day / model / provider rollup (`record_llm_call`, `summarise_llm_costs`). |
| 75 | +- **Trace replay UI** — scrubbable timeline over the existing time-travel recordings with per-step action list. |
| 76 | +- **Failure → ticket automation** — fan a failure report out to Jira / Linear / GitHub Issues when a scheduled / triggered / REST run fails. |
| 77 | +- **Container CI templates** — GitHub Actions + GitLab CI workflows that build the image, run the headless pytest suite under Xvfb, and smoke-test the REST entrypoint; XFCE+x11vnc Dockerfile variant for flows that need a real WM. |
| 78 | +- **Cross-host DAG orchestrator** — parallel execution with skip-on-failure cascade across local + admin-console-registered hosts (`run_dag`, `AC_run_dag`). |
| 79 | +- **Multi-viewer presence** — roster + controller/observer roles for the remote desktop, with a thread-safe Python `PresenceRegistry` independent of aiortc. |
| 80 | + |
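The DAG orchestrator bullet mentions parallel execution with a skip-on-failure cascade. A rough single-host sketch of that scheduling rule follows; it is an illustration under stated assumptions, not the project's `run_dag`. The name `run_dag_sketch`, the status strings, and the `tasks` / `deps` shapes are all hypothetical, and every dependency is assumed to name a known task.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Dict, List


def run_dag_sketch(
    tasks: Dict[str, Callable[[], None]],
    deps: Dict[str, List[str]],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run tasks in dependency order; skip anything downstream of a failure."""
    status = {name: "pending" for name in tasks}
    running = {}  # maps each in-flight future to its task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while True:
            progressed = False
            for name in tasks:
                if status[name] != "pending":
                    continue
                parents = [status[p] for p in deps.get(name, [])]
                if any(s in ("failed", "skipped") for s in parents):
                    status[name] = "skipped"  # cascade the failure downstream
                    progressed = True
                elif all(s == "ok" for s in parents):
                    status[name] = "running"
                    running[pool.submit(tasks[name])] = name
                    progressed = True
            if not running:
                if not progressed:
                    break  # finished, or a cycle left some tasks pending
                continue  # only skips happened; rescan for further skips
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                status[name] = "failed" if future.exception() else "ok"
    return status
```

With tasks `a`, `b`, `c` where `c` depends on `b` and `b` raises, `a` still runs to completion while `c` is reported as `skipped` without being started.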
| 81 | +**Agent + integrations** |
| 82 | +- **Computer-use high-level API** — `run_computer_use(goal, ...)` wraps `ComputerUseAgentBackend` + `AgentLoop`; auto-detects display size; bounded by `max_steps` / `wall_seconds`. |
| 83 | +- **WebRunner convenience commands** — `web_open` / `web_quit` / `web_screenshot` / `web_current_url` on top of the existing `je_web_runner` bridge; same surface exposed as `AC_web_*` and `ac_web_*`. |
| 84 | +- **Chat-ops bot** — transport-agnostic `CommandRouter` + polling Slack adapter. Built-in commands: `/help`, `/scripts`, `/run`, `/screenshot`, `/status`. RBAC via `required_role`. |
| 85 | + |
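The chat-ops bullet describes a transport-agnostic router with role checks via `required_role`. The sketch below shows one way such a router could gate commands. It is not the project's `CommandRouter`: the class `SketchRouter`, the `Command` dataclass, and the `viewer < operator < admin` hierarchy are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed role hierarchy; the real project may define roles differently.
ROLE_RANK = {"viewer": 0, "operator": 1, "admin": 2}


@dataclass
class Command:
    handler: Callable[[List[str]], str]
    required_role: str = "viewer"


class SketchRouter:
    """Maps slash commands to handlers, independent of any chat transport."""

    def __init__(self) -> None:
        self._commands: Dict[str, Command] = {}

    def register(self, name: str, handler: Callable[[List[str]], str],
                 required_role: str = "viewer") -> None:
        self._commands[name] = Command(handler, required_role)

    def dispatch(self, text: str, user_role: str) -> str:
        parts = text.strip().split()
        if not parts or parts[0] not in self._commands:
            return "unknown command; try /help"
        command = self._commands[parts[0]]
        # Unknown roles rank below every known role and are always denied.
        if ROLE_RANK.get(user_role, -1) < ROLE_RANK[command.required_role]:
            return (f"permission denied: {parts[0]} requires role "
                    f"'{command.required_role}'")
        return command.handler(parts[1:])


router = SketchRouter()
router.register("/status", lambda args: "idle")
router.register("/run", lambda args: f"running {args[0]}", required_role="operator")
```

Because `dispatch` takes plain text and returns plain text, a Slack poller, a webhook, or a test can all call it the same way.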
| 86 | +**Platform coverage** |
| 87 | +- **Wayland CLI backend** — `wtype` / `ydotool` / `grim` with `XDG_SESSION_TYPE` auto-detect and X11 (XWayland) fallback; override via `JE_AUTOCONTROL_LINUX_DISPLAY_SERVER=x11|wayland|auto`. |
| 88 | +- **Wayland libei native** — ctypes binding to `libei.so.*` for microsecond-latency input; select the backend via `JE_AUTOCONTROL_WAYLAND_INPUT_BACKEND=libei|cli|auto`. libei is used by default when the library is loadable.
| 89 | +- **macOS Accessibility deep-dive** — recursive `dump_accessibility_tree()` plus a polling `AccessibilityRecorder` for focus / bounds events. |
| 90 | + |
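The Wayland CLI bullet describes `XDG_SESSION_TYPE` auto-detection with an explicit override. One plausible reading of that selection logic is sketched below. The function `pick_display_server` is hypothetical, and two behaviours are assumptions rather than documented facts: an unset override is treated as `auto`, and any session type other than `wayland` falls back to `x11`.

```python
import os
from typing import Mapping, Optional


def pick_display_server(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "x11" or "wayland" based on the override and session type."""
    env = os.environ if env is None else env
    override = env.get("JE_AUTOCONTROL_LINUX_DISPLAY_SERVER", "auto").strip().lower()
    if override in ("x11", "wayland"):
        return override  # an explicit choice wins over detection
    # "auto" (or an unrecognised value): consult the session type.
    session = env.get("XDG_SESSION_TYPE", "").strip().lower()
    return "wayland" if session == "wayland" else "x11"
```

Passing the environment as a mapping keeps the function easy to test without mutating `os.environ`.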
| 91 | +**Developer experience** |
| 92 | +- **autocontrol-lsp completion** — the language server now tracks `didOpen` / `didChange` / `didClose`, publishes diagnostics for invalid JSON and unknown `AC_*` commands, and provides signature help generated from the live executor table. |
| 93 | +- **`.pyi` stub generator** — `python -m je_auto_control.utils.stubs.generator je_auto_control/actions.pyi` emits an IDE-facing stub so every `AC_*` command autocompletes with parameter hints. |
| 94 | +- **VS Code extension** — bundled extension now ships `AutoControl: Run / Screenshot / Preview` commands that hit the local REST API. |
| 95 | +- **Browser extension recorder** — Manifest V3 extension under `browser-extension/`: capture clicks, typing, navigation, form submissions in a tab and export them as `AC_web_*` / `WR_*` JSON. |
| 96 | +- **pytest plugin + Gherkin BDD** — `pytest11` entry point auto-loads; `@pytest.mark.autocontrol` arms screenshot-on-failure; `bdd_steps.register_pytest_bdd_steps(pytest_bdd)` wires `Given/When/Then` onto every `AC_*` verb. |
| 97 | +- **Visual flow editor** — node-based view that round-trips to the same JSON action format the list-based Script Builder uses. |
| 98 | + |
| 99 | +--- |
| 100 | + |
57 | 101 | ## Features |
58 | 102 |
|
59 | 103 | - **Mouse Automation** — move, click, press, release, drag, and scroll with precise coordinate control |
|
71 | 115 | - **Action Recording & Playback** — record mouse/keyboard events and replay them |
72 | 116 | - **JSON-Based Action Scripting** — define and execute automation flows using JSON action files (dry-run + step debug) |
73 | 117 | - **Scheduler** — run scripts on an interval or cron expression; jobs persist across restarts |
74 | | -- **Global Hotkey Daemon** — bind OS-level hotkeys to action scripts on all three desktops: Windows (`RegisterHotKey`), macOS (`CGEventTap`, needs Accessibility permission), and Linux X11 (`XGrabKey` with NumLock / CapsLock variant masking). Wayland is not supported. Same `bind()` / `start()` API across platforms; the Strategy-pattern dispatch in `backends/` auto-picks the right backend at start time |
| 118 | +- **Global Hotkey Daemon** — bind OS-level hotkeys to action scripts on all three desktops: Windows (`RegisterHotKey`), macOS (`CGEventTap`, needs Accessibility permission), and Linux X11 (`XGrabKey` with NumLock / CapsLock variant masking). Wayland hotkeys are still compositor-dependent (each session bus exposes a different shortcut portal), but input automation still works in a Wayland session through the new Wayland input backend (see [What's new (2026-05)](#whats-new-2026-05)). The `bind()` / `start()` API is the same across platforms; the Strategy-pattern dispatch in `backends/` picks the matching backend at start time
75 | 119 | - **Event Triggers** — fire scripts when an image appears, a window opens, a pixel changes, or a file is modified |
76 | 120 | - **Run History** — SQLite-backed run log across scheduler / triggers / hotkeys / REST with auto error-screenshot artifacts |
77 | 121 | - **Report Generation** — export test records as HTML, JSON, or XML reports with success/failure status |
|