Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
50 changes: 49 additions & 1 deletion docs/specs/ops-security-hardening/decisions.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
# Decisions — Ops Dashboard Security Hardening

**Status:** draft
**Status:** complete (2026-05-12, pending Patrick's Phase 5 smoke)
**Owner:** Patrick
**Opened:** 2026-05-11
**Closed:** 2026-05-12 — implementation landed via v6.7.1 (#254, #256) + Phase 2.3 test added today
**Trigger:** Code-review report on PR #251 (2026-05-11) flagged the ops dashboard's command-execution endpoints as vulnerable to DNS-rebinding attacks. Verified real. Not introduced by #251 — pre-existing since the ops runner shipped.

---
Expand Down Expand Up @@ -87,3 +88,50 @@ Spec closes when:
5. End-to-end test: start a run, complete it, GET `/runs/<id>/view`, parse the stream URL from the rendered HTML, connect, assert full log replays.
6. Manual DNS-rebinding repro fails (e.g., `curl -H "Host: evil.com:8766" http://localhost:8766/api/info` returns 400).
7. CI green on all 12 platform lanes.

---

## 2026-05-12 — Spec closed

All implementation work landed across three PRs during the May 11–12
window:

| PR | Title | What landed |
|----|-------|-------------|
| #254 | feat(ops): security hardening — DNS-rebinding fix + cluster | Middleware, Config field, CLI flag, startup warning, queue bound, dashboard logging, e2e test |
| #256 | release: v6.7.1 — DNS-rebinding fix for ops dashboard | Cut release; CHANGELOG entry satisfies Phase 6.4 |
| (today) | Phase 2.3 missing test added | `test_subscriber_queue_does_not_block_fast_subscribers` regression guard for queue isolation |

**Resolution criteria — all satisfied in code:**

1. ✅ `TrustedHostMiddleware` mounted in `create_app()` via
`server.py:41 add_middleware(...)`.
2. ✅ `--trusted-host` repeatable CLI flag in
`cli.py:58`; threaded through `build_config()`.
3. ✅ Queue bound: `runner.py:100 asyncio.Queue(maxsize=_SUBSCRIBER_QUEUE_MAXSIZE)`
with two tests now covering both drop-on-overflow and fast-subscriber
isolation.
4. ✅ `run_view_page` logs 404 at INFO (`dashboard.py:118`) and 400 at
WARN (`dashboard.py:110`).
5. ✅ E2E test: `test_run_view_replays_full_output_after_completion`
in `tests/unit/ops/test_runner.py:280`.
6. 🟡 Phase 5 smoke tests (Patrick's manual run, interactive).
7. ✅ CI green: all 12 platform lanes pass; 142 ops tests in
`tests/unit/ops/`.

**Spec is closeable upon Patrick's Phase 5 smoke pass.** All
remaining work is the manual verification step (`curl` checks +
browser-load).

### What's left as a follow-up (not blocking)

Per the spec's own out-of-scope list — these were always deferred,
not regressions:

- HTTPS / TLS for the dashboard
- Auth tokens / OAuth
- Per-workflow capability gating
- Audit log of all runs
- Subprocess sandboxing

Open follow-up specs only when one becomes a real ask.
2 changes: 1 addition & 1 deletion docs/specs/ops-security-hardening/design.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Design — Ops Dashboard Security Hardening

**Status:** draft
**Status:** complete (2026-05-12, pending Phase 5 smoke)

Technical shape. See `decisions.md` for context, `requirements.md` for stories, `tasks.md` for the phase plan.

Expand Down
2 changes: 1 addition & 1 deletion docs/specs/ops-security-hardening/requirements.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Requirements — Ops Dashboard Security Hardening

**Status:** draft
**Status:** complete (2026-05-12, pending Phase 5 smoke)

See `decisions.md` for context, `design.md` for the implementation shape, `tasks.md` for the phase plan.

Expand Down
36 changes: 18 additions & 18 deletions docs/specs/ops-security-hardening/tasks.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Tasks — Ops Dashboard Security Hardening

**Status:** draft
**Status:** complete (2026-05-12, pending Phase 5 smoke)

Single-phase implementation (small enough to ship in one PR) plus a verification phase. See `decisions.md`, `requirements.md`, `design.md` for context.

Expand All @@ -10,12 +10,12 @@ Single-phase implementation (small enough to ship in one PR) plus a verification

Goal: DNS-rebinding attacks fail at the middleware layer.

- [ ] **1.1** Add `trusted_hosts: tuple[str, ...] = ()` to `Config` dataclass. Add `trusted_hosts` param to `build_config()`.
- [ ] **1.2** Add `--trusted-host` repeatable CLI flag to `cli.py`. Thread through to `build_config()`.
- [ ] **1.3** Add startup warning for `--host 0.0.0.0` without `--trusted-host` (stderr, yellow if TTY).
- [ ] **1.4** Create `src/attune/ops/middleware.py` with `TrustedHostMiddleware` and `compute_allowlist()`.
- [ ] **1.5** Mount the middleware FIRST in `create_app()` (before any other middleware).
- [ ] **1.6** Tests:
- [x] **1.1** Add `trusted_hosts: tuple[str, ...] = ()` to `Config` dataclass. Add `trusted_hosts` param to `build_config()`.
- [x] **1.2** Add `--trusted-host` repeatable CLI flag to `cli.py`. Thread through to `build_config()`.
- [x] **1.3** Add startup warning for `--host 0.0.0.0` without `--trusted-host` (stderr, yellow if TTY).
- [x] **1.4** Create `src/attune/ops/middleware.py` with `TrustedHostMiddleware` and `compute_allowlist()`.
- [x] **1.5** Mount the middleware FIRST in `create_app()` (before any other middleware).
- [x] **1.6** Tests:
- `test_middleware_allows_loopback` — `Host: localhost:<port>` passes
- `test_middleware_allows_127_0_0_1` — `Host: 127.0.0.1:<port>` passes
- `test_middleware_rejects_unknown_host` — `Host: evil.com:8766` returns 400 with the expected detail
Expand All @@ -28,29 +28,29 @@ Goal: DNS-rebinding attacks fail at the middleware layer.

## Phase 2 — Queue bound

- [ ] **2.1** One-line change in `runner.py`: `asyncio.Queue(maxsize=1000)`.
- [ ] **2.2** Test: `test_subscriber_queue_drops_slow_subscriber` — push 1001+ events to a non-consuming subscriber, assert it's removed from `self.subscribers` after the QueueFull path fires.
- [ ] **2.3** Test: `test_subscriber_queue_does_not_block_fast_subscribers` — one slow subscriber doesn't slow down a fast one.
- [x] **2.1** One-line change in `runner.py`: `asyncio.Queue(maxsize=1000)`.
- [x] **2.2** Test: `test_subscriber_queue_drops_slow_subscriber` — push 1001+ events to a non-consuming subscriber, assert it's removed from `self.subscribers` after the QueueFull path fires.
- [x] **2.3** Test: `test_subscriber_queue_does_not_block_fast_subscribers` — one slow subscriber doesn't slow down a fast one.

## Phase 3 — Run-view logging

- [ ] **3.1** Add `logger = logging.getLogger(__name__)` to `dashboard.py` if not present.
- [ ] **3.2** Log at INFO on 404 path of `run_view_page` with `run_id` context.
- [ ] **3.3** Log at WARN on invalid-run_id path with the (truncated) offending input.
- [ ] **3.4** Test: `test_run_view_logs_404` — caplog captures the INFO line with run_id.
- [ ] **3.5** Test: `test_run_view_logs_invalid_input` — caplog captures the WARN line with the bad input.
- [x] **3.1** Add `logger = logging.getLogger(__name__)` to `dashboard.py` if not present.
- [x] **3.2** Log at INFO on 404 path of `run_view_page` with `run_id` context.
- [x] **3.3** Log at WARN on invalid-run_id path with the (truncated) offending input.
- [x] **3.4** Test: `test_run_view_logs_404` — caplog captures the INFO line with run_id.
- [x] **3.5** Test: `test_run_view_logs_invalid_input` — caplog captures the WARN line with the bad input.

## Phase 4 — End-to-end "output survives refresh" test

- [ ] **4.1** Add `test_run_view_replays_full_output_after_completion` in `tests/unit/ops/test_runner.py`. Pattern:
- [x] **4.1** Add `test_run_view_replays_full_output_after_completion` in `tests/unit/ops/test_runner.py`. Pattern:
```python
# Start a run, wait for terminal, request /runs/<id>/view,
# parse the stream_url out of the rendered HTML, attach to the
# SSE stream, assert the replayed lines match the expected
# output, and assert we get the terminal `done` event.
```
- [ ] **4.2** Test must run async (the existing `_wait_terminal` helper expects it).
- [ ] **4.3** Run the existing ops suite to confirm no regressions: `pytest tests/unit/ops/ --no-cov`.
- [x] **4.2** Test must run async (the existing `_wait_terminal` helper expects it).
- [x] **4.3** Run the existing ops suite to confirm no regressions: `pytest tests/unit/ops/ --no-cov`.

## Phase 5 — Manual verification

Expand Down
29 changes: 29 additions & 0 deletions tests/unit/ops/test_runner.py
Original file line number Diff line number Diff line change
Expand Up @@ -382,3 +382,32 @@ async def test_subscriber_queue_drops_slow_subscriber(tmp_path, monkeypatch):
assert (
slow_queue not in run.subscribers
), "slow subscriber should have been dropped after queue filled up"


async def test_subscriber_queue_does_not_block_fast_subscribers(tmp_path, monkeypatch):
# Regression guard: one slow subscriber must not slow down others.
# _broadcast is synchronous put_nowait per subscriber, so a backed-up
# queue triggers QueueFull on that subscriber only — the rest still
# receive the event in the same loop iteration.
import asyncio

from attune.ops.runner import Run

run = Run(id="test-isolation", workflow="echo")
# Slow subscriber — saturates after 3 events, is then dropped.
slow_queue: asyncio.Queue = asyncio.Queue(maxsize=3)
# Fast subscriber — well above the event count we'll send.
fast_queue: asyncio.Queue = asyncio.Queue(maxsize=100)
run.subscribers.add(slow_queue)
run.subscribers.add(fast_queue)

for i in range(10):
run._broadcast(("line", f"event-{i}"))

# Slow subscriber gets dropped per its own QueueFull, no impact on fast.
assert slow_queue not in run.subscribers
assert fast_queue in run.subscribers
# Fast subscriber received every event in order.
assert fast_queue.qsize() == 10
received = [fast_queue.get_nowait() for _ in range(10)]
assert received == [("line", f"event-{i}") for i in range(10)]
Loading