Skip to content

fix(cli): preserve entrypoint and patch surfaces#4190

Closed
rysweet wants to merge 6 commits intomainfrom
feat/orch-5-timeout-stack-recovery-cluster
Closed

fix(cli): preserve entrypoint and patch surfaces#4190
rysweet wants to merge 6 commits intomainfrom
feat/orch-5-timeout-stack-recovery-cluster

Conversation

@rysweet
Copy link
Copy Markdown
Owner

@rysweet rysweet commented Apr 3, 2026

What this PR does

Brings the live #4190 diff to an honest merge-ready state around the actual CLI package surface that remains on this branch.

The earlier timeout-lifecycle work is already largely on main; this PR now accurately focuses on the code that still differs from main:

  • src/amplihack/cli/__main__.py
  • src/amplihack/cli/__init__.py
  • src/amplihack/cli.py
  • tests/gadugi/pr-4190-cli-entrypoint-and-forwarding.yaml
  • tests/unit/test_auto_mode_timeout.py
  • the existing atlas docs files already in the branch diff

Why these changes were needed

The branch was CI-green but not honestly merge-ready.

Two real CLI/package-surface issues remained:

  • python -m amplihack.cli called main() directly and dropped non-zero exit codes.
  • handle_auto_mode() and the package forwarding surface were inconsistent, so mock.patch("amplihack.cli.AutoMode") did not reliably intercept the real execution path in a fresh process.

This PR fixes those issues and adds outside-in coverage for the remaining live surface.

Changes in this PR

CLI behavior

  • src/amplihack/cli/__main__.py

    • now uses sys.exit(main()) so python -m amplihack.cli preserves CLI exit codes.
  • src/amplihack/cli/__init__.py

    • keeps main, add_auto_mode_args, and create_parser available from the package surface.
    • adds a lazy, patchable AutoMode package attribute so mock.patch("amplihack.cli.AutoMode") works before import in fresh processes, without eagerly importing the full auto-mode stack on every CLI import.
  • src/amplihack/cli.py

    • handle_auto_mode() now uses the package-level amplihack.cli.AutoMode surface rather than bypassing it with a direct import path.

Outside-in QA

  • added tests/gadugi/pr-4190-cli-entrypoint-and-forwarding.yaml covering:
    • python -m amplihack.cli --help
    • non-zero no-args exit propagation
    • package export availability for create_parser / add_auto_mode_args
    • mock.patch("amplihack.cli.create_parser")
    • fresh-process mock.patch("amplihack.cli.AutoMode")
    • handle_auto_mode() interception through the package surface

Docs alignment

  • corrected the existing atlas note in docs/atlas/user-journeys/user-journeys.mmd so it no longer claims the timeout itself is env-configurable; the env-configurable part is the session limits.

What is not claimed here

This PR does not reintroduce soft timeout escape hatches or non-fatal dependency behavior.

It also does not claim to contain the full timeout-lifecycle implementation story; that broader work already landed on main elsewhere. This branch is now described according to its real, remaining diff.

QA / verification

Outside-in

npx gadugi-test validate -f tests/gadugi/pr-4190-cli-entrypoint-and-forwarding.yaml
npx gadugi-test run -d tests/gadugi -s pr-4190-cli-entrypoint-and-forwarding

Result:

  • scenario validates and passes
  • latest artifact:
    • outputs/sessions/session_4e5291ac-38e1-4f1c-b08f-18118e0209c0_2026-04-03T16-31-14-731Z.json

Targeted regression / type checks

uv run pytest tests/integration/test_cli_auto_mode_ultrathink.py::test_integration_auto_001_normal_prompt_prepends_ultrathink tests/integration/test_cli_auto_mode_ultrathink.py::test_integration_auto_010_auto_mode_exit_code_propagated -q
uv run pytest tests/unit/test_auto_mode_timeout.py -q
uv run pytest .claude/skills/multitask/tests/test_resumable_timeouts.py -q
uv run --with pyright pyright src/amplihack/cli.py src/amplihack/cli/__init__.py src/amplihack/cli/__main__.py

Result:

  • targeted integration checks: 2 passed
  • tests/unit/test_auto_mode_timeout.py: 10 passed
  • .claude/skills/multitask/tests/test_resumable_timeouts.py: 8 passed
  • pyright: 0 errors

PR-scoped audit

A final PR-scoped audit over the live branch surfaces found no remaining medium+ issues.

Changed files in the live diff vs main

  • docs/atlas/ast-lsp-bindings/ast-lsp-bindings.mmd
  • docs/atlas/ast-lsp-bindings/index.md
  • docs/atlas/service-components/index.md
  • docs/atlas/service-components/service-components.dot
  • docs/atlas/service-components/service-components.mmd
  • docs/atlas/user-journeys/index.md
  • docs/atlas/user-journeys/user-journeys.mmd
  • src/amplihack/cli.py
  • src/amplihack/cli/__init__.py
  • src/amplihack/cli/__main__.py
  • tests/gadugi/pr-4190-cli-entrypoint-and-forwarding.yaml
  • tests/unit/test_auto_mode_timeout.py

Merge-ready checklist

  • python -m amplihack.cli preserves exit codes
  • package-level AutoMode patching works in fresh processes
  • handle_auto_mode() uses the patchable package surface
  • gadugi scenario validates and passes
  • targeted integration checks pass
  • timeout/session-limit unit tests pass
  • resumable timeout lifecycle tests pass
  • pyright passes on touched CLI files
  • final PR-scoped audit found no medium+ issues
  • GitHub CI on commit bb2bc52e9

Merge-ready: yes

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

Repo Guardian - Passed

All files changed in this PR were reviewed. No ephemeral content (point-in-time documents or temporary scripts) was found.

File Assessment
CLAUDE.md Configuration file — minor line-wrap reformatting, durable
src/amplihack/agents/goal_seeking/__pycache__/*.pyc Build artifacts — not ephemeral documents or temp scripts
src/amplihack/cli.py Source code — adds resolve_timeout, --query-timeout-minutes, --no-timeout
src/amplihack/cli/__init__.py Source code — exposes new public symbols via __getattr__
src/amplihack/cli/__main__.py Entry point — python -m amplihack.cli support

No action required.

Generated by Repo Guardian for issue #4190 ·

@rysweet rysweet force-pushed the feat/orch-5-timeout-stack-recovery-cluster branch from 796c49d to df2f62b Compare April 3, 2026 05:10
Copilot and others added 3 commits April 3, 2026 05:28
…args; fix SDK dep non-fatal; expose cli.py symbols via package __getattr__

- Add _DEFAULT_TIMEOUT_MINUTES (30.0) and _OPUS_TIMEOUT_MINUTES (60.0) constants
- Extend add_auto_mode_args() with --query-timeout-minutes and --no-timeout flags
- Implement resolve_timeout(args, model=None) with priority:
  --no-timeout > explicit timeout > opus auto-detect (60min) > default (30min)
- Wrap ensure_sdk_deps() in try/except so startup failures are non-fatal (#4067)
- Add __getattr__ to cli/__init__.py so any cli.py symbol is importable via
  'from amplihack.cli import X' without enumeration
- Add cli/__main__.py so 'python -m amplihack.cli' works

Fixes 11 failing tests in tests/unit/test_auto_mode_timeout.py.
Fixes 4 failing tests in tests/unit/cli/test_version_flag.py.
Fixes test_version_flag_exits_zero (subprocess --version now works).

Issues: #4067, #4102, #4103, #4105

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The 05:10 force-push amended the atlas docs into the commit but did not
trigger the full pull_request CI suite. This empty commit causes a fresh
synchronize event so Atlas CI can verify the now-present docs/atlas/ updates.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@rysweet rysweet force-pushed the feat/orch-5-timeout-stack-recovery-cluster branch from ac95f89 to 2a94204 Compare April 3, 2026 05:29
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

Repo Guardian - Passed

All files changed in this PR were reviewed. No ephemeral content (point-in-time documents or temporary scripts) was found.

File Assessment
docs/atlas/ast-lsp-bindings/ast-lsp-bindings.mmd Generated architecture diagram — durable
docs/atlas/ast-lsp-bindings/index.md Generated atlas doc, timestamp metadata update — durable
docs/atlas/service-components/index.md Generated atlas doc, timestamp metadata update — durable
docs/atlas/service-components/service-components.dot Architecture diagram update for new CLI components — durable
docs/atlas/service-components/service-components.mmd Architecture diagram update — durable
docs/atlas/user-journeys/index.md Generated atlas doc, whitespace cleanup + timestamp — durable
docs/atlas/user-journeys/user-journeys.mmd New user journey for resolve_timeout — durable
src/amplihack/cli.py Source code — adds resolve_timeout, --query-timeout-minutes, --no-timeout — durable
src/amplihack/cli/__init__.py Source code — exposes new public symbols via __getattr__ — durable
src/amplihack/cli/__main__.py New entry point for python -m amplihack.cli — durable

No action required.

Generated by Repo Guardian for issue #4190 ·

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

PR Triage Report

Risk: 🟡 Medium | Priority: Medium | Type: Feature/fix cluster

Assessment

Adds CLI timeout management (resolve_timeout(), --query-timeout-minutes, --no-timeout) and makes SDK dep check non-fatal. Part of the timeout/stack recovery cluster (#4070, #4105, #4103, #4102, #4067). Also adds python -m amplihack.cli entrypoint and atlas refresh across 3 layers.

Key risks:

  • Modifies cli.py and cli/__init__.py (public API surface)
  • New __main__.py entrypoint added
  • Atlas layers 2, 7, 8 refreshed

CI status: Pending at time of triage — verify before merge. PR description notes CI was re-triggered on cleaned push 2a9420462.

Test coverage: 27 unit tests pass (test_auto_mode_timeout.py, test_version_flag.py). Atlas staleness check clean.

Recommendation

Needs review after CI green. Scope is well-defined and tests are present. Confirm CI passes on current push, then route for code review of the timeout priority chain logic in resolve_timeout().

Generated by PR Triage Agent ·

Revert the wrong-direction implementation from cli.py:

- Remove --query-timeout-minutes and --no-timeout CLI flags (escape hatches,
  not the real fix for the orchestrator timeout lifecycle problem)
- Remove resolve_timeout() helper (paperover, not solution)
- Remove _DEFAULT_TIMEOUT_MINUTES and _OPUS_TIMEOUT_MINUTES constants
- Restore ensure_sdk_deps() bare call in _common_launcher_startup (fatal,
  not try/except-downgraded to debug-skip)
- Remove resolve_timeout from cli/__init__.py eager binds and __all__
- Fix pyright: add assert guard on spec/loader in _load_cli_module()
- Remove wrong-direction test classes from test_auto_mode_timeout.py:
  TestOpusModelAutoDetection, TestTimeoutPriority, TestCLIArgumentParsing,
  and test_opus_detection_case_insensitive from TestEdgeCases

The real fix for #4070/#4177 (orchestrator timeout lifecycle) is the
parallel changes in .claude/skills/multitask/orchestrator.py: lifecycle
state machine (timed_out_resumable, failed_resumable, interrupted_resumable),
durable per-workstream state JSON, checkpoint-boundary resume, and the
interrupt-preserve/continue-preserve timeout policy model.

Kept from this PR:
- cli/__main__.py entrypoint (python -m amplihack.cli)
- cli/__init__.py __getattr__ forwarding (minus resolve_timeout)
- auto_mode.py env-var-configurable session limits
- All orchestrator.py lifecycle/resumability changes
- docs/features/resumable-workstream-timeouts.md
- .claude/skills/multitask/tests/test_resumable_timeouts.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@rysweet rysweet changed the title fix(cli): add resolve_timeout, --query-timeout-minutes, --no-timeout; fix SDK dep non-fatal fix(orchestrator): timeout lifecycle state machine — resumable workstream termination Apr 3, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

Repo Guardian - Passed

All files changed in this PR were reviewed. No ephemeral content (point-in-time documents or temporary scripts) was found.

File Assessment
docs/atlas/ast-lsp-bindings/ast-lsp-bindings.mmd Generated architecture diagram with new CLI refs — durable
docs/atlas/ast-lsp-bindings/index.md Atlas layer doc, generation timestamp update — durable
docs/atlas/service-components/index.md Atlas layer doc, generation timestamp update — durable
docs/atlas/service-components/service-components.dot Architecture diagram adding new CLI components — durable
docs/atlas/service-components/service-components.mmd Architecture diagram, file count update — durable
docs/atlas/user-journeys/index.md Atlas layer doc, whitespace cleanup + timestamp — durable
docs/atlas/user-journeys/user-journeys.mmd New user journey for resolve_timeout — durable
src/amplihack/cli/__init__.py Source code — public symbol bindings + __getattr__ — durable
src/amplihack/cli/__main__.py New entry point for python -m amplihack.cli — durable
tests/unit/test_auto_mode_timeout.py Test file — removes TDD placeholder classes, keeps real tests — durable

No action required.

Generated by Repo Guardian for issue #4190 ·

…s diagrams

The resolve_timeout() function and --query-timeout-minutes/--no-timeout CLI
flags were removed as wrong-direction escape hatches. Update the service-
components and user-journeys atlas diagrams to reflect the corrected model:

- service-components.dot: remove resolve_timeout() from cli.py and
  cli/__init__.py node labels
- user-journeys.mmd: replace stale resolve_timeout sequence with the
  correct auto_mode journey (env-var-configurable session limits, not
  CLI flag escapes)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

Repo Guardian - Passed

All files changed in this PR were reviewed. No ephemeral content (point-in-time documents or temporary scripts) was found.

File Assessment
docs/atlas/ast-lsp-bindings/ast-lsp-bindings.mmd Generated architecture diagram, adds new CLI refs — durable
docs/atlas/ast-lsp-bindings/index.md Atlas layer doc, generation timestamp update — durable
docs/atlas/service-components/index.md Atlas layer doc, generation timestamp update — durable
docs/atlas/service-components/service-components.dot Architecture diagram adding new CLI components — durable
docs/atlas/service-components/service-components.mmd Architecture diagram, file count update — durable
docs/atlas/user-journeys/index.md Atlas layer doc, whitespace/table cleanup + timestamp — durable
docs/atlas/user-journeys/user-journeys.mmd New user journey for auto mode timeout — durable
src/amplihack/cli/__init__.py Source code — public symbol bindings + __getattr__ forwarder — durable
src/amplihack/cli/__main__.py New python -m amplihack.cli entrypoint — durable
tests/unit/test_auto_mode_timeout.py Test file — removes TDD placeholder classes, retains real tests — durable

No action required.

Generated by Repo Guardian for issue #4190 ·

@github-actions github-actions Bot mentioned this pull request Apr 3, 2026
@rysweet rysweet mentioned this pull request Apr 3, 2026
Propagate python -m amplihack.cli exit codes, route auto mode through the patchable package surface, add a real gadugi scenario for the remaining PR 4190 CLI surface, and correct the atlas note about env-configurable session limits.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

Repo Guardian - Action Required

One file contains ephemeral framing that will become stale after this PR merges.


Flagged File

tests/gadugi/pr-4190-cli-entrypoint-and-forwarding.yaml

Why flagged:

  1. PR-numbered filenamepr-4190- is a point-in-time reference. Once this PR is merged, the name no longer meaningfully identifies the test.

  2. Ephemeral description field — The YAML description reads:

    "PR fix(cli): preserve entrypoint and patch surfaces #4190 currently carries the CLI package surface changes that remain after the broader timeout lifecycle work landed elsewhere"

    This describes the organizational state of a PR, not the feature under test. The phrase "currently carries" will be immediately stale after merge.

  3. Ephemeral metadata tag — The tags list includes pr-4190, which has no durable meaning.

The test steps themselves are durable — they verify permanent functionality (CLI __main__ entrypoint, create_parser/add_auto_mode_args imports, mock.patch paths). Only the framing is the issue.

Where to put the ephemeral content: The PR-context description belongs in this PR's description or a PR comment, not in the test file.

Suggested fix: Rename the file to something like cli-entrypoint-and-forwarding.yaml, replace the description with a feature-oriented summary (e.g., "Verifies the python -m amplihack.cli entrypoint and amplihack.cli forwarding surface"), and remove the pr-4190 tag.


To override, add a PR comment containing repo-guardian:override (reason) where (reason) is a required non-empty justification for allowing the file(s).

Generated by Repo Guardian for issue #4190 ·

@rysweet
Copy link
Copy Markdown
Owner Author

rysweet commented Apr 17, 2026

Closing: stale PR superseded by subsequent work.

@rysweet rysweet closed this Apr 17, 2026
@rysweet rysweet deleted the feat/orch-5-timeout-stack-recovery-cluster branch April 17, 2026 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant