hyperifyio
diff --git a/FEATURE_CHECKLIST.md b/FEATURE_CHECKLIST.md
Lines changed: 16 additions & 1 deletion
@@ -248,7 +248,7 @@
 * [x] Create ADR-0003 “Toolchain & Lint Policy (Go + golangci-lint)” documenting that CI must use the Go version declared by `go.mod` and that `golangci-lint` is pinned to a known-good version for that Go line; include upgrade policy (bump both together via PR), risks, and rollback. Smallest change: add `docs/adr/0003-toolchain-and-lint-policy.md` with context, options, decision, consequences, and a link to the canonical issue URL (created in this PR). DoD: ADR rendered on GitHub, linked from `docs/README.md` and `README.md` (Tooling section); all gates green; one peer review completed.
   - [x] [S01:l248-lint-green] Install ripgrep (rg) locally so `make check-tools-paths` can run, then rerun `make lint` to satisfy gates; next step: `sudo apt-get update && sudo apt-get install -y ripgrep` (Linux) or `brew install ripgrep` (macOS).
 * [x] Pin CI to module Go version using `actions/setup-go` with `go-version-file: go.mod`. Smallest change: edit `.github/workflows/ci.yml` to configure `actions/setup-go@v5` with `go-version-file: go.mod` in every job (linux/macos/windows), print `go version` for traceability, and keep the existing matrix. DoD: a fresh CI run shows the same Go major.minor on all OSes (visible in logs), `make tidy lint test build build-tools` pass, gates green, peer review completed, rollback by reverting the workflow hunk.
-* [ ] Change the **default sampling temperature to 1.0** by updating the agent CLI’s flag default and the underlying request-option defaults, ensure the value propagates into the outbound payload when supported, add a unit test that asserts the resolved default is 1.0 when no overrides are provided, and update README’s “Common flags” so `-temp` shows “default 1.0” instead of 0.2 (also add a short rationale line in docs); DoD: unit tests green, `agentcli -h` displays 1.0, README/Docs updated. ([GitHub][1])
+* [x] Change the **default sampling temperature to 1.0** by updating the agent CLI’s flag default and the underlying request-option defaults, ensure the value propagates into the outbound payload when supported, add a unit test that asserts the resolved default is 1.0 when no overrides are provided, and update README’s “Common flags” so `-temp` shows “default 1.0” instead of 0.2 (also add a short rationale line in docs); DoD: unit tests green, `agentcli -h` displays 1.0, README/Docs updated. ([GitHub][1])
 * [ ] Implement a **model capability map** in the internal request-options layer to determine `SupportsTemperature` per model (e.g., GPT-5 variants → true; any known exceptions → false with inline comment), expose a simple lookup used at call time, and write a table-driven unit test that covers at least three model IDs and both outcomes; DoD: lookup used by payload builder, tests green, brief note added to docs “Model parameter compatibility”.
 * [ ] Update the **payload encoder** so that `temperature` is omitted entirely when `SupportsTemperature == false` and included (1.0 or user override) when true, preserving existing behavior for other params; add golden-file or snapshot tests for both branches; DoD: encoder tests green and logs confirm presence/omission in debug mode.
 * [ ] Add an **ADR (“Default LLM Call Policy”)** under `docs/` (create `docs/adr` if missing) capturing context, options considered, the decision to default temperature to 1.0 with capability-based omission, and the retry/guard policies; include a Mermaid sequence of the tool-call flow; DoD: ADR merged and diagram renders on GitHub.
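The capability-map and payload-encoder items in this hunk could be combined roughly as in the following minimal Go sketch. Everything named here is an assumption for illustration: the `temperatureSupport` table, the placeholder model IDs, the `chatRequest` struct, and the choice to treat unknown models as supporting temperature are not taken from the repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// temperatureSupport is a hypothetical capability table. The model IDs and
// their values are placeholders, not statements about any real model.
var temperatureSupport = map[string]bool{
	"example-chat-model":      true,
	"example-reasoning-model": false, // illustrative exception
}

// SupportsTemperature reports whether temperature may be sent for a model.
// Unknown models default to true here; that default is an assumption.
func SupportsTemperature(model string) bool {
	supported, known := temperatureSupport[model]
	if !known {
		return true
	}
	return supported
}

// chatRequest is a hypothetical payload. A nil pointer combined with
// omitempty removes the "temperature" key from the encoded JSON entirely.
type chatRequest struct {
	Model       string   `json:"model"`
	Temperature *float64 `json:"temperature,omitempty"`
}

// BuildRequest sets the temperature only when the model supports it.
func BuildRequest(model string, temperature float64) chatRequest {
	req := chatRequest{Model: model}
	if SupportsTemperature(model) {
		req.Temperature = &temperature
	}
	return req
}

// Encode marshals a request to a JSON string.
func Encode(req chatRequest) string {
	b, err := json.Marshal(req)
	if err != nil {
		panic(err) // cannot happen for this struct; kept for completeness
	}
	return string(b)
}

func main() {
	fmt.Println(Encode(BuildRequest("example-chat-model", 1.0)))
	fmt.Println(Encode(BuildRequest("example-reasoning-model", 1.0)))
}
```

Using a pointer field rather than a plain `float64` keeps "omitted" distinct from an explicit `0`, which matters if a user override of zero should still be sent.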
@@ -258,6 +258,8 @@
 * [ ] Document correct tool-call sequencing with a minimal JSON example in `README.md#tool-calls` (assistant with `tool_calls[]` → tool messages with matching `tool_call_id` → assistant), including note on parallel tool calls requiring one tool message per id. ([Microsoft Learn][8])
 * [ ] Implement a **message-sequence validator** that rejects any `role:"tool"` message unless it responds to a prior assistant message with `tool_calls[]` and a matching `tool_call_id`, and surface a pre-flight error that mirrors the API’s wording; add unit tests for valid/invalid transcripts; DoD: validator on by default, tests green, troubleshooting doc updated. ([OpenAI Platform][2])
 * [ ] Add a **regression test for the exact 400** you hit by crafting a transcript with a stray `role:"tool"` lacking a prior `tool_calls[]`; assert the validator blocks it locally with a helpful error and that the request is never sent; DoD: failing test first, then fix, then green, and an entry added to the Troubleshooting section.
+* [ ] CLI errors: when required flags (e.g., `-prompt`) are missing, print concise error followed by the usage synopsis and exit with code 2 (not 1); DoD: unit test verifies stderr contains error + usage, exit code is 2, no network or tool exec attempted; CI and all gates green; one peer review completed.
+* [ ] CLI version: add `--version`/`-version` to print semver + commit + build date and exit 0 without validating other flags; DoD: unit test asserts format and exit code; README “Usage” updated; CI and all gates green; one peer review completed.
 * [x] Add `make check-go-version` that fails early if the active toolchain doesn’t match `go.mod`. Smallest change: in `Makefile`, add target that extracts `MOD_GO=$$(awk '/^go [0-9]+\\.[0-9]+/ {print $$2; exit}' go.mod)` and `SYS_GO=$$(go version | sed -E 's/.*go([0-9]+\\.[0-9]+).*/\\1/')`; compare and `exit 2` with a clear message if different. Document this target briefly in `README.md` under “Developer workflow”. DoD: running `make check-go-version` passes when versions match and fails with an actionable message when mismatched; CI invokes it (temporarily from a one-off verification commit) and stays green; peer review completed.
   - [x] [S01:l252-wire-lint] Prepend `check-go-version` to `lint` so it runs first; local `make lint` now fails fast on toolchain mismatch as intended.
   - [x] [S01b:l252-local-proof] Verified locally that `make lint` executes `check-go-version` first (observed "check-go-version: OK" before golangci-lint output); CI verification remains pending.
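The message-sequence validator item in this hunk might look something like the sketch below. The `Message` and `ToolCall` types and the error text are assumptions for illustration; in particular, the error wording is invented here and does not reproduce the API's actual message.

```go
package main

import "fmt"

// ToolCall and Message are hypothetical, trimmed-down transcript types.
type ToolCall struct {
	ID string
}

type Message struct {
	Role       string
	ToolCalls  []ToolCall // set on assistant messages that request tools
	ToolCallID string     // set on tool messages
}

// ValidateSequence rejects any role:"tool" message that does not answer a
// tool call announced by an earlier assistant message. Each announced ID may
// be answered once, so parallel calls need one tool message per ID.
func ValidateSequence(msgs []Message) error {
	pending := make(map[string]bool)
	for i, m := range msgs {
		switch m.Role {
		case "assistant":
			for _, tc := range m.ToolCalls {
				pending[tc.ID] = true
			}
		case "tool":
			if !pending[m.ToolCallID] {
				return fmt.Errorf(
					"message %d: role \"tool\" with tool_call_id %q has no matching prior assistant tool_calls entry",
					i, m.ToolCallID)
			}
			delete(pending, m.ToolCallID)
		}
	}
	return nil
}

func main() {
	stray := []Message{
		{Role: "user"},
		{Role: "tool", ToolCallID: "call_1"}, // no assistant tool_calls before it
	}
	if err := ValidateSequence(stray); err != nil {
		fmt.Println("blocked before sending:", err)
	}
}
```

Running a check like this before the HTTP call is what lets the regression test assert that the invalid request is never sent.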
@@ -293,3 +295,16 @@
 * [ ] Fix Makefile lint target to call golangci-lint reliably after on-demand install by referencing the binary at the Go GOPATH bin directory (go env GOPATH)/bin/golangci-lint or by exporting GOBIN=./bin and adding it to PATH within the recipe, because make lint currently prints Installing golangci-lint then fails with command not found; smallest change: edit the Makefile lint recipe only; scope Makefile; low risk; DoD: make lint succeeds on a clean machine and in CI with all gates green and no coverage regression; verify by removing the binary from PATH and running make lint; rollback by reverting the Makefile change.
   - [ ] [S02:l242-verify-green] Verify `make lint` green on a machine with ripgrep installed and in CI; if issues persist, pivot to installing into `./bin` and invoking `$(CURDIR)/bin/golangci-lint$(EXE)`.
   - [ ] [S02b:l242-install-rg] Install ripgrep (rg) locally and in CI where needed, rerun `make lint`, and capture PASS results; do not weaken gates.
+* [ ] Reference docs: add `docs/reference/cli.md` that enumerates every flag with default, env fallback, precedence, and exit codes for `--help`/`--version`/missing-required; link from README; DoD: doc renders on GitHub, content matches current binary (checked by a small test that scans `--help` output for each documented flag), CI and all gates green; one peer review completed.
+* [ ] Update docs `docs/llm-policy.md` to state default `temperature=1.0`, show GPT-5 `verbosity` (`low|medium|high`) and `reasoning_effort` usage, and call out that some reasoning models restrict sampling knobs; validate links in CI. ([OpenAI][6], [OpenAI Cookbook][7], [Microsoft Learn][3])
+* [ ] Implement config precedence for temperature in `cmd/cli/flags.go`: `--temperature` > `LLM_TEMPERATURE` > config file > default 1.0, and if `--top-p` also provided then unset temperature with a CLI warning; add flag parsing tests in `cmd/cli/flags_test.go`. ([Anthropic][5])
+* [ ] Add **parameter-recovery retry**: if the API returns HTTP 400 with a message indicating an invalid/unsupported `temperature`, strip the parameter and retry once before the normal exponential backoff path, with a structured log field describing the recovery; DoD: integration test with a mock server that first 400s on `temperature` and then succeeds without it, logs verified.
+* [ ] Extend **temperature-nudge logic** to no-op when `SupportsTemperature == false` and to clamp within \[0.1, 1.0] otherwise; include unit tests that simulate repetition/format-failure signals to trigger −0.1 adjustments and diversity signals to +0.1 (never exceeding 1.0); DoD: tests green and brief docs note.
+* [ ] Append ADR addendum in `docs/adr/001-default-llm-policy.md` noting the change to temperature=1.0 by default for API parity and GPT-5 compatibility, with links to OpenAI docs; include concise rationale and rollout note. ([OpenAI Platform][1], [OpenAI][6])
+* [ ] Enforce **“change one sampling knob”**: when user passes `--top-p`, ensure the payload does not include `temperature`; when `--top-p` is unset, send `temperature` (default 1.0) and leave `top_p` null; add unit tests for precedence and serialization, and add a one-sentence rule to docs; DoD: tests green, docs updated.
+* [ ] Implement a **prompt profile mapper** (deterministic | general | creative | reasoning) that sets temperature as follows: deterministic→0.1 (only if supported), general→1.0, creative→1.0, reasoning→1.0 unless model forbids it (then omit); add unit tests for profile→option mapping and a doc table with examples; DoD: mapper covered by tests, docs updated.
+* [ ] Add **observability fields** `temperature_effective` (the value actually used after clamps/omissions) and `temperature_in_payload` (bool) to structured logs, and document them in the troubleshooting section; DoD: unit test asserts both fields are emitted, docs updated.
+* [ ] Add **length backoff**: when the API reply indicates truncation (e.g., `finish_reason == "length"` or provider-equivalent), automatically double the completion cap once (bounded by remaining context) and retry; include unit tests that simulate truncation; DoD: tests green and behavior documented.
+* [ ] Keep agent loop safety by verifying **`-max-steps` defaults to 8** (hard wall 15 in code if not already present), and add a unit test that the loop terminates with a clear “needs human review” message when the cap is hit; DoD: tests green and README mentions the guard. ([GitHub][1])
+* [ ] Ensure **HTTP timeouts and retries** align with your README flags (`-http-timeout`, `-http-retries`, `-http-retry-backoff`) by wiring jittered exponential backoff for 429/5xx/timeouts and keeping the global default timeout sane (e.g., minutes, not seconds); add unit tests for retry schedule and a doc snippet clarifying defaults; DoD: tests green and README consistent. ([GitHub][1])
+* [ ] Update **README “Common flags”** so all listed defaults (especially `-temp 1.0` and `-max-steps 8`) match the executable’s behavior, and add a short “Why you usually don’t need to change knobs” section pointing to the policy; DoD: README committed and links from the table of contents work. ([GitHub][1])
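The temperature precedence and "change one sampling knob" items above can be read as one resolution step. The following is a minimal Go sketch under stated assumptions: `ResolveSampling`, the `Sampling` struct, and the way inputs are passed (nil pointers for unset flags, an empty string for an unset `LLM_TEMPERATURE`) are hypothetical and do not describe the existing code in `cmd/cli/flags.go`.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultTemperature mirrors the checklist's stated default of 1.0.
const defaultTemperature = 1.0

// Sampling holds the resolved knobs; a nil field means "omit from payload".
type Sampling struct {
	Temperature *float64
	TopP        *float64
}

// ResolveSampling applies: flag > LLM_TEMPERATURE > config file > default.
// flagTemp, flagTopP, and fileTemp are nil when not provided; envTemp is the
// raw environment value ("" when unset). If top-p was passed, temperature is
// dropped so that only one sampling knob is sent; emitting the CLI warning
// is left to the caller.
func ResolveSampling(flagTemp, flagTopP *float64, envTemp string, fileTemp *float64) (Sampling, error) {
	if flagTopP != nil {
		return Sampling{TopP: flagTopP}, nil
	}
	temp := defaultTemperature
	switch {
	case flagTemp != nil:
		temp = *flagTemp
	case envTemp != "":
		v, err := strconv.ParseFloat(envTemp, 64)
		if err != nil {
			return Sampling{}, fmt.Errorf("invalid LLM_TEMPERATURE %q: %w", envTemp, err)
		}
		temp = v
	case fileTemp != nil:
		temp = *fileTemp
	}
	return Sampling{Temperature: &temp}, nil
}

func main() {
	s, err := ResolveSampling(nil, nil, "0.7", nil)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("temperature:", *s.Temperature)
}
```

Keeping the resolution in a pure function, separate from flag parsing, makes the precedence and serialization tests called for above straightforward to write as table-driven cases.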