Progress Log

2026-04-01 Local Baseline + Smoke Readiness

  • Cloned a fresh local main worktree into D:\CodexWorkspace\game-localization-mvr instead of continuing the historical codex/phase6-operator-workspace-dashboard branch.
  • Re-read the local continuity anchors and current handoff trail:
    • docs/HANDOFF_MAINLINE_GUARDRAILS.md
    • task_plan.md
    • docs/project_lifecycle/roadmap_index.md
    • README.md
    • handoff/m4_session_transfer/*
  • Confirmed the machine did not have an immediately usable Python 3.11 baseline:
    • system Python was 3.14.2
    • py -0p exposed 3.14 and 3.12, but not 3.11
  • Installed managed tooling for project-local Python version management:
    • winget install astral-sh.uv
    • uv python install 3.11 -> Python 3.11.15
  • Added .python-version with 3.11 and expanded .gitignore to ignore .venv/.
  • Wrote a new PLC continuity record at docs/project_lifecycle/run_records/2026-04/2026-04-01/session_start_202604010215.md.
  • Confirmed current hard blocker before any live smoke:
    • LLM_BASE_URL, LLM_API_KEY, LLM_MODEL, and LLM_TRACE_PATH are not set in this session
    • no .llm_credentials file is present in the repo root
  • Next runtime step is now environment materialization:
    • run the offline validation floor
    • record exact live-smoke follow-up commands for the credential handoff
  • Verified the repo-local environment is usable:
    • existing .venv resolves to Python 3.11.9
    • runtime dependencies plus numpy now import successfully from .venv
  • Closed a real Windows local-baseline blocker:
    • python scripts/style_guide_bootstrap.py --dry-run originally failed with UnicodeEncodeError because the script printed emoji to a GBK console
    • patched scripts/style_guide_bootstrap.py to emit ASCII success lines instead
  • Closed a real placeholder-integrity bug in the keep-chain baseline:
    • normalize_guard.py was letting jieba split printf placeholders, turning %d into % d
    • updated the segmentation skip-regex so printf-style placeholders remain intact in placeholder_map
  • Hardened the local regression harness for fresh clones:
    • scripts/test_normalize.py, scripts/test_qa_hard.py, scripts/test_rehydrate.py, and scripts/test_e2e_workflow.py now create parent temp directories and invoke the current interpreter explicitly
    • scripts/test_qa_hard.py now self-generates its QA reports from fixture inputs
    • scripts/test_rehydrate.py now self-generates valid/invalid placeholder maps instead of relying on stale fixture assumptions
  • Offline validation floor is green:
    • python scripts/test_normalize.py -> pass
    • python scripts/test_qa_hard.py -> pass
    • python scripts/test_rehydrate.py -> pass
    • python scripts/test_e2e_workflow.py -> pass
  • Live llm_ping and smoke remain intentionally pending until credentials are provided.
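
The placeholder-integrity fix recorded above (keeping %d from becoming % d during segmentation) can be illustrated with a minimal sketch. It is not the code in normalize_guard.py: the PRINTF_PLACEHOLDER pattern is an assumption, and naive_segment is a plain regex tokenizer standing in for jieba, so only the protect-then-segment idea carries over.

```python
import re

# Assumed pattern for printf-style placeholders such as %d, %s, %1$s, %.2f.
# The capturing group makes re.split() return the placeholders as well.
PRINTF_PLACEHOLDER = re.compile(r"(%(?:\d+\$)?[-+0#]*\d*(?:\.\d+)?[sdif])")


def naive_segment(chunk):
    """Stand-in for a word segmenter: separates word runs from punctuation."""
    return re.findall(r"\w+|[^\w\s]", chunk)


def segment_protected(text):
    """Segment text while passing printf placeholders through untouched."""
    tokens = []
    for index, part in enumerate(PRINTF_PLACEHOLDER.split(text)):
        if not part:
            continue
        if index % 2 == 1:
            # Odd indices are the captured placeholders: keep them whole.
            tokens.append(part)
        else:
            tokens.extend(naive_segment(part))
    return tokens


print(naive_segment("Gain %d coins"))      # ['Gain', '%', 'd', 'coins']
print(segment_protected("Gain %d coins"))  # ['Gain', '%d', 'coins']
```

The unprotected call shows the same kind of split the log describes, and the protected call keeps the placeholder as a single token that can be recorded in a placeholder map.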

2026-04-01 Live Smoke Execution

  • Loaded process-scoped live credentials for https://api.apiyi.com/v1 and re-verified connectivity with .\.venv\Scripts\python.exe scripts\llm_ping.py.
  • Confirmed the router still selects gpt-4.1-nano for llm_ping; the smoke translation override stayed on gpt-4.1-mini to control cost.
  • First preflight attempt failed fast at style_governance_gate:
    • run dir: data/smoke_run_20260331_184226
    • issue: STYLE_GOVERNANCE_GATE_FAIL
    • root cause: workflow/style_profile.generated.yaml was missing from workflow/lifecycle_registry.yaml
  • Patched workflow/lifecycle_registry.yaml to register workflow/style_profile.generated.yaml as an approved runtime-gated style_profile.
  • Re-ran preflight successfully:
    • run dir: data/smoke_run_20260331_184401
    • verify artifact: smoke_verify_smoke_run_20260331_184401.json
    • final authority: overall=PASS
  • Re-ran full successfully:
    • run dir: data/smoke_run_20260331_184605
    • verify artifact: smoke_verify_smoke_run_20260331_184605.json
    • final authority: overall=PASS
  • Live smoke behavior on the 10-row baseline is now characterized:
    • target_lang_effective stayed en-US
    • no RU fallback was triggered
    • row counts stayed aligned at 10 input / 10 translated / 10 final
    • QA Hard produced 5 initial errors, all cleared by Repair Hard
    • Soft QA produced 8 repair tasks; Repair Soft repaired 7 and escalated 1
  • The pipeline is therefore runnable end-to-end on fresh main, but not perfectly clean:
    • manifest gate_summary.status is passed
    • manifest overall_status is warn
    • one review handoff remains queued for string_id=10007436
  • Closed the live-smoke execution slice by retiring the exploration subagent after integrating its artifact/PASS-authority findings.
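
The "runnable but not perfectly clean" reading above can be sketched as a small triage helper. The field names gate_summary.status and overall_status come from the entries above; treating review_handoff as a list of queued rows and the three return labels are assumptions made for illustration, not the manifest schema itself.

```python
def classify_run(manifest):
    """Classify a smoke manifest as blocked, runnable with follow-ups, or clean."""
    gate_status = manifest.get("gate_summary", {}).get("status")
    overall = manifest.get("overall_status")
    # Assumption: queued review handoffs are stored as a list of rows.
    pending_reviews = manifest.get("review_handoff", [])

    if gate_status != "passed":
        return "blocked"
    if overall == "warn" or pending_reviews:
        return "runnable_with_followups"
    return "clean"


# Hypothetical manifest mirroring the outcome described in this entry.
example = {
    "gate_summary": {"status": "passed"},
    "overall_status": "warn",
    "review_handoff": [{"string_id": "10007436"}],
}
print(classify_run(example))  # runnable_with_followups
```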

2026-03-28 Human UI Acceptance Prep

  • Started a dedicated human UAT prep scope for the current merged Phase 5 + 6 UI surface.
  • Confirmed phase6_dashboard_worktree is the correct target for human acceptance; local main is behind remote and not suitable for this pass.
  • Added scripts/seed_phase6_manual_uat.py to create deterministic workspace/runtime data for manual browser validation.
  • Added tests/test_seed_phase6_manual_uat.py to lock the seed utility shape.
  • Added a dated human UAT checklist and PLC prep records under docs/project_lifecycle/run_records/2026-03/2026-03-28/.
  • Re-ran the targeted human-UAT prep checks successfully:
    • python -m pytest tests/test_phase5_acceptance_gate.py -q -> 1 passed
    • python -m pytest tests/test_phase6_acceptance_gate.py -q -> 1 passed
    • python -m pytest tests/test_seed_phase6_manual_uat.py -q -> 1 passed
  • Seeded deterministic manual-UAT data successfully:
    • phase6_manual_uat_derived
    • phase6_manual_uat_persisted
  • Verified live-launch env readiness:
    • python scripts/llm_ping.py -> SUCCESS / PONG
  • Started the local UI server successfully on http://127.0.0.1:8765/.
  • Verified the running server sees both seeded runs through /api/workspace/overview.
  • Remaining work in this scope is now only the human browser walkthrough and evidence capture.
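
The check that the server sees both seeded runs can be sketched as follows. The response shape (a "runs" list of objects with a "run_id" key) is an assumption, and StubHandler is a stand-in server so the sketch runs without the real UI; only the endpoint path and the two run names come from this entry.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

SEEDED_RUNS = ["phase6_manual_uat_derived", "phase6_manual_uat_persisted"]


class StubHandler(BaseHTTPRequestHandler):
    """Stand-in for the operator UI server; the payload shape is assumed."""

    def do_GET(self):
        if self.path == "/api/workspace/overview":
            payload = {"runs": [{"run_id": run_id} for run_id in SEEDED_RUNS]}
            body = json.dumps(payload).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def missing_runs(base_url, expected):
    """Return the expected run ids that the overview endpoint does not list."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(base_url + "/api/workspace/overview", timeout=5) as resp:
        payload = json.load(resp)
    seen = {run["run_id"] for run in payload.get("runs", [])}
    return [run_id for run_id in expected if run_id not in seen]


def check_against_stub(expected):
    """Start the stub on an ephemeral port, run the check, then shut down."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        return missing_runs(f"http://{host}:{port}", expected)
    finally:
        server.shutdown()
        server.server_close()
```

Against a real local server the same helper would be called as missing_runs("http://127.0.0.1:8765", SEEDED_RUNS), provided the actual payload matches the assumed shape.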

2026-03-28

  • Started the bounded Phase 6 acceptance pass instead of extending implementation scope.
  • Added tests/test_phase6_acceptance_gate.py to exercise the documented python scripts/operator_ui_server.py entrypoint with live HTTP rather than import-only checks.
  • The new acceptance gate verifies:
    • / renders both Runtime Shell and Operator Workspace
    • /api/workspace/overview, /api/workspace/cards, and /api/workspace/runs/{run_id} work against fixture-backed runs
    • derived workspace reads stay side-effect free and do not create persisted operator artifacts
    • persisted operator_cards/operator_summary are still honored when present
    • runtime drilldown and manifest-scoped artifact preview still work through /api/runs*
  • Re-ran the retained Phase 4/5/6 + governance regression floor successfully:
    • python -m pytest tests/test_phase4_operator_control_plane.py tests/test_operator_ui_models.py tests/test_operator_ui_workspace_models.py tests/test_operator_ui_launcher.py tests/test_operator_ui_server.py tests/test_operator_ui_workspace_server.py tests/test_phase5_frontend_runtime_shell.py tests/test_phase5_acceptance_gate.py tests/test_phase6_operator_workspace_dashboard.py tests/test_phase6_acceptance_gate.py tests/test_smoke_verify.py tests/test_runtime_adapter_contract.py tests/test_batch6_repair_metrics_contract.py tests/test_validation_contract.py tests/test_qa_hard.py tests/test_script_authority.py tests/test_batch3_batch4_governance.py tests/test_plc_docs_contract.py -q -> 79 passed
  • Re-ran PLC governance validation successfully:
    • python scripts/plc_validate_records.py --preset representative --preset templates -> Validated 11 PLC governance artifact(s).
  • Shifted .triadev/state.json and .triadev/workflow.json from validation_pending to accepted_pending_pr_closeout.
  • Current roadmap distance is now one closeout step:
    • product acceptance is complete
    • remaining work is GitHub PR review absorption and merge for PR #20
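
The "derived reads stay side-effect free" assertion above can be approximated with a before/after filesystem snapshot. This is a sketch of the technique, not the contents of tests/test_phase6_acceptance_gate.py; derived_read and leaky_read are hypothetical stand-ins for a workspace read.

```python
import os


def snapshot(root):
    """Return {relative_path: (size, mtime_ns)} for every file under root."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.stat(full)
            state[os.path.relpath(full, root)] = (info.st_size, info.st_mtime_ns)
    return state


def is_side_effect_free(root, read_fn):
    """True when calling read_fn(root) adds, removes, or rewrites no files."""
    before = snapshot(root)
    read_fn(root)
    return snapshot(root) == before


def derived_read(root):
    # Hypothetical read-only view: lists the run directory and writes nothing.
    return sorted(os.listdir(root))


def leaky_read(root):
    # Counter-example: persists an artifact as a side effect of reading.
    with open(os.path.join(root, "operator_summary.json"), "w", encoding="utf-8") as fh:
        fh.write("{}")
    return sorted(os.listdir(root))
```

A check of this shape passes for derived_read and fails for leaky_read, which is the distinction the acceptance gate is described as enforcing.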

2026-03-21

  • Started PLC + TriadDev integration-priority pass before milestone E.
  • Confirmed the real GitHub integration branch is codex/plc-c-verify in D:\Dev_Env\GPT_Codex_Workspace, while game-localization-mvr/main_worktree is a nested reference worktree with half-applied fixes.
  • Confirmed PLC governance state already marks milestones C and D as done with evidence_ready=true, but GitHub main has not yet absorbed that state.
  • Audited open PRs and selected PR #9 as the only viable mainline integration branch; PR #7 and PR #8 are superseded in scope.
  • Extracted the current blocking review set:
    • soft_qa_llm.py severity loss in merge_tasks()
    • missing prohibited_aliases / banned_terms propagation in translation and soft-QA contracts
    • PLC ledger/schema inconsistencies in milestone-B evidence
  • Chose a minimum validation plan: targeted soft_qa contract tests, a small translation style-contract test, and file-level PLC contract checks.
  • Updated the execution ledger with an explicit PLC + TriadDev Integration Priority section so this pass stays bounded to integration hardening rather than milestone E feature work.
  • Fixed PR #9 code-review gaps in the outer integration repo:
    • soft_qa_llm now prioritizes higher-severity placeholder findings and surfaces prohibited_aliases / banned_terms
    • translate_llm now serializes prohibited_aliases / banned_terms in the style contract output
  • Fixed PLC governance gaps in the outer integration repo:
    • milestone B run manifest now uses schema-valid status pass
    • referenced ADR files now exist under docs/decisions/
    • added a PLC docs contract test to keep run-manifest schema and ADR references honest
  • Ran the targeted regression suite successfully:
    • tests/test_soft_qa_contract.py: 7 passed
    • tests/test_translate_style_contract.py: 1 passed
    • tests/test_plc_docs_contract.py: 2 passed
  • Merged PR #9 into main as fdc253f.
  • Closed PR #7 and PR #8 as superseded by PR #9.
  • Current state: mainline integration phase is complete; milestone E can now start from clean main.
  • Opened clean worktree D:\Dev_Env\GPT_Codex_Workspace_milestone_e on branch codex/milestone-e-prepare.
  • Fast-forwarded the E worktree to include the post-merge PLC handoff commit.
  • Shifted the active planning scope to milestone_E_prepare; next step is E planning/delta/tasks preparation rather than more PR cleanup.
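
The severity-loss issue in merge_tasks() noted above is the kind of bug where a later, lower-severity finding overwrites an earlier, higher one. A minimal sketch of a severity-preserving merge follows; the severity scale, the task field names, and the function name are assumptions, and the real merge_tasks() in soft_qa_llm.py may be structured differently.

```python
# Assumed severity scale; the real contract may use different labels.
SEVERITY_RANK = {"minor": 0, "major": 1, "critical": 2}


def merge_tasks_keep_highest(tasks):
    """Merge tasks per string_id, keeping the highest-severity finding."""
    merged = {}
    for task in tasks:
        key = task["string_id"]
        current = merged.get(key)
        if current is None:
            merged[key] = task
        elif SEVERITY_RANK[task["severity"]] > SEVERITY_RANK[current["severity"]]:
            # Replace only when the new finding is strictly more severe.
            merged[key] = task
    return list(merged.values())


tasks = [
    {"string_id": "a", "severity": "major", "issue": "placeholder"},
    {"string_id": "a", "severity": "minor", "issue": "tone"},
    {"string_id": "b", "severity": "minor", "issue": "tone"},
]
print([t["severity"] for t in merge_tasks_keep_highest(tasks)])  # ['major', 'minor']
```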

2026-03-24

  • Started milestone E implementation from codex/milestone-e-prepare using the package order E-contract -> E-repro + E-delta-engine -> E-task-executor.
  • Confirmed the E worktree is clean, but the control plane is stale:
    • .triadev/state.json already says milestone_e_prepare
    • .triadev/workflow.json still points at the old Batch 10 closeout change
  • Confirmed the current clean worktree does not contain data/style_profile.yaml or data/glossary.yaml; this is now a first-class E-repro blocker rather than an implicit local-state assumption.
  • Confirmed scripts/glossary_delta.py and scripts/translate_refresh.py are present but still implement a narrow glossary-only refresh path that does not satisfy milestone E.
  • Confirmed current regression status before E implementation:
    • tests/test_translate_style_contract.py: pass
    • tests/test_plc_docs_contract.py: pass
    • tests/test_soft_qa_contract.py: 1 failing test due to style-profile drift semantics
  • Locked the E gate artifact in workflow/milestone_e_contract.yaml and moved the active ledger from planning-only E to implementation-gated E.
  • Completed the first parallel implementation wave after the gate:
    • E-repro now resolves glossary/style authority explicitly, supports clean-worktree bootstrap, and aligns README/workflow examples with live CLI flags.
    • E-delta-engine now emits locale-generic typed delta artifacts and operator-facing aggregate reports instead of a glossary-only impact set.
  • Moved the active package to E-task-executor; the remaining work is to generate incremental tasks from delta_rows.jsonl, split execution from planning, and enforce post-run qa_hard gates.
  • Closed the reviewer blocker pass before phase-2 closeout:
    • executor now stages candidate output before gates and writes an explicit failure-breakdown artifact
    • executor now groups refresh/retranslate work by target_locale, so mixed-market rows update the correct locale columns
    • glossary/style loaders now fail closed for locale mismatches instead of silently borrowing another market's term
  • Updated the E contract to match the implemented surface:
    • removed the unimplemented soft_qa task type from the E task enum
    • pinned the executor failure artifact as incremental_failure_breakdown.json
  • Milestone E focused regression is green again:
    • 27 passed across refresh/executor, repro, typed delta, soft-QA compatibility, translate style contract, and PLC docs contract tests
  • Added a post-E roadmap modify proposal to PLC/TriadDev docs:
    • F → S now has an explicit four-phase interpretation while preserving the original milestone letters
    • recommended next main scope is milestone_F_execute
    • recommended governance sidecar is milestone_M_prepare
  • Switched to stacked branch codex/phase1-quality-closure to start the first Phase 1 implementation slice.
  • Locked the Phase 1 slice to translate_refresh unified execution-status contracts only:
    • no run_smoke_pipeline orchestration changes in this round
    • no soft_qa / repair_loop runtime wiring in this round
  • Current validation plan is focused tests, not smoke:
    • rationale: this slice changes executor artifact semantics but intentionally leaves the smoke entrypoint untouched
  • Implemented the Phase 1 status-contract slice in translate_refresh:
    • task artifacts now persist execution_status, final_status, and status_reason
    • manifest artifacts now persist overall_status, task_outcomes, and gate_summary
    • review queue rows now persist review_source
  • Closed the last Phase 1 contract gap in the main thread:
    • execution failures now keep the staged candidate artifact and return a non-zero exit code instead of silently promoting final output
  • Phase 1 focused acceptance is green:
    • python -m pytest tests/test_translate_refresh_contract.py tests/test_milestone_e_e2e.py tests/test_plc_docs_contract.py -q
    • result: 10 passed
  • Smoke remains intentionally skipped for this slice because scripts/run_smoke_pipeline.py is unchanged and orchestration behavior is out of scope for this PR.
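
The stage-before-gate behavior described above (keep the staged candidate on failure, return a non-zero exit code, never silently promote) can be sketched as below. The .candidate suffix, the function name, and the gate callable are hypothetical; the sketch only shows the control flow, not translate_refresh itself.

```python
import os


def stage_and_promote(final_path, content, gate):
    """Write a staged candidate, then promote it only if the gate passes.

    Returns a process-style exit code: 0 on promotion, 1 on gate failure.
    """
    staged_path = final_path + ".candidate"  # hypothetical naming
    with open(staged_path, "w", encoding="utf-8") as fh:
        fh.write(content)

    if not gate(content):
        # Fail closed: leave the candidate for inspection, do not touch final.
        return 1

    # os.replace overwrites the destination in one step when both paths
    # are on the same filesystem.
    os.replace(staged_path, final_path)
    return 0
```

A caller would pass the result to sys.exit() so that a failed gate surfaces as a non-zero exit code to the orchestrator.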

2026-03-25

  • Confirmed PR #10 and PR #11 are merged and origin/main now contains both the milestone E baseline and Phase 1 quality-closure follow-up.
  • Shifted the active roadmap scope from the merged milestone_F_execute slice to milestone_M_prepare on branch codex/phase2-governance-substrate.
  • Chose a bounded Phase 2 first package instead of trying to execute all of M/N/O/P at once:
    • freeze a machine-checkable governance contract for run_manifest, session_start, session_end, and milestone_state
    • add a validator utility for those artifacts
    • extend PLC docs regression to lock representative records and templates to the same contract
  • Validation plan for this slice is focused governance tests only:
    • rationale: Phase 2 first package changes documentation contracts and validator code, not runtime translation orchestration
  • Completed the first Phase 2 governance substrate package:
    • added workflow/plc_governance_contract.yaml as the machine-checkable contract source
    • added scripts/plc_validate_records.py as the repo-local validator
    • expanded tests/test_plc_docs_contract.py to validate templates, representative records, and preset-based validator runs
  • Synced the human-facing governance docs to the same contract language:
    • field_schema.md
    • session_start_template.md
    • session_end_template.md
    • milestone_state_template.md
    • continuity_protocol.md
  • Focused Phase 2 acceptance is green:
    • python -m pytest tests/test_plc_docs_contract.py -q -> 7 passed
    • python scripts/plc_validate_records.py --preset representative --preset templates -> Validated 7 PLC governance artifact(s).
  • Smoke remains intentionally skipped for this slice because the runtime pipeline and orchestrator are untouched.
  • Started the Phase 2 closeout package on codex/phase2-governance-closeout to finish the remaining O + P substrate work.
  • Expanded the governance target from “first bounded package” to “phase-complete closeout”:
    • machine-checkable three-point validation for changed_files, evidence_refs, and adr_refs
    • closeout-grade representative records for session, run manifest, and milestone state
    • Phase 3 stays planning-ready only until a later implementation-start decision
  • Completed the Phase 2 closeout package:
    • aligned workflow/plc_governance_contract.yaml, field_schema.md, continuity_protocol.md, and the PLC templates to the same three-point governance semantics
    • upgraded representative PLC records so run/session/milestone artifacts all carry changed_files, evidence_refs, and adr_refs
    • closed milestone_state_M.md with status=done and evidence_ready=true
  • Focused closeout acceptance is green:
    • python -m pytest tests/test_plc_docs_contract.py -q -> 9 passed
    • python scripts/plc_validate_records.py --preset representative --preset templates -> Validated 7 PLC governance artifact(s).
  • TriadDev control plane is now aligned to phase3_planning_ready; Phase 3 remains planning-only, not implementation-started.
  • Confirmed PR #13 is merged and moved the active branch to codex/milestone-i-prepare from clean main.
  • Started milestone_I_prepare as a planning-only Phase 3 slice:
    • active target is a bounded style-governance contract package
    • runtime implementation remains gated until H completes
  • Recorded a fresh set of Phase 3 PLC artifacts:
    • phase3_milestone_i_prepare_note.md
    • run_manifest_phase3_milestone_i_prepare.json
    • session_start_20260325_phase3_milestone_i_prepare.md
    • session_end_20260325_phase3_milestone_i_prepare.md
    • milestone_state_I.md
  • Focused Phase 3 planning acceptance is green:
    • python -m pytest tests/test_plc_docs_contract.py -q
    • record-level validation of the new run manifest, session start, session end, and milestone state under scripts/plc_validate_records.py
  • Merged PR #14 into main and reopened Phase 3 from clean trunk on codex/milestone-i-contract-package.
  • Completed the first milestone-I implementation package:
    • added workflow/style_governance_contract.yaml
    • added style-governance metadata and lineage to data/style_profile.yaml
    • updated scripts/style_guide_bootstrap.py and scripts/style_sync_check.py to emit and validate the governance header
    • synced the version/governance header into workflow/style_guide.generated.md, workflow/style_guide.md, and .agent/workflows/style-guide.md
    • added tests/test_style_governance_contract.py
  • Focused milestone-I contract acceptance is green:
    • python -m pytest tests/test_style_governance_contract.py tests/test_translate_style_contract.py tests/test_soft_qa_contract.py -q -> 12 passed
    • python scripts/style_sync_check.py -> pass
  • Treated merged PR #15 as a bridge foundation only and returned the active execution lane to Phase 1 on fresh main.
  • Opened codex/phase1-quality-runtime-closeout as the single active implementation branch under the phase-sized merge-window policy.
  • Completed the remaining Phase 1 runtime closure in scripts/run_smoke_pipeline.py:
    • hard QA now routes through repair_loop with explicit recheck and blocked-state handling
    • soft QA now routes through bounded repair, fail-closed hard-gate review handoff, and rollback-safe promotion
    • smoke manifests now persist repair_cycles, review_handoff, gate_summary, and delivery_decision
  • Added focused Phase 1 runtime contract coverage:
    • tests/test_phase1_quality_runtime_contract.py now locks hard-repair completion, soft rollback, and soft hard-gate-without-tasks handoff
    • tests/test_batch6_repair_metrics_contract.py now carries explicit style-profile and soft-QA rubric inputs for smoke orchestration tests
    • tests/test_repair_loop_contract.py keeps CLI doc authority focused on the repair workflow itself
  • Phase 1 runtime acceptance is green again:
    • python -m py_compile scripts/run_smoke_pipeline.py
    • python -m pytest tests/test_batch6_repair_metrics_contract.py tests/test_phase1_quality_runtime_contract.py tests/test_repair_loop_contract.py tests/test_soft_qa_contract.py tests/test_smoke_verify.py -q -> 29 passed
    • python -m pytest tests/test_translate_refresh_contract.py tests/test_milestone_e_e2e.py -q -> 10 passed
  • PLC/TriadDev phase-boundary records now validate for the new Phase 1 run/session/milestone artifacts.
  • Merged PR #16 into main as 3a84f55, closing the full Phase 1 large-batch runtime scope.
  • Phase 1 review feedback is fully absorbed:
    • early-fail smoke manifests now report failed correctly
    • non-ru-RU review handoff rows keep current translated text
    • representative PLC milestone records now point to milestone_state_H.md
  • Re-ran post-review acceptance successfully:
    • focused runtime: 31 passed
    • focused executor + PLC docs: 21 passed
    • PLC validator presets: Validated 11 PLC governance artifact(s).
  • Current roadmap decision is now Phase 3, not Phase 4:
    • Phase 2 is already complete
    • H is merged, which removes the documented gate for broader I/J/K/L
    • the milestone-I bridge package is already on main as foundation
  • Opened a new value-first gate for the next batch and scored the full Phase 3 batch as GO (25/30, High confidence).
  • Opened codex/phase3-language-governance-batch from clean main and moved Phase 3 from planning into implementation.
  • Froze Phase 3 shared contracts/helpers before downstream wiring:
    • review ticket / feedback log / lifecycle / KPI contracts
    • scripts/style_governance_runtime.py
    • scripts/review_governance.py
    • scripts/review_feedback_ingest.py
  • Active implementation split is now:
    • runtime style-governance enforcement in translate + soft QA
    • review ticket / feedback / lifecycle / KPI wiring in refresh + smoke pipeline
  • Completed the shared Phase 3 governance helper layer:
    • scripts/style_governance_runtime.py
    • scripts/review_governance.py
    • scripts/review_feedback_ingest.py
    • scripts/language_governance.py as a thin compatibility wrapper over the new helper/contract surfaces
  • Completed runtime consumer integration for the Phase 3 batch:
    • translate_llm.py and soft_qa_llm.py now fail closed on governed style-profile violations
    • translate_refresh.py now emits review tickets, feedback-log placeholders, lifecycle-aware KPI artifacts, and governed review handoff
    • run_smoke_pipeline.py now emits the same Phase 3 review / KPI artifacts without breaking the Phase 1 orchestration contract
  • Phase 3 focused acceptance is green:
    • python -m pytest tests/test_phase3_governance_helpers.py tests/test_phase3_runtime_governance.py tests/test_phase3_language_governance_contract.py tests/test_translate_refresh_contract.py tests/test_phase1_quality_runtime_contract.py tests/test_translate_style_contract.py tests/test_soft_qa_contract.py tests/test_plc_docs_contract.py -q -> 44 passed
    • python scripts/style_sync_check.py -> pass
    • python scripts/plc_validate_records.py --preset representative --preset templates -> Validated 11 PLC governance artifact(s).
  • Live smoke feasibility was checked and blocked by environment only:
    • python scripts/llm_ping.py failed because LLM_BASE_URL / LLM_API_KEY are missing in the current shell
    • merge acceptance therefore uses the required representative smoke gate via deterministic orchestration coverage in tests/test_phase1_quality_runtime_contract.py
  • Current branch status:
    • codex/phase3-language-governance-batch is implementation-complete
    • PR #17 is open: feat(phase3): land language governance batch
    • the next step is review absorption and merge, not more Phase 3 implementation
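
The three-point validation for changed_files, evidence_refs, and adr_refs mentioned above can be sketched as a small record check. The three field names come from this log; requiring each to be a non-empty list of non-empty strings is an assumption, and the rules in workflow/plc_governance_contract.yaml may be stricter or looser.

```python
REQUIRED_REF_FIELDS = ("changed_files", "evidence_refs", "adr_refs")


def validate_three_point(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_REF_FIELDS:
        value = record.get(field)
        if not isinstance(value, list):
            problems.append(f"{field}: expected a list")
        elif not value:
            problems.append(f"{field}: must not be empty")
        elif not all(isinstance(item, str) and item.strip() for item in value):
            problems.append(f"{field}: entries must be non-empty strings")
    return problems


# Hypothetical record; the paths are illustrative only.
record = {
    "changed_files": ["scripts/plc_validate_records.py"],
    "evidence_refs": ["tests/test_plc_docs_contract.py"],
    "adr_refs": [],
}
print(validate_three_point(record))  # ['adr_refs: must not be empty']
```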

2026-03-26

  • Confirmed PR #17 is merged into main as 88e9dba; Phase 3 is now closed history, not the active execution lane.
  • Opened codex/phase4-operator-control-plane-batch from clean main.
  • Started Phase 4 as one phase-sized batch with bridge hardening included:
    • repair_loop target-column detection now excludes locale/language metadata columns
    • language_governance no longer falls back to the default lifecycle registry when an explicit caller registry is incomplete
  • Added the Phase 4 operator control plane surface:
    • workflow/operator_card_contract.yaml
    • scripts/operator_control_plane.py
    • tests/test_phase4_operator_control_plane.py
  • Accepted the operating-model ADR:
    • docs/decisions/ADR-0003-operator-control-plane-operating-model.md
  • Focused bridge/operator acceptance is green:
    • python -m pytest tests/test_repair_loop_contract.py tests/test_phase3_language_governance_contract.py tests/test_phase4_operator_control_plane.py -q -> 22 passed
    • python -m py_compile scripts/operator_control_plane.py scripts/repair_loop.py scripts/language_governance.py
  • Remaining work in this phase is to:
    • sync PLC/TriadDev control-plane state to Phase 4
    • materialize the representative operator cards/report walkthrough from an existing run
    • run full focused acceptance and open one Phase 4 PR
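
The bridge-hardening rule above, that an incomplete explicit registry must not fall back to the default one, can be sketched as a fail-closed resolver. The registry keys, the default contents, and the exception name are hypothetical; only the behavior (default when nothing is passed, error when an explicit registry is incomplete) reflects the entry.

```python
# Hypothetical default registry and required keys, for illustration only.
DEFAULT_REGISTRY = {"style_profile": "approved", "glossary": "approved"}
REQUIRED_KEYS = ("style_profile", "glossary")


class IncompleteRegistryError(ValueError):
    """Raised when a caller-supplied registry is missing required entries."""


def resolve_registry(explicit=None):
    """Use the default only when no registry is supplied; otherwise fail closed."""
    if explicit is None:
        return dict(DEFAULT_REGISTRY)
    missing = [key for key in REQUIRED_KEYS if key not in explicit]
    if missing:
        # Do not patch the gaps from DEFAULT_REGISTRY: the caller asked for
        # this registry, so silently mixing in defaults would hide the problem.
        raise IncompleteRegistryError(f"missing registry entries: {missing}")
    return dict(explicit)
```

The design choice is that a silent fallback would let a misconfigured caller pass governance checks using assets it never registered.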

2026-03-18

  • Started M4 execution task for the 1000-row layered smoke input.
  • Created task_plan.md, findings.md, and progress.md.
  • Ran direct llm_ping successfully with the provided LLM credentials.
  • Ran exact-input preflight and full pipeline attempts; both short-circuited at connectivity inside run_smoke_pipeline.py.
  • An older full run on a related 1000-row artifact reached translation and QA Hard, then failed with 85 QA errors and a row-count mismatch (Translated 1002 / 1003 rows).

2026-03-19

  • Created and pushed checkpoint branch codex/checkpoint-mainline-20260319.
  • Switched to codex/deep-cleanup-r3 for deep-cleanup Batch 1.
  • Materialized TriadDev brownfield control files and value gate artifacts.
  • Added script authority manifest/report tooling for main_worktree/scripts vs src/scripts.
  • Expanded Batch 1 runtime adapter coverage and added unit tests for the authority checker.
  • Ran Batch 1 regression suite successfully: 29 passed.
  • Re-ran m4_3_collect_coverage.py and m4_4_decision.py; the decision summary remains KEEP=6.
  • Current authority report is WARN because runtime_adapter.py is still alert-only drift, while required mirrors remain aligned.
  • Started Batch 2 under the first-principles rule: preserve the smallest system needed for continued development, do not delete uncertain code.
  • Added Batch 2 contract tests for runtime_adapter, normalize_*, and soft_qa_llm.
  • Fixed explicit-router injection in runtime_adapter.LLMClient.
  • Moved import-time standard-stream rewiring out of normalize_tagger.py, normalize_tag_llm.py, translate_llm.py, and soft_qa_llm.py into CLI-time configuration.
  • Fixed soft_qa_llm.py --dry-run to use batch_utils.SplitBatchConfig and split_into_batches.
  • Batch 2 focused test surface is green: 14 passed.
  • Started roadmap Phase 1 Batch 3/4 to convert near-core status decisions and frozen-zone boundaries into explicit governance artifacts.
  • Collected branch-topology evidence for the later GitHub cleanup phase: several remote branches are fully contained in origin/main, while reorg/v1.3.0-structure remains the only audit-first diverged branch.
  • Added workflow/batch3_surface_inventory.json and workflow/batch4_frozen_zone_inventory.json plus reports/github_branch_audit_20260319.md to make Phase 1 and Phase 2 decisions auditable.
  • Added tests/test_batch3_batch4_governance.py to lock wrapper forwarding, CLI compatibility, and governance status expectations.
  • Refined surface statuses: normalize_ingest.py is now compat-keep documented ingest, normalize_tag_llm.py is now stress-only compat entrypoint.
  • Fixed scripts/stress_test_3k_run.sh so soft QA writes --out_report and --out_tasks, and the soft repair loop consumes the emitted tasks JSONL instead of the report JSON.
  • Phase 1 regression plus evidence gate is green again: 50 passed, authority remains WARN on runtime_adapter.py alert-only drift, and M4 remains at KEEP=6.
  • Started Batch 5 on branch codex/deep-cleanup-batch5 after local main_worktree was re-aligned with origin/main and GitHub governance was fully closed out.
  • Added tests/test_batch5_archive_candidates.py to characterize the archived CLI shape of repair_loop_v2.py and the hard-coded recovery behavior of repair_checkpoint_gaps.py.
  • Independent subagent review found hidden dependency blockers before archive could be finalized: active rules/root inventory still mention repair_loop_v2.py, and repair_checkpoint_gaps.py still participates in the documented translate-checkpoint recovery contract.
  • Rolled the archive action back immediately, restored both files to scripts/, and converted Batch 5 into an audit-and-fallback step instead of a physical cleanup step.
  • Updated the cleanup roadmap to treat Batch 5 as the point where these two repair-side utilities move from archive-candidate to blocked until their surrounding contracts are formally retired.
  • Batch 5 regression and evidence gate are green: 56 passed, authority is back to WARN on runtime_adapter.py only after required compat mirrors were resynced, and M4_4_decision.jsonl remains KEEP=6.
  • Started Batch 6 on branch codex/deep-cleanup-batch6 to retire repair-side governance contracts before any future archive attempt and to restore smoke metrics as optional observability.
  • Rewrote the active rules, root inventory, and translate workflow so repair_loop_v2.py and repair_checkpoint_gaps.py are no longer presented as current tools.
  • Downgraded both repair-side targets from blocked back to archive-candidate in workflow/batch4_frozen_zone_inventory.json; Batch 6 still does not physically archive them.
  • Reconnected scripts/metrics_aggregator.py inside scripts/run_smoke_pipeline.py as a non-blocking Metrics stage that writes manifest-visible report artifacts before verify.
  • Extended scripts/metrics_aggregator.py with usage fallback based on trace token fields and char-count estimation so sparse traces still produce stable totals and cost estimates.
  • Added tests/test_batch6_repair_metrics_contract.py and turned the initial RED surface green: 7 passed.
  • Ran the full Batch 6 regression suite plus evidence gate successfully: 63 passed, scripts/check_script_authority.py returned WARN on runtime_adapter.py only, and scripts/m4_4_decision.py still reports KEEP=6.
  • Re-synced src/scripts/run_smoke_pipeline.py from the authority copy after the new Metrics stage introduced required-mirror drift; authority returned to the expected non-blocking state immediately afterward.
  • Started Batch 7 on branch codex/deep-cleanup-batch7 with the new top-level priority: restore sustainable production development rather than continue chasing script deletion.
  • Added and expanded deterministic validation coverage in tests/test_validation_contract.py for explicit CLI paths, scoring, parse fallback, and metadata/report schema.
  • Added and expanded deterministic repair coverage in tests/test_repair_loop_contract.py for hard-report JSON input, soft JSONL input, passthrough copy behavior, routing metadata, and runbook alignment.
  • Updated docs/repro_baseline.md so validation commands now prefer explicit --input, --output-dir, --report-dir, and --api-key-path flags, and switched the retained credential example away from the drifted config/api_key.txt path.
  • Updated docs/WORKSPACE_RULES.md so repair metadata steps now match the runtime truth (repair_hard / repair_soft_major) and checkpoint behavior is documented as snapshot-only rather than true resume support.
  • Promoted scripts/run_validation.py and scripts/build_validation_set.py to must-keep in workflow/batch4_frozen_zone_inventory.json; scripts/repair_loop.py is being promoted in the same inventory as the retained repair authority.
  • Ran focused Batch 7 contract tests successfully: tests/test_validation_contract.py, tests/test_repair_loop_contract.py, and tests/test_batch3_batch4_governance.py are green.
  • Ran the full Batch 7 regression suite successfully: 77 passed.
  • Re-ran the evidence gate successfully: scripts/check_script_authority.py remains WARN on runtime_adapter.py only, scripts/m4_3_collect_coverage.py reports 0 issue hotspots, and scripts/m4_4_decision.py still reports KEEP=6.
  • Committed Batch 7 as ddd14e2 (cleanup(batch7): recover production dev baselines) and pushed branch origin/codex/deep-cleanup-batch7 for PR review.
  • Started Batch 8 on branch codex/deep-cleanup-batch8.
  • Confirmed the retained mainline paths remain scripts/repair_loop.py and scripts/rebuild_checkpoint.py, while repair_loop_v2.py and repair_checkpoint_gaps.py are only historical archive targets now.
  • Added Batch 8 characterization coverage for physical archive closeout and relaxed Batch 5/6 tests so they continue to validate historical evidence after the move.
  • Physically moved repair_loop_v2.py and repair_checkpoint_gaps.py into _obsolete/repair_archive/ and added an audit README plus Batch 8 closeout report.
  • Ran focused Batch 8 archive-closeout coverage successfully: 17 passed.
  • Ran the full Batch 8 regression suite successfully: 81 passed.
  • Re-ran the evidence gate successfully: scripts/check_script_authority.py remains WARN on runtime_adapter.py only, scripts/m4_3_collect_coverage.py reports 0 issue hotspots, and scripts/m4_4_decision.py still reports KEEP=6.
  • Committed Batch 8 as 3ae4fac (cleanup(batch8): close out repair archive migration), pushed branch origin/codex/deep-cleanup-batch8, and opened PR #3.
  • Started Batch 9 on branch codex/deep-cleanup-batch9.
  • Added tests/test_batch9_stress_surface_governance.py to pin one retained stress shell path, explicit helper statuses, and the removal of the generic blocked stress bucket.
  • Reclassified the stress surface in workflow/batch4_frozen_zone_inventory.json from one blocked umbrella to explicit shell/helper statuses.
  • Updated workflow/batch3_surface_inventory.json so retained near-core references now point to scripts/stress_test_3k_run.sh rather than historical 5k acceptance helpers.
  • Fixed the retained stress shell export invocation in scripts/stress_test_3k_run.sh so it now matches the current positional rehydrate_export.py contract.
  • Started Batch 10 on branch codex/deep-cleanup-batch10 as a stacked closeout branch on top of Batch 9 while PR #4 remains open.
  • Reframed src/scripts from vague long-tail compat noise into an explicit separate-exit-program compatibility liability in the authority manifest and frozen-zone inventory.
  • Removed the non-real gate/** placeholder from the frozen-zone inventory so closure is judged only against real surfaces.
  • Added Batch 10 governance coverage for closeout decision semantics, root inventory wording, and the requirement that no real blocked surface remains in the cleanup inventory.
  • Ran focused Batch 10 governance tests successfully: 15 passed.
  • Ran the full retained regression suite successfully with Batch 10 included: 92 passed.
  • Re-ran the evidence gate successfully: scripts/check_script_authority.py remains WARN on runtime_adapter.py only, scripts/m4_3_collect_coverage.py reports 0 issue hotspots, and scripts/m4_4_decision.py still reports KEEP=6.
  • Batch 10 therefore closes the current cleanup roadmap: src/scripts remains operationally present, but any future mirror retirement now moves into a separate migration program instead of staying on the main cleanup path.
  • Ran focused Batch 9 governance regression successfully: 42 passed.
  • Re-ran the evidence gate successfully: scripts/check_script_authority.py remains WARN on runtime_adapter.py only, scripts/m4_3_collect_coverage.py reports 0 issue hotspots, and scripts/m4_4_decision.py still reports KEEP=6.
  • Re-ran the full explicit test-file suite successfully with -s to avoid the existing pytest capture issue in this workspace: 98 passed, 8 skipped.
  • Batch 9 now marks the roadmap as closeout-ready: Batch 10 should focus only on the long-term decision for the src/scripts compat mirror rather than opening new cleanup surfaces.
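
The usage fallback noted in the Batch 6 entries (trace token fields first, char-count estimation second) can be illustrated with a minimal sketch. This is not the code in scripts/metrics_aggregator.py: the field names (`prompt_tokens`, `completion_tokens`, `prompt_text`, `completion_text`), the function name `resolve_usage`, and the 4-characters-per-token ratio are all assumptions made for illustration.

```python
from typing import Any, Dict, Mapping

# Assumption: a rough chars-per-token heuristic; the real aggregator may use another ratio.
CHARS_PER_TOKEN = 4


def _estimate_tokens(text: str) -> int:
    """Estimate a token count from character length, rounding up."""
    if not text:
        return 0
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def resolve_usage(record: Mapping[str, Any]) -> Dict[str, Any]:
    """Prefer token counts reported in the trace; fall back to char-count estimates.

    All keys read from `record` are hypothetical names, not the real trace schema.
    """
    counts: Dict[str, int] = {}
    sources: Dict[str, str] = {}
    for side, token_key, text_key in (
        ("prompt", "prompt_tokens", "prompt_text"),
        ("completion", "completion_tokens", "completion_text"),
    ):
        value = record.get(token_key)
        is_count = isinstance(value, int) and not isinstance(value, bool) and value >= 0
        if is_count:
            counts[side] = value
            sources[side] = "trace"
        else:
            counts[side] = _estimate_tokens(str(record.get(text_key) or ""))
            sources[side] = "estimated"
    return {
        "prompt_tokens": counts["prompt"],
        "completion_tokens": counts["completion"],
        "total_tokens": counts["prompt"] + counts["completion"],
        "sources": sources,
    }
```

Recording the source of each count alongside the totals keeps estimated figures distinguishable from reported ones, which may matter when a cost estimate is derived from a sparse trace.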

2026-03-27

  • Re-opened Phase 5 on codex/phase5-frontend-runtime-shell to finish acceptance and PR closeout rather than starting Phase 6.
  • Verified the provided LLM credentials with python scripts/llm_ping.py; connectivity passed and returned PONG.
  • Confirmed PR #19 remains open and currently merge-blocked by one branch conflict plus four unresolved review threads.
  • Fixed the operator UI frontend contract so detail rendering now consumes run.stages and run.verify.
  • Fixed operator UI launcher run-id generation so same-second launches produce unique run IDs and run directories.
  • Hardened the documented python scripts/operator_ui_server.py entrypoint with repo-root import bootstrapping and frontend asset fallback.
  • Added/extended closeout coverage: tests/test_operator_ui_launcher.py, tests/test_operator_ui_server.py, tests/test_phase5_frontend_runtime_shell.py, and tests/test_phase5_acceptance_gate.py.
  • Re-ran the full retained Phase 5 regression floor successfully: 39 passed plus 14 passed.
  • Ran a real, representative online preflight launch through the local UI server using D:\Dev_Env\loc-mvr 测试文档\test_input_200-row.csv. The launched run was ui_run_20260327_052512_477292_22c1: run_manifest.json was produced, the UI/API exposed 7 stages, verify returned PASS, and a smoke_verify_log preview was available through the artifact endpoint.
  • Started Phase 6 implementation on a fresh branch off main, codex/phase6-operator-workspace-dashboard.
  • Split scripts/operator_control_plane.py into a pure derivation path and a retained write-on-demand summarize path so workspace GET requests stay side-effect free.
  • Extended scripts/operator_ui_models.py with workspace overview, card-list, and run-detail read models that prefer persisted operator artifacts and fall back to derived payloads.
  • Extended scripts/operator_ui_server.py with three endpoints: GET /api/workspace/overview, GET /api/workspace/cards, and GET /api/workspace/runs/{run_id}.
  • Reworked operator_ui/index.html, operator_ui/styles.css, and operator_ui/app.js so the local shell now supports Runtime Shell and Operator Workspace modes in one page.
  • Added Phase 6 RED/acceptance coverage in tests/test_operator_ui_workspace_models.py, tests/test_operator_ui_workspace_server.py, and tests/test_phase6_operator_workspace_dashboard.py.
  • Ran the focused Phase 6 + retained regression floor successfully: 78 passed.
  • Re-ran PLC governance validation successfully: python scripts/plc_validate_records.py --preset representative --preset templates -> Validated 11 PLC governance artifact(s).
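
The launcher fix for same-second launches can be sketched as follows. The ID layout is inferred only from the recorded run ui_run_20260327_052512_477292_22c1 (date, time, what looks like microseconds, and a short hex suffix). The function name `make_run_id` and the use of `secrets.token_hex` are assumptions, not the launcher's actual implementation.

```python
import secrets
from datetime import datetime
from typing import Optional


def make_run_id(now: Optional[datetime] = None, prefix: str = "ui_run") -> str:
    """Build a run ID that stays distinct for launches within the same second.

    Hypothetical helper: microseconds separate near-simultaneous launches, and a
    random 4-hex-digit suffix makes a collision unlikely even on a timestamp tie.
    """
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S_%f")  # %f is zero-padded microseconds
    suffix = secrets.token_hex(2)  # 2 random bytes -> 4 hex characters
    return f"{prefix}_{stamp}_{suffix}"
```

A random suffix lowers the chance of a collision but does not eliminate it. A launcher that also creates the run directory with `os.makedirs(path, exist_ok=False)` and retries on `FileExistsError` would close the remaining gap.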