- Cloned a fresh local `main` worktree into `D:\CodexWorkspace\game-localization-mvr` instead of continuing the historical `codex/phase6-operator-workspace-dashboard` branch.
- Re-read the local continuity anchors and current handoff trail:
  - `docs/HANDOFF_MAINLINE_GUARDRAILS.md`
  - `task_plan.md`
  - `docs/project_lifecycle/roadmap_index.md`
  - `README.md`
  - `handoff/m4_session_transfer/*`
- Confirmed the machine did not have an immediately usable Python 3.11 baseline:
  - system Python was `3.14.2`
  - `py -0p` exposed `3.14` and `3.12`, but not `3.11`
- Installed managed tooling for project-local version control:
  - `winget install astral-sh.uv`
  - `uv python install 3.11` -> `Python 3.11.15`
- Added `.python-version` with `3.11` and expanded `.gitignore` to ignore `.venv/`.
- Wrote a new PLC continuity record at `docs/project_lifecycle/run_records/2026-04/2026-04-01/session_start_202604010215.md`.
- Confirmed current hard blocker before any live smoke:
  - `LLM_BASE_URL`, `LLM_API_KEY`, `LLM_MODEL`, and `LLM_TRACE_PATH` are not set in this session
  - no `.llm_credentials` file is present in the repo root
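
The blocker check in the entry above can be sketched as a small pre-flight helper. This is a minimal sketch, not the repo's actual check: the function name `missing_llm_settings` is hypothetical, and it only reports what is absent without assuming how the real pipeline combines environment variables and the `.llm_credentials` file.

```python
import os
from pathlib import Path

# Variable names are taken from the log entry; everything else is illustrative.
REQUIRED_VARS = ("LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL", "LLM_TRACE_PATH")


def missing_llm_settings(env=None, repo_root="."):
    """Return (missing_env_vars, credentials_file_present).

    Hypothetical helper: it reports missing settings only and makes no
    claim about how the real pipeline resolves credentials.
    """
    env = os.environ if env is None else env
    # Treat unset and empty values the same way.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    has_file = (Path(repo_root) / ".llm_credentials").is_file()
    return missing, has_file


if __name__ == "__main__":
    missing, has_file = missing_llm_settings()
    print("missing env vars:", missing)
    print(".llm_credentials present:", has_file)
```

Passing `env` and `repo_root` explicitly keeps the check testable without touching the real shell session.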
- Next runtime step is now environment materialization:
  - run the offline validation floor
  - record exact live-smoke follow-up commands for the credential handoff
- Verified the repo-local environment is usable:
  - existing `.venv` resolves to `Python 3.11.9`
  - runtime dependencies plus `numpy` now import successfully from `.venv`
- Closed a real Windows local-baseline blocker:
  - `python scripts/style_guide_bootstrap.py --dry-run` originally failed with `UnicodeEncodeError` because the script printed emoji to a GBK console
  - patched `scripts/style_guide_bootstrap.py` to emit ASCII success lines instead
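
The failure mode above is reproducible without a Windows console, because the `gbk` codec cannot encode most emoji. The actual patch replaced the emoji with ASCII text; the sketch below shows a different, more general fallback and is only an illustration. The helper name `console_safe` is hypothetical.

```python
import sys


def console_safe(text, encoding=None):
    """Return text unchanged if the console encoding can represent it,
    otherwise an ASCII-only version with unsupported characters dropped."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    try:
        text.encode(encoding)
        return text
    except UnicodeEncodeError:
        # Drop everything outside ASCII rather than crash the script.
        return text.encode("ascii", errors="ignore").decode("ascii").strip()


if __name__ == "__main__":
    # On a GBK console this prints the ASCII part only instead of raising.
    print(console_safe("\U0001F680 style guide written"))
```

Replacing the emoji at the source, as the patch did, is simpler when the messages are fixed strings; a fallback like this is mainly useful when the text comes from data.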
- Closed a real placeholder-integrity bug in the keep-chain baseline:
  - `normalize_guard.py` was letting jieba split printf placeholders, turning `%d` into `% d`
  - updated the segmentation skip-regex so printf-style placeholders remain intact in `placeholder_map`
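
The idea behind the skip-regex fix can be sketched as follows: find placeholders first, and hand only the text between them to the segmenter. The real regex in `normalize_guard.py` is not shown in this log, so the pattern below is an assumption, and `naive_segmenter` is a stand-in for jieba so the example has no third-party dependency.

```python
import re

# Assumed pattern: optional positional index, flags, width, precision,
# then a conversion letter. Not the actual regex from normalize_guard.py.
PRINTF_PLACEHOLDER = re.compile(r"%(?:\d+\$)?[-+0#]*\d*(?:\.\d+)?[sdifuxXc]")


def segment_preserving_placeholders(text, segmenter):
    """Run `segmenter` only on the spans between printf placeholders."""
    tokens = []
    pos = 0
    for match in PRINTF_PLACEHOLDER.finditer(text):
        if match.start() > pos:
            tokens.extend(segmenter(text[pos:match.start()]))
        tokens.append(match.group(0))  # placeholder kept as a single token
        pos = match.end()
    if pos < len(text):
        tokens.extend(segmenter(text[pos:]))
    return tokens


def naive_segmenter(chunk):
    """Stand-in for a real segmenter: one token per non-space character."""
    return [ch for ch in chunk if not ch.isspace()]


if __name__ == "__main__":
    print(segment_preserving_placeholders("获得%d金币", naive_segmenter))
```

Because the segmenter never sees the placeholder characters, it cannot insert a space between `%` and `d`.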
- Hardened the local regression harness for fresh clones:
  - `scripts/test_normalize.py`, `scripts/test_qa_hard.py`, `scripts/test_rehydrate.py`, and `scripts/test_e2e_workflow.py` now create parent temp directories and invoke the current interpreter explicitly
  - `scripts/test_qa_hard.py` now self-generates its QA reports from fixture inputs
  - `scripts/test_rehydrate.py` now self-generates valid/invalid placeholder maps instead of relying on stale fixture assumptions
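
Both harness fixes follow a common pattern: create the output's parent directories before writing, and launch child scripts with `sys.executable` rather than whatever `python` is first on `PATH`. A minimal sketch, assuming a hypothetical helper name and an inline `-c` program in place of the real scripts:

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script_with_current_interpreter(script_args, out_path):
    """Create the output's parent directories, then run a script with the
    same interpreter that is executing this process (sys.executable)."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    return subprocess.run(
        [sys.executable, *script_args],
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "nested" / "reports" / "out.txt"
        # The inline program stands in for a real script under scripts/.
        code = "import sys, pathlib; pathlib.Path(sys.argv[1]).write_text('ok')"
        result = run_script_with_current_interpreter(["-c", code, str(target)], target)
        print(result.returncode, target.read_text())
```

Using `sys.executable` matters on a machine like the one described here, where the system `python` is 3.14 but the project `.venv` is 3.11.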
- Offline validation floor is green:
  - `python scripts/test_normalize.py` -> pass
  - `python scripts/test_qa_hard.py` -> pass
  - `python scripts/test_rehydrate.py` -> pass
  - `python scripts/test_e2e_workflow.py` -> pass
- Live `llm_ping` and smoke remain intentionally pending until credentials are provided.
- Loaded process-scoped live credentials for `https://api.apiyi.com/v1` and re-verified connectivity with `.\.venv\Scripts\python.exe scripts\llm_ping.py`.
- Confirmed the router still selects `gpt-4.1-nano` for `llm_ping`; the smoke translation override stayed on `gpt-4.1-mini` to control cost.
- First `preflight` attempt failed fast at `style_governance_gate`:
  - run dir: `data/smoke_run_20260331_184226`
  - issue: `STYLE_GOVERNANCE_GATE_FAIL`
  - root cause: `workflow/style_profile.generated.yaml` was missing from `workflow/lifecycle_registry.yaml`
- Patched `workflow/lifecycle_registry.yaml` to register `workflow/style_profile.generated.yaml` as an approved runtime-gated `style_profile`.
- Re-ran `preflight` successfully:
  - run dir: `data/smoke_run_20260331_184401`
  - verify artifact: `smoke_verify_smoke_run_20260331_184401.json`
  - final authority: `overall=PASS`
- Re-ran `full` successfully:
  - run dir: `data/smoke_run_20260331_184605`
  - verify artifact: `smoke_verify_smoke_run_20260331_184605.json`
  - final authority: `overall=PASS`
- Live smoke behavior on the 10-row baseline is now characterized:
  - `target_lang_effective` stayed `en-US`
  - no RU fallback was triggered
  - row counts stayed aligned at `10 input / 10 translated / 10 final`
  - `QA Hard` produced `5` initial errors, all cleared by `Repair Hard`
  - `Soft QA` produced `8` repair tasks; `Repair Soft` repaired `7` and escalated `1`
- The pipeline is therefore runnable end-to-end on fresh `main`, but not perfectly clean:
  - manifest `gate_summary.status` is `passed`
  - manifest `overall_status` is `warn`
  - one review handoff remains queued for `string_id=10007436`
- Closed the live-smoke execution slice by retiring the exploration subagent after integrating its artifact/PASS-authority findings.
- Started a dedicated human UAT prep scope for the current merged Phase 5 + 6 UI surface.
- Confirmed `phase6_dashboard_worktree` is the correct target for human acceptance; local `main` is behind remote and not suitable for this pass.
- Added `scripts/seed_phase6_manual_uat.py` to create deterministic workspace/runtime data for manual browser validation.
- Added `tests/test_seed_phase6_manual_uat.py` to lock the seed utility shape.
- Added a dated human UAT checklist and PLC prep records under `docs/project_lifecycle/run_records/2026-03/2026-03-28/`.
- Re-ran the targeted human-UAT prep checks successfully:
  - `python -m pytest tests/test_phase5_acceptance_gate.py -q` -> `1 passed`
  - `python -m pytest tests/test_phase6_acceptance_gate.py -q` -> `1 passed`
  - `python -m pytest tests/test_seed_phase6_manual_uat.py -q` -> `1 passed`
- Seeded deterministic manual-UAT data successfully:
  - `phase6_manual_uat_derived`
  - `phase6_manual_uat_persisted`
- Verified live-launch env readiness:
  - `python scripts/llm_ping.py` -> `SUCCESS / PONG`
- Started the local UI server successfully on `http://127.0.0.1:8765/`.
- Verified the running server sees both seeded runs through `/api/workspace/overview`.
- Remaining work in this scope is now only the human browser walkthrough and evidence capture.
- Started the bounded Phase 6 acceptance pass instead of extending implementation scope.
- Added `tests/test_phase6_acceptance_gate.py` to exercise the documented `python scripts/operator_ui_server.py` entrypoint with live HTTP rather than import-only checks.
- The new acceptance gate verifies:
  - `/` renders both `Runtime Shell` and `Operator Workspace`
  - `/api/workspace/overview`, `/api/workspace/cards`, and `/api/workspace/runs/{run_id}` work against fixture-backed runs
  - derived workspace reads stay side-effect free and do not create persisted operator artifacts
  - persisted `operator_cards` / `operator_summary` are still honored when present
  - runtime drilldown and manifest-scoped artifact preview still work through `/api/runs*`
- Re-ran the retained Phase 4/5/6 + governance regression floor successfully:
  - `python -m pytest tests/test_phase4_operator_control_plane.py tests/test_operator_ui_models.py tests/test_operator_ui_workspace_models.py tests/test_operator_ui_launcher.py tests/test_operator_ui_server.py tests/test_operator_ui_workspace_server.py tests/test_phase5_frontend_runtime_shell.py tests/test_phase5_acceptance_gate.py tests/test_phase6_operator_workspace_dashboard.py tests/test_phase6_acceptance_gate.py tests/test_smoke_verify.py tests/test_runtime_adapter_contract.py tests/test_batch6_repair_metrics_contract.py tests/test_validation_contract.py tests/test_qa_hard.py tests/test_script_authority.py tests/test_batch3_batch4_governance.py tests/test_plc_docs_contract.py -q` -> `79 passed`
- Re-ran PLC governance validation successfully:
  - `python scripts/plc_validate_records.py --preset representative --preset templates` -> `Validated 11 PLC governance artifact(s).`
- Shifted `.triadev/state.json` and `.triadev/workflow.json` from `validation_pending` to `accepted_pending_pr_closeout`.
- Current roadmap distance is now one closeout step:
  - product acceptance is complete
  - remaining work is GitHub PR review absorption and merge for PR #20
- Started PLC + TriadDev integration-priority pass before milestone E.
- Confirmed the real GitHub integration branch is `codex/plc-c-verify` in `D:\Dev_Env\GPT_Codex_Workspace`, while `game-localization-mvr/main_worktree` is a nested reference worktree with half-applied fixes.
- Confirmed PLC governance state already marks milestones C and D as `done` with `evidence_ready=true`, but GitHub `main` has not yet absorbed that state.
- Audited open PRs and selected PR #9 as the only viable mainline integration branch; PR #7 and PR #8 are superseded in scope.
- Extracted the current blocking review set:
  - `soft_qa_llm.py` severity loss in `merge_tasks()`
  - missing `prohibited_aliases` / `banned_terms` propagation in translation and soft-QA contracts
  - PLC ledger/schema inconsistencies in milestone-B evidence
- Chose a minimum validation plan: targeted `soft_qa` contract tests, a small translation style-contract test, and file-level PLC contract checks.
- Updated the execution ledger with an explicit `PLC + TriadDev Integration Priority` section so this pass stays bounded to integration hardening rather than milestone E feature work.
- Fixed PR #9 code-review gaps in the outer integration repo:
  - `soft_qa_llm` now prioritizes higher-severity placeholder findings and surfaces `prohibited_aliases` / `banned_terms`
  - `translate_llm` now serializes `prohibited_aliases` / `banned_terms` in the style contract output
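
The severity-loss fix can be illustrated with a merge that keeps the highest-severity finding per string. This is a sketch of the general technique, not the code in `soft_qa_llm.py`: the function name `merge_tasks_sketch`, the severity labels, and the task fields are all assumptions.

```python
# Assumed severity labels and ordering; the real scale is not in the log.
SEVERITY_RANK = {"minor": 1, "major": 2, "critical": 3}


def merge_tasks_sketch(tasks):
    """Merge tasks per string_id, keeping the highest-severity finding.

    Ties keep the first task seen, so the result is order-stable.
    Unknown severity labels rank below every known label.
    """
    merged = {}
    for task in tasks:
        key = task["string_id"]
        current = merged.get(key)
        if current is None or (
            SEVERITY_RANK.get(task["severity"], 0)
            > SEVERITY_RANK.get(current["severity"], 0)
        ):
            merged[key] = task
    return list(merged.values())


if __name__ == "__main__":
    tasks = [
        {"string_id": "1", "severity": "minor", "type": "style"},
        {"string_id": "1", "severity": "critical", "type": "placeholder"},
    ]
    print(merge_tasks_sketch(tasks))
```

A last-write-wins merge would lose the `critical` placeholder finding whenever a lower-severity task for the same string arrived later; comparing ranks avoids that.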
- Fixed PLC governance gaps in the outer integration repo:
  - milestone B run manifest now uses schema-valid status `pass`
  - referenced ADR files now exist under `docs/decisions/`
  - added a PLC docs contract test to keep run-manifest schema and ADR references honest
- Ran the targeted regression suite successfully:
  - `tests/test_soft_qa_contract.py`: `7 passed`
  - `tests/test_translate_style_contract.py`: `1 passed`
  - `tests/test_plc_docs_contract.py`: `2 passed`
- Merged PR #9 into `main` as `fdc253f`.
- Closed PR #7 and PR #8 as superseded by PR #9.
- Current state: mainline integration phase is complete; milestone E can now start from clean `main`.
- Opened clean worktree `D:\Dev_Env\GPT_Codex_Workspace_milestone_e` on branch `codex/milestone-e-prepare`.
- Fast-forwarded the E worktree to include the post-merge PLC handoff commit.
- Shifted the active planning scope to `milestone_E_prepare`; next step is E planning/delta/tasks preparation rather than more PR cleanup.
- Started milestone E implementation from `codex/milestone-e-prepare` using the package order `E-contract -> E-repro + E-delta-engine -> E-task-executor`.
- Confirmed the E worktree is clean, but the control plane is stale:
  - `.triadev/state.json` already says `milestone_e_prepare`
  - `.triadev/workflow.json` still points at the old Batch 10 closeout change
- Confirmed the current clean worktree does not contain `data/style_profile.yaml` or `data/glossary.yaml`; this is now a first-class `E-repro` blocker rather than an implicit local-state assumption.
- Confirmed `scripts/glossary_delta.py` and `scripts/translate_refresh.py` are present but still implement a narrow glossary-only refresh path that does not satisfy milestone E.
- Confirmed current regression status before E implementation:
  - `tests/test_translate_style_contract.py`: pass
  - `tests/test_plc_docs_contract.py`: pass
  - `tests/test_soft_qa_contract.py`: 1 failing test due to style-profile drift semantics
- Locked the E gate artifact in `workflow/milestone_e_contract.yaml` and moved the active ledger from planning-only E to implementation-gated E.
- Completed the first parallel implementation wave after the gate:
  - `E-repro` now resolves glossary/style authority explicitly, supports clean-worktree bootstrap, and aligns README/workflow examples with live CLI flags.
  - `E-delta-engine` now emits locale-generic typed delta artifacts and operator-facing aggregate reports instead of a glossary-only impact set.
- Moved the active package to `E-task-executor`; the remaining work is to generate incremental tasks from `delta_rows.jsonl`, split execution from planning, and enforce post-run `qa_hard` gates.
- Closed the reviewer blocker pass before phase-2 closeout:
  - executor now stages candidate output before gates and writes an explicit failure-breakdown artifact
  - executor now groups refresh/retranslate work by `target_locale`, so mixed-market rows update the correct locale columns
  - glossary/style loaders now fail closed for locale mismatches instead of silently borrowing another market's term
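
The last two fixes above, grouping by `target_locale` and failing closed on locale mismatches, can be sketched together. The row shape, the `glossaries` mapping, and both function names are hypothetical; the executor's real data structures are not shown in this log.

```python
from collections import defaultdict


class LocaleMismatchError(ValueError):
    """Raised instead of silently falling back to another market's data."""


def group_by_target_locale(rows):
    """Bucket rows by target_locale, preserving input order per bucket."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["target_locale"]].append(row)
    return dict(groups)


def load_terms_for_locale(glossaries, locale):
    """Fail closed: a missing locale is an error, not a cue to borrow
    terms from whichever locale happens to be available."""
    if locale not in glossaries:
        raise LocaleMismatchError(f"no glossary registered for {locale!r}")
    return glossaries[locale]


if __name__ == "__main__":
    rows = [
        {"string_id": "a", "target_locale": "en-US"},
        {"string_id": "b", "target_locale": "ru-RU"},
    ]
    for locale, batch in group_by_target_locale(rows).items():
        print(locale, [r["string_id"] for r in batch])
```

Processing one locale bucket at a time makes it hard to write a row's translation into another market's column, and the explicit exception turns a silent quality problem into a visible failure.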
- Updated the E contract to match the implemented surface:
  - removed the unimplemented `soft_qa` task type from the E task enum
  - pinned the executor failure artifact as `incremental_failure_breakdown.json`
- Milestone E focused regression is green again:
  - `27 passed` across refresh/executor, repro, typed delta, soft-QA compatibility, translate style contract, and PLC docs contract tests
- Added a post-E roadmap modify proposal to PLC/TriadDev docs:
  - `F → S` now has an explicit four-phase interpretation while preserving the original milestone letters
  - recommended next main scope is `milestone_F_execute`
  - recommended governance sidecar is `milestone_M_prepare`
- Switched to stacked branch `codex/phase1-quality-closure` to start the first Phase 1 implementation slice.
- Locked the Phase 1 slice to `translate_refresh` unified execution-status contracts only:
  - no `run_smoke_pipeline` orchestration changes in this round
  - no `soft_qa` / `repair_loop` runtime wiring in this round
- Current validation plan is focused tests, not smoke:
  - rationale: this slice changes executor artifact semantics but intentionally leaves the smoke entrypoint untouched
- Implemented the Phase 1 status-contract slice in `translate_refresh`:
  - task artifacts now persist `execution_status`, `final_status`, and `status_reason`
  - manifest artifacts now persist `overall_status`, `task_outcomes`, and `gate_summary`
  - review queue rows now persist `review_source`
- Closed the last Phase 1 contract gap in the main thread:
  - execution failures now keep the staged candidate artifact and return a non-zero exit code instead of silently promoting final output
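
The stage-then-promote behavior described above can be sketched as a single decision point: the final file is written only when the gates passed, and the return value doubles as a process exit code. The function name and file layout are assumptions, not the `translate_refresh` implementation.

```python
import shutil
import tempfile
from pathlib import Path


def promote_if_passed(staged, final, gates_passed):
    """Copy the staged candidate to the final path only when gates passed.

    Returns an exit code: 0 on promotion, 1 when the candidate is kept
    for inspection and the final output is left untouched.
    """
    staged, final = Path(staged), Path(final)
    if not gates_passed:
        return 1
    final.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(staged, final)
    return 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        staged = Path(tmp) / "candidate.csv"
        staged.write_text("string_id,target\n")
        final = Path(tmp) / "out" / "final.csv"
        print(promote_if_passed(staged, final, gates_passed=False), final.exists())
```

A CLI entrypoint would typically pass this return value to `sys.exit`, so a failed run is visible to the calling shell or CI job.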
- Phase 1 focused acceptance is green:
  - `python -m pytest tests/test_translate_refresh_contract.py tests/test_milestone_e_e2e.py tests/test_plc_docs_contract.py -q`
  - result: `10 passed`
- Smoke remains intentionally skipped for this slice because `scripts/run_smoke_pipeline.py` is unchanged and orchestration behavior is out of scope for this PR.
- Confirmed PR #10 and PR #11 are merged and `origin/main` now contains both the milestone E baseline and Phase 1 quality-closure follow-up.
- Shifted the active roadmap scope from the merged `milestone_F_execute` slice to `milestone_M_prepare` on branch `codex/phase2-governance-substrate`.
- Chose a bounded Phase 2 first package instead of trying to execute all of `M/N/O/P` at once:
  - freeze a machine-checkable governance contract for `run_manifest`, `session_start`, `session_end`, and `milestone_state`
  - add a validator utility for those artifacts
  - extend PLC docs regression to lock representative records and templates to the same contract
- Validation plan for this slice is focused governance tests only:
  - rationale: Phase 2 first package changes documentation contracts and validator code, not runtime translation orchestration
- Completed the first Phase 2 governance substrate package:
  - added `workflow/plc_governance_contract.yaml` as the machine-checkable contract source
  - added `scripts/plc_validate_records.py` as the repo-local validator
  - expanded `tests/test_plc_docs_contract.py` to validate templates, representative records, and preset-based validator runs
- Synced the human-facing governance docs to the same contract language:
  - `field_schema.md`
  - `session_start_template.md`
  - `session_end_template.md`
  - `milestone_state_template.md`
  - `continuity_protocol.md`
- Focused Phase 2 acceptance is green:
  - `python -m pytest tests/test_plc_docs_contract.py -q` -> `7 passed`
  - `python scripts/plc_validate_records.py --preset representative --preset templates` -> `Validated 7 PLC governance artifact(s).`
- Smoke remains intentionally skipped for this slice because the runtime pipeline and orchestrator are untouched.
- Started the Phase 2 closeout package on `codex/phase2-governance-closeout` to finish the remaining `O + P` substrate work.
- Expanded the governance target from “first bounded package” to “phase-complete closeout”:
  - machine-checkable three-point validation for `changed_files`, `evidence_refs`, and `adr_refs`
  - closeout-grade representative records for session, run manifest, and milestone state
  - Phase 3 stays planning-ready only until a later implementation-start decision
- Completed the Phase 2 closeout package:
  - aligned `workflow/plc_governance_contract.yaml`, `field_schema.md`, `continuity_protocol.md`, and the PLC templates to the same three-point governance semantics
  - upgraded representative PLC records so run/session/milestone artifacts all carry `changed_files`, `evidence_refs`, and `adr_refs`
  - closed `milestone_state_M.md` with `status=done` and `evidence_ready=true`
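
The three-point validation named above can be sketched as a check that every record carries `changed_files`, `evidence_refs`, and `adr_refs` in a usable shape. This is not `scripts/plc_validate_records.py`: the function name is hypothetical, and the rule that each field is a list of non-empty strings (with empty lists allowed) is an assumption the log does not confirm.

```python
# Field names come from the log; the validation rules are assumed.
THREE_POINT_FIELDS = ("changed_files", "evidence_refs", "adr_refs")


def validate_three_point(record):
    """Return a list of problems; an empty list means the record passes.

    Assumption: each field must be a list of non-empty strings. Whether
    an empty list is acceptable is a policy choice; this sketch allows it.
    """
    problems = []
    for field in THREE_POINT_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        value = record[field]
        if not isinstance(value, list) or not all(
            isinstance(item, str) and item.strip() for item in value
        ):
            problems.append(f"{field} must be a list of non-empty strings")
    return problems


if __name__ == "__main__":
    record = {"changed_files": ["a.py"], "evidence_refs": ["run.json"]}
    print(validate_three_point(record))
```

Returning a list of problems rather than raising on the first one lets a validator report every gap in a record in a single pass.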
- Focused closeout acceptance is green:
  - `python -m pytest tests/test_plc_docs_contract.py -q` -> `9 passed`
  - `python scripts/plc_validate_records.py --preset representative --preset templates` -> `Validated 7 PLC governance artifact(s).`
- TriadDev control plane is now aligned to `phase3_planning_ready`; Phase 3 remains planning-only, not implementation-started.
- Confirmed PR #13 is merged and moved the active branch to `codex/milestone-i-prepare` from clean `main`.
- Started `milestone_I_prepare` as a planning-only Phase 3 slice:
  - active target is a bounded style-governance contract package
  - runtime implementation remains gated until `H` completes
- Recorded a fresh set of Phase 3 PLC artifacts:
  - `phase3_milestone_i_prepare_note.md`
  - `run_manifest_phase3_milestone_i_prepare.json`
  - `session_start_20260325_phase3_milestone_i_prepare.md`
  - `session_end_20260325_phase3_milestone_i_prepare.md`
  - `milestone_state_I.md`
- Focused Phase 3 planning acceptance is green:
  - `python -m pytest tests/test_plc_docs_contract.py -q`
  - record-level validation of the new run manifest, session start, session end, and milestone state under `scripts/plc_validate_records.py`
- Merged PR #14 into `main` and reopened Phase 3 from clean trunk on `codex/milestone-i-contract-package`.
- Completed the first milestone-I implementation package:
  - added `workflow/style_governance_contract.yaml`
  - added style-governance metadata and lineage to `data/style_profile.yaml`
  - updated `scripts/style_guide_bootstrap.py` and `scripts/style_sync_check.py` to emit and validate the governance header
  - synced the version/governance header into `workflow/style_guide.generated.md`, `workflow/style_guide.md`, and `.agent/workflows/style-guide.md`
  - added `tests/test_style_governance_contract.py`
- Focused milestone-I contract acceptance is green:
  - `python -m pytest tests/test_style_governance_contract.py tests/test_translate_style_contract.py tests/test_soft_qa_contract.py -q` -> `12 passed`
  - `python scripts/style_sync_check.py` -> `pass`
- Treated merged PR #15 as a bridge foundation only and returned the active execution lane to Phase 1 on fresh `main`.
- Opened `codex/phase1-quality-runtime-closeout` as the single active implementation branch under the phase-sized merge-window policy.
- Completed the remaining Phase 1 runtime closure in `scripts/run_smoke_pipeline.py`:
  - hard QA now routes through `repair_loop` with explicit recheck and blocked-state handling
  - soft QA now routes through bounded repair, fail-closed hard-gate review handoff, and rollback-safe promotion
  - smoke manifests now persist `repair_cycles`, `review_handoff`, `gate_summary`, and `delivery_decision`
- Added focused Phase 1 runtime contract coverage:
  - `tests/test_phase1_quality_runtime_contract.py` now locks hard-repair completion, soft rollback, and soft hard-gate-without-tasks handoff
  - `tests/test_batch6_repair_metrics_contract.py` now carries explicit style-profile and soft-QA rubric inputs for smoke orchestration tests
  - `tests/test_repair_loop_contract.py` keeps CLI doc authority focused on the repair workflow itself
- Phase 1 runtime acceptance is green again:
  - `python -m py_compile scripts/run_smoke_pipeline.py`
  - `python -m pytest tests/test_batch6_repair_metrics_contract.py tests/test_phase1_quality_runtime_contract.py tests/test_repair_loop_contract.py tests/test_soft_qa_contract.py tests/test_smoke_verify.py -q` -> `29 passed`
  - `python -m pytest tests/test_translate_refresh_contract.py tests/test_milestone_e_e2e.py -q` -> `10 passed`
- PLC/TriadDev phase-boundary records now validate for the new Phase 1 run/session/milestone artifacts.
- Merged PR #16 into `main` as `3a84f55`, closing the full Phase 1 large-batch runtime scope.
- Phase 1 review feedback is fully absorbed:
  - early-fail smoke manifests now report `failed` correctly
  - non-`ru-RU` review handoff rows keep current translated text
  - representative PLC milestone records now point to `milestone_state_H.md`
- Re-ran post-review acceptance successfully:
  - focused runtime: `31 passed`
  - focused executor + PLC docs: `21 passed`
  - PLC validator presets: `Validated 11 PLC governance artifact(s).`
- Current roadmap decision is now Phase 3, not Phase 4:
  - Phase 2 is already complete
  - `H` is merged, which removes the documented gate for broader `I/J/K/L`
  - the milestone-I bridge package is already on `main` as foundation
- Opened a new value-first gate for the next batch and scored the full Phase 3 batch `GO (25/30, High confidence)`.
- Opened `codex/phase3-language-governance-batch` from clean `main` and moved Phase 3 from planning into implementation.
- Froze Phase 3 shared contracts/helpers before downstream wiring:
  - review ticket / feedback log / lifecycle / KPI contracts
  - `scripts/style_governance_runtime.py`
  - `scripts/review_governance.py`
  - `scripts/review_feedback_ingest.py`
- Active implementation split is now:
  - runtime style-governance enforcement in translate + soft QA
  - review ticket / feedback / lifecycle / KPI wiring in refresh + smoke pipeline
- Completed the shared Phase 3 governance helper layer:
  - `scripts/style_governance_runtime.py`
  - `scripts/review_governance.py`
  - `scripts/review_feedback_ingest.py`
  - `scripts/language_governance.py` as a thin compatibility wrapper over the new helper/contract surfaces
- Completed runtime consumer integration for the Phase 3 batch:
  - `translate_llm.py` and `soft_qa_llm.py` now fail closed on governed style-profile violations
  - `translate_refresh.py` now emits review tickets, feedback-log placeholders, lifecycle-aware KPI artifacts, and governed review handoff
  - `run_smoke_pipeline.py` now emits the same Phase 3 review / KPI artifacts without breaking the Phase 1 orchestration contract
- Phase 3 focused acceptance is green:
  - `python -m pytest tests/test_phase3_governance_helpers.py tests/test_phase3_runtime_governance.py tests/test_phase3_language_governance_contract.py tests/test_translate_refresh_contract.py tests/test_phase1_quality_runtime_contract.py tests/test_translate_style_contract.py tests/test_soft_qa_contract.py tests/test_plc_docs_contract.py -q` -> `44 passed`
  - `python scripts/style_sync_check.py` -> `pass`
  - `python scripts/plc_validate_records.py --preset representative --preset templates` -> `Validated 11 PLC governance artifact(s).`
- Live smoke feasibility was checked and blocked by environment only:
  - `python scripts/llm_ping.py` failed because `LLM_BASE_URL` / `LLM_API_KEY` are missing in the current shell
  - merge acceptance therefore uses the required representative smoke gate via deterministic orchestration coverage in `tests/test_phase1_quality_runtime_contract.py`
- Current branch status:
  - `codex/phase3-language-governance-batch` is implementation-complete
  - PR #17 is open: `feat(phase3): land language governance batch`
  - the next step is review absorption and merge, not more Phase 3 implementation
- Confirmed PR #17 is merged into `main` as `88e9dba`; Phase 3 is now closed history, not the active execution lane.
- Opened `codex/phase4-operator-control-plane-batch` from clean `main`.
- Started Phase 4 as one phase-sized batch with bridge hardening included:
  - `repair_loop` target-column detection now excludes locale/language metadata columns
  - `language_governance` no longer falls back to the default lifecycle registry when an explicit caller registry is incomplete
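
The target-column hardening can be illustrated with a detector that skips metadata columns sharing the same prefix as real translation columns. Everything here is an assumption for illustration: the log does not list the excluded column names or the naming convention that `repair_loop` relies on.

```python
# Assumed metadata column names; the real exclusion list is not in the log.
METADATA_COLUMNS = {
    "target_locale", "target_lang", "source_locale", "source_lang",
    "locale", "language",
}
# Assumed naming convention for translated-text columns.
TARGET_PREFIXES = ("target_", "translated_")


def detect_target_column(columns):
    """Pick the first column that looks like translated text, skipping
    locale/language metadata that shares the same prefix."""
    for name in columns:
        lowered = name.lower()
        if lowered in METADATA_COLUMNS:
            continue
        if lowered.startswith(TARGET_PREFIXES):
            return name
    return None


if __name__ == "__main__":
    print(detect_target_column(["string_id", "source_zh", "target_locale", "target_en"]))
```

Without the exclusion set, a prefix match alone would select `target_locale` and the repair step would overwrite locale codes instead of translated text.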
- Added the Phase 4 operator control plane surface:
  - `workflow/operator_card_contract.yaml`
  - `scripts/operator_control_plane.py`
  - `tests/test_phase4_operator_control_plane.py`
- Accepted the operating-model ADR:
  - `docs/decisions/ADR-0003-operator-control-plane-operating-model.md`
- Focused bridge/operator acceptance is green:
  - `python -m pytest tests/test_repair_loop_contract.py tests/test_phase3_language_governance_contract.py tests/test_phase4_operator_control_plane.py -q` -> `22 passed`
  - `python -m py_compile scripts/operator_control_plane.py scripts/repair_loop.py scripts/language_governance.py`
- Remaining work in this phase is to:
  - sync PLC/TriadDev control-plane state to Phase 4
  - materialize the representative operator cards/report walkthrough from an existing run
  - run full focused acceptance and open one Phase 4 PR
- Started M4 execution task for the 1000-row layered smoke input.
- Created `task_plan.md`, `findings.md`, and `progress.md`.
- Ran direct `llm_ping` successfully with the provided LLM credentials.
- Ran exact-input preflight and full pipeline attempts; both short-circuited at connectivity inside `run_smoke_pipeline.py`.
- Older full run on a related 1000-row artifact reached translation and QA Hard, then failed with `85` QA errors and `Translated 1002 / 1003 rows`.
- Created and pushed checkpoint branch `codex/checkpoint-mainline-20260319`.
- Switched to `codex/deep-cleanup-r3` for deep-cleanup Batch 1.
- Materialized TriadDev brownfield control files and value gate artifacts.
- Added script authority manifest/report tooling for `main_worktree/scripts` vs `src/scripts`.
- Expanded Batch 1 runtime adapter coverage and added unit tests for the authority checker.
- Ran Batch 1 regression suite successfully: `29 passed`.
- Re-ran `m4_3_collect_coverage.py` and `m4_4_decision.py`; the decision summary remains `KEEP=6`.
- Current authority report is `WARN` because `runtime_adapter.py` is still alert-only drift, while required mirrors remain aligned.
- Started Batch 2 under the first-principles rule: preserve the smallest system needed for continued development, do not delete uncertain code.
- Added Batch 2 contract tests for `runtime_adapter`, `normalize_*`, and `soft_qa_llm`.
- Fixed explicit-router injection in `runtime_adapter.LLMClient`.
- Moved import-time standard-stream rewiring out of `normalize_tagger.py`, `normalize_tag_llm.py`, `translate_llm.py`, and `soft_qa_llm.py` into CLI-time configuration.
- Fixed `soft_qa_llm.py --dry-run` to use `batch_utils.SplitBatchConfig` and `split_into_batches`.
- Batch 2 focused test surface is green: `14 passed`.
- Started roadmap Phase 1 Batch 3/4 to convert near-core status decisions and frozen-zone boundaries into explicit governance artifacts.
- Collected branch-topology evidence for the later GitHub cleanup phase: several remote branches are fully contained in `origin/main`, while `reorg/v1.3.0-structure` remains the only audit-first diverged branch.
- Added `workflow/batch3_surface_inventory.json` and `workflow/batch4_frozen_zone_inventory.json` plus `reports/github_branch_audit_20260319.md` to make Phase 1 and Phase 2 decisions auditable.
- Added `tests/test_batch3_batch4_governance.py` to lock wrapper forwarding, CLI compatibility, and governance status expectations.
- Refined surface statuses: `normalize_ingest.py` is now `compat-keep` documented ingest, `normalize_tag_llm.py` is now `stress-only` compat entrypoint.
- Fixed `scripts/stress_test_3k_run.sh` so soft QA writes `--out_report` and `--out_tasks`, and the soft repair loop consumes the emitted tasks JSONL instead of the report JSON.
- Phase 1 regression plus evidence gate is green again: `50 passed`, authority remains `WARN` on `runtime_adapter.py` alert-only drift, and M4 remains at `KEEP=6`.
- Started Batch 5 on branch `codex/deep-cleanup-batch5` after local `main_worktree` was re-aligned with `origin/main` and GitHub governance was fully closed out.
- Added `tests/test_batch5_archive_candidates.py` to characterize the archived CLI shape of `repair_loop_v2.py` and the hard-coded recovery behavior of `repair_checkpoint_gaps.py`.
- Independent subagent review found hidden dependency blockers before archive could be finalized: active rules/root inventory still mention `repair_loop_v2.py`, and `repair_checkpoint_gaps.py` still participates in the documented translate-checkpoint recovery contract.
- Rolled the archive action back immediately, restored both files to `scripts/`, and converted Batch 5 into an audit-and-fallback step instead of a physical cleanup step.
- Updated the cleanup roadmap to treat Batch 5 as the point where these two repair-side utilities move from `archive-candidate` to `blocked` until their surrounding contracts are formally retired.
- Batch 5 regression and evidence gate are green: `56 passed`, authority is back to `WARN` on `runtime_adapter.py` only after required compat mirrors were resynced, and `M4_4_decision.jsonl` remains `KEEP=6`.
- Started Batch 6 on branch `codex/deep-cleanup-batch6` to retire repair-side governance contracts before any future archive attempt and to restore smoke metrics as optional observability.
- Rewrote the active rules, root inventory, and translate workflow so `repair_loop_v2.py` and `repair_checkpoint_gaps.py` are no longer presented as current tools.
- Downgraded both repair-side targets from `blocked` back to `archive-candidate` in `workflow/batch4_frozen_zone_inventory.json`; Batch 6 still does not physically archive them.
- Reconnected `scripts/metrics_aggregator.py` inside `scripts/run_smoke_pipeline.py` as a non-blocking Metrics stage that writes manifest-visible report artifacts before verify.
- Extended `scripts/metrics_aggregator.py` with usage fallback based on trace token fields and char-count estimation so sparse traces still produce stable totals and cost estimates.
- Added `tests/test_batch6_repair_metrics_contract.py` and turned the initial RED surface green: `7 passed`.
- Ran the full Batch 6 regression suite plus evidence gate successfully: `63 passed`, `scripts/check_script_authority.py` returned `WARN` on `runtime_adapter.py` only, and `scripts/m4_4_decision.py` still reports `KEEP=6`.
- Re-synced `src/scripts/run_smoke_pipeline.py` from the authority copy after the new Metrics stage introduced required-mirror drift; authority returned to the expected non-blocking state immediately afterward.
- Started Batch 7 on branch `codex/deep-cleanup-batch7` with the new top-level priority: restore sustainable production development rather than continue chasing script deletion.
- Added and expanded deterministic validation coverage in `tests/test_validation_contract.py` for explicit CLI paths, scoring, parse fallback, and metadata/report schema.
- Added and expanded deterministic repair coverage in `tests/test_repair_loop_contract.py` for hard-report JSON input, soft JSONL input, passthrough copy behavior, routing metadata, and runbook alignment.
- Updated `docs/repro_baseline.md` so validation commands now prefer explicit `--input`, `--output-dir`, `--report-dir`, and `--api-key-path` flags, and switched the retained credential example away from the drifted `config/api_key.txt` path.
- Updated `docs/WORKSPACE_RULES.md` so repair metadata steps now match the runtime truth (`repair_hard` / `repair_soft_major`) and checkpoint behavior is documented as snapshot-only rather than true resume support.
- Promoted `scripts/run_validation.py` and `scripts/build_validation_set.py` to `must-keep` in `workflow/batch4_frozen_zone_inventory.json`; `scripts/repair_loop.py` is being promoted in the same inventory as the retained repair authority.
- Ran focused Batch 7 contract tests successfully: `tests/test_validation_contract.py`, `tests/test_repair_loop_contract.py`, and `tests/test_batch3_batch4_governance.py` are green.
- Ran the full Batch 7 regression suite successfully: `77 passed`.
- Re-ran the evidence gate successfully: `scripts/check_script_authority.py` remains `WARN` on `runtime_adapter.py` only, `scripts/m4_3_collect_coverage.py` reports `0` issue hotspots, and `scripts/m4_4_decision.py` still reports `KEEP=6`.
- Committed Batch 7 as `ddd14e2` (`cleanup(batch7): recover production dev baselines`) and pushed branch `origin/codex/deep-cleanup-batch7` for PR review.
- Started Batch 8 on branch `codex/deep-cleanup-batch8`.
- Confirmed the retained mainline paths remain `scripts/repair_loop.py` and `scripts/rebuild_checkpoint.py`, while `repair_loop_v2.py` and `repair_checkpoint_gaps.py` are only historical archive targets now.
- Added Batch 8 characterization coverage for physical archive closeout and relaxed Batch 5/6 tests so they continue to validate historical evidence after the move.
- Physically moved `repair_loop_v2.py` and `repair_checkpoint_gaps.py` into `_obsolete/repair_archive/` and added an audit README plus Batch 8 closeout report.
- Ran focused Batch 8 archive-closeout coverage successfully: `17 passed`.
- Ran the full Batch 8 regression suite successfully: `81 passed`.
- Re-ran the evidence gate successfully: `scripts/check_script_authority.py` remains `WARN` on `runtime_adapter.py` only, `scripts/m4_3_collect_coverage.py` reports `0` issue hotspots, and `scripts/m4_4_decision.py` still reports `KEEP=6`.
- Committed Batch 8 as `3ae4fac` (`cleanup(batch8): close out repair archive migration`), pushed branch `origin/codex/deep-cleanup-batch8`, and opened PR #3.
- Started Batch 9 on branch `codex/deep-cleanup-batch9`.
- Added `tests/test_batch9_stress_surface_governance.py` to pin one retained stress shell path, explicit helper statuses, and the removal of the generic blocked stress bucket.
- Reclassified the stress surface in `workflow/batch4_frozen_zone_inventory.json` from one blocked umbrella to explicit shell/helper statuses.
- Updated `workflow/batch3_surface_inventory.json` so retained near-core references now point to `scripts/stress_test_3k_run.sh` rather than historical 5k acceptance helpers.
- Fixed the retained stress shell export invocation in `scripts/stress_test_3k_run.sh` so it now matches the current positional `rehydrate_export.py` contract.
- Started Batch 10 on branch `codex/deep-cleanup-batch10` as a stacked closeout branch on top of Batch 9 while PR #4 remains open.
- Reframed `src/scripts` from vague long-tail compat noise into an explicit `separate-exit-program` compatibility liability in the authority manifest and frozen-zone inventory.
- Removed the non-real `gate/**` placeholder from the frozen-zone inventory so closure is judged only against real surfaces.
- Added Batch 10 governance coverage for closeout decision semantics, root inventory wording, and the requirement that no real blocked surface remains in the cleanup inventory.
- Ran focused Batch 10 governance tests successfully: `15 passed`.
- Ran the full retained regression suite successfully with Batch 10 included: `92 passed`.
- Re-ran the evidence gate successfully: `scripts/check_script_authority.py` remains `WARN` on `runtime_adapter.py` only, `scripts/m4_3_collect_coverage.py` reports `0` issue hotspots, and `scripts/m4_4_decision.py` still reports `KEEP=6`.
- Batch 10 therefore closes the current cleanup roadmap: `src/scripts` remains operationally present, but any future mirror retirement now moves into a separate migration program instead of staying on the main cleanup path.
- Ran focused Batch 9 governance regression successfully: `42 passed`.
- Re-ran the evidence gate successfully: `scripts/check_script_authority.py` remains `WARN` on `runtime_adapter.py` only, `scripts/m4_3_collect_coverage.py` reports `0` issue hotspots, and `scripts/m4_4_decision.py` still reports `KEEP=6`.
- Re-ran the full explicit test-file suite successfully with `-s` to avoid the existing pytest capture issue in this workspace: `98 passed, 8 skipped`.
- Batch 9 now marks the roadmap as closeout-ready: Batch 10 should focus only on the long-term decision for `src/scripts` compat mirror rather than opening new cleanup surfaces.
- Re-opened Phase 5 on `codex/phase5-frontend-runtime-shell` to finish acceptance and PR closeout rather than starting Phase 6.
- Verified the provided LLM credentials with `python scripts/llm_ping.py`; connectivity passed and returned `PONG`.
- Confirmed PR #19 remains open and is currently merge-blocked by one branch conflict plus four unresolved review threads.
- Fixed the operator UI frontend contract so detail rendering now consumes `run.stages` and `run.verify`.
- Fixed operator UI launcher run-id generation so same-second launches produce unique run IDs and run directories.
- Hardened the documented `python scripts/operator_ui_server.py` entrypoint with repo-root import bootstrapping and frontend asset fallback.
- Added/extended closeout coverage: `tests/test_operator_ui_launcher.py`, `tests/test_operator_ui_server.py`, `tests/test_phase5_frontend_runtime_shell.py`, and `tests/test_phase5_acceptance_gate.py`.
- Re-ran the full retained Phase 5 regression floor successfully: `39 passed` plus `14 passed`.
- Ran a real representative online `preflight` launch through the local UI server using `D:\Dev_Env\loc-mvr 测试文档\test_input_200-row.csv`; the launched run was `ui_run_20260327_052512_477292_22c1`, `run_manifest.json` was produced, the UI/API exposed `7` stages, `verify` returned `PASS`, and `smoke_verify_log` preview was available through the artifact endpoint.
- Started Phase 6 implementation on fresh `main` branch `codex/phase6-operator-workspace-dashboard`.
- Split `scripts/operator_control_plane.py` into a pure derivation path and retained write-on-demand summarize path so workspace GET requests stay side-effect free.
- Extended `scripts/operator_ui_models.py` with workspace overview, card-list, and run-detail read models that prefer persisted operator artifacts and fall back to derived payloads.
- Extended `scripts/operator_ui_server.py` with: `GET /api/workspace/overview`, `GET /api/workspace/cards`, and `GET /api/workspace/runs/{run_id}`.
- Reworked `operator_ui/index.html`, `operator_ui/styles.css`, and `operator_ui/app.js` so the local shell now supports `Runtime Shell` and `Operator Workspace` modes in one page.
- Added Phase 6 RED/acceptance coverage in: `tests/test_operator_ui_workspace_models.py`, `tests/test_operator_ui_workspace_server.py`, and `tests/test_phase6_operator_workspace_dashboard.py`.
- Ran the focused Phase 6 + retained regression floor successfully: `78 passed`.
- Re-ran PLC governance validation successfully: `python scripts/plc_validate_records.py --preset representative --preset templates` -> `Validated 11 PLC governance artifact(s).`
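
For reference, the same-second run-ID fix recorded above can be sketched as follows. This is a minimal illustration, not the launcher's actual code: the helper names `make_run_id` and `allocate_run_dir` are hypothetical, the ID layout (date, time, microseconds, four hex characters) is only inferred from the recorded run `ui_run_20260327_052512_477292_22c1`, and the use of UTC is an assumption.

```python
import secrets
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def make_run_id(prefix: str = "ui_run", now: Optional[datetime] = None) -> str:
    """Build an ID such as ui_run_20260327_052512_477292_22c1 (layout is assumed)."""
    now = now or datetime.now(timezone.utc)  # assumption: UTC; the real launcher may use local time
    stamp = now.strftime("%Y%m%d_%H%M%S_%f")  # %f adds six-digit microseconds
    suffix = secrets.token_hex(2)  # four random hex characters
    return f"{prefix}_{stamp}_{suffix}"


def allocate_run_dir(root: Path, prefix: str = "ui_run", max_attempts: int = 8) -> Path:
    """Create a run directory that did not exist before, retrying on a collision."""
    for _ in range(max_attempts):
        run_dir = root / make_run_id(prefix)
        try:
            # exist_ok=False makes creation fail instead of silently reusing a directory
            run_dir.mkdir(parents=True, exist_ok=False)
        except FileExistsError:
            continue  # another launch took this ID; generate a new one
        return run_dir
    raise RuntimeError(f"could not allocate a unique run directory under {root}")
```

Microseconds and the random suffix make a collision unlikely. The `exist_ok=False` creation is what actually guarantees that two launches never share a run directory, because the second `mkdir` fails and triggers a retry.
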