A robust, automated workflow system for game localization with strict validation, AI translation/repair, glossary management, and multi-format export.
Core Principle: Input rows == Output rows ALWAYS. No silent data loss.
Quick Commands for Agents:

```bash
# 1. Verify LLM connectivity (MUST run first)
python scripts/llm_ping.py

# 2. Validate workflow configuration (dry-run)
python scripts/translate_llm.py input.csv output.csv workflow/style_guide.md glossary/compiled.yaml --dry-run

# 3. Run E2E test
python scripts/test_e2e_workflow.py
```

Environment Variables (REQUIRED):

```bash
LLM_BASE_URL=https://api.example.com/v1
LLM_API_KEY=sk-your-key
LLM_MODEL=gpt-4.1-mini
LLM_TRACE_PATH=data/llm_trace.jsonl
```

Key Rules for Agents:
- Never hardcode API keys: use environment variables only.
- Run `llm_ping.py` first: fail fast if the LLM is unavailable.
- Check WORKSPACE_RULES.md: see `docs/WORKSPACE_RULES.md` for hard constraints.
- Row preservation is P0: empty source rows must be preserved with `status=skipped_empty`.
- Glossary is mandatory: `glossary/compiled.yaml` must exist before translation.
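The row-preservation rule can be sketched as follows. This is a minimal illustration, not the repo's actual implementation: `translate_rows`, `translate_fn`, and the `source`/`target` column names are hypothetical, but the invariant is the one stated above, namely that every input row yields exactly one output row, with empty sources marked `skipped_empty`.

```python
import csv
import io

def translate_rows(rows, translate_fn):
    """Yield one output row per input row; never drop rows.

    Empty source rows are passed through with status=skipped_empty
    instead of being filtered out (row preservation is P0).
    """
    for row in rows:
        out = dict(row)
        source = (row.get("source") or "").strip()
        if not source:
            out["target"] = ""
            out["status"] = "skipped_empty"
        else:
            out["target"] = translate_fn(source)
            out["status"] = "translated"
        yield out

# Hypothetical 3-row input, one of them empty.
raw = "id,source\n1,Hello\n2,\n3,World\n"
rows = list(csv.DictReader(io.StringIO(raw)))
result = list(translate_rows(rows, lambda s: s.upper()))
assert len(result) == len(rows)                 # input rows == output rows
assert result[1]["status"] == "skipped_empty"   # empty row kept, not dropped
```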
```
Input CSV → Normalize → Translate → QA_Hard → Repair → Export
                            ↓
                   Glossary (required)
```
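The blocking behavior of this chain can be sketched as a tiny fail-fast runner. The script names and arguments are the ones used elsewhere in this README; the runner itself is illustrative, not the repo's actual orchestration code:

```python
import subprocess
import sys

# Blocking steps in pipeline order; any non-zero exit aborts the run.
BLOCKING_STEPS = [
    ["python", "scripts/llm_ping.py"],
    ["python", "scripts/normalize_guard.py", "input.csv", "normalized.csv",
     "map.json", "workflow/placeholder_schema.yaml"],
    ["python", "scripts/translate_llm.py", "normalized.csv", "translated.csv",
     "workflow/style_guide.md", "glossary/compiled.yaml"],
    ["python", "scripts/qa_hard.py", "translated.csv", "qa_report.json", "map.json"],
    ["python", "scripts/rehydrate_export.py", "translated.csv", "map.json", "final.csv"],
]

def run_pipeline(steps=BLOCKING_STEPS):
    """Run steps in order; stop at the first failure (fail-fast)."""
    for cmd in steps:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"[ABORT] step failed: {' '.join(cmd)}", file=sys.stderr)
            return False
    return True
```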
| Step | Script | Purpose | Blocking? |
|---|---|---|---|
| 0 | `llm_ping.py` | 🔌 LLM connectivity check | YES |
| 1 | `normalize_guard.py` | 🧊 Freeze placeholders → tokens | YES |
| 2-4 | `extract_terms.py` → `glossary_compile.py` | 📖 Build glossary | YES |
| 5 | `translate_llm.py` | 🤖 AI Translation | YES |
| 6 | `qa_hard.py` | 🛡️ Validate tokens/patterns | YES |
| 7 | `repair_loop.py` | 🔧 Auto-repair hard errors | - |
| 8 | `soft_qa_llm.py` | 🧠 Quality review | - |
| 10 | `rehydrate_export.py` | 💧 Restore tokens → placeholders | YES |
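The freeze/validate/restore round trip (steps 1, 6, and 10) can be illustrated with a minimal sketch. The `⟦PH0⟧` token format and the `{name}` placeholder pattern are assumptions for illustration, not the actual rules in `workflow/placeholder_schema.yaml`:

```python
import re

# Assumed placeholder syntax: {identifier}. The real schema lives in
# workflow/placeholder_schema.yaml and may define more patterns.
PLACEHOLDER_RE = re.compile(r"\{[a-zA-Z_][a-zA-Z0-9_]*\}")

def freeze(text):
    """Step 1 sketch: replace placeholders with opaque tokens the LLM must not touch."""
    mapping = {}
    def _sub(m):
        token = f"⟦PH{len(mapping)}⟧"
        mapping[token] = m.group(0)
        return token
    return PLACEHOLDER_RE.sub(_sub, text), mapping

def qa_hard(translated, mapping):
    """Step 6 sketch: every frozen token must survive translation intact."""
    return [tok for tok in mapping if tok not in translated]

def rehydrate(text, mapping):
    """Step 10 sketch: restore the original placeholders."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

frozen, ph_map = freeze("Welcome, {player_name}! You have {coin_count} coins.")
translated = frozen  # stand-in for the LLM call
assert qa_hard(translated, ph_map) == []  # no missing tokens
restored = rehydrate(translated, ph_map)
assert restored == "Welcome, {player_name}! You have {coin_count} coins."
```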
```
loc-mvr/
├── config/
│   ├── llm_routing.yaml        # Model routing per step
│   └── pricing.yaml            # Cost calculation
├── glossary/
│   ├── compiled.yaml           # Active glossary (generated)
│   └── generic_terms_zh.txt    # Blacklist for extraction
├── scripts/
│   ├── llm_ping.py             # ★ Run first - connectivity check
│   ├── normalize_guard.py      # Step 1: Placeholder freezing
│   ├── translate_llm.py        # Step 5: Translation
│   ├── qa_hard.py              # Step 6: Hard validation
│   ├── repair_loop.py          # Step 7: Auto-repair
│   └── runtime_adapter.py      # LLM client with routing
├── workflow/
│   ├── style_guide.md          # Translation style rules
│   ├── forbidden_patterns.txt
│   └── placeholder_schema.yaml
└── docs/
    └── WORKSPACE_RULES.md      # ★ Hard constraints for agents
```
```bash
git clone https://github.com/Charpup/game-localization-mvr.git
cd game-localization-mvr
pip install pyyaml requests numpy pandas jieba
```

```powershell
# Windows PowerShell
$env:LLM_BASE_URL="https://api.apiyi.com/v1"
$env:LLM_API_KEY="sk-your-key"
$env:LLM_MODEL="gpt-4.1-mini"
```

Credentials can also be stored in a persistent local file, which is read automatically and takes precedence over environment variables:

```
# Create main_worktree/.llm_credentials
LLM_BASE_URL=https://api.apiyi.com/v1
LLM_API_KEY=sk-your-key
```

Current load order: `LLM_API_KEY_FILE` -> `./.llm_credentials` -> `./.llm_env` -> `./config/llm_credentials.env` -> `~/.game-localization-mvr/.llm_credentials` -> `LLM_API_KEY` environment variable.
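The load order above could be implemented roughly as below. This is a sketch of the documented precedence only; the `KEY=VALUE` line parsing and the `load_credentials` name are assumptions, not the repo's actual loader:

```python
import os
from pathlib import Path

# Search order mirrors the documented precedence.
CANDIDATES = [
    os.getenv("LLM_API_KEY_FILE"),
    "./.llm_credentials",
    "./.llm_env",
    "./config/llm_credentials.env",
    str(Path.home() / ".game-localization-mvr" / ".llm_credentials"),
]

def load_credentials():
    """Return credentials from the first existing file, else from env vars."""
    for candidate in CANDIDATES:
        if candidate and Path(candidate).is_file():
            creds = {}
            for line in Path(candidate).read_text(encoding="utf-8").splitlines():
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    creds[key.strip()] = value.strip()
            if creds:
                return creds
    # Fall back to plain environment variables.
    return {k: v for k in ("LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL")
            if (v := os.getenv(k))}
```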
```bash
python - <<'PY'
import os
import importlib

for pkg in ["requests", "numpy", "yaml", "pandas"]:
    try:
        importlib.import_module(pkg)
        print(f"[OK] {pkg}")
    except Exception:
        print(f"[MISSING] {pkg}")

for key in ["LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL"]:
    print(f"{key}={'SET' if os.getenv(key) else 'MISSING'}")
PY
```

If any dependency or environment variable shows MISSING, do not start the smoke run yet.
PowerShell quick check (the original try/catch never fired because `python -c` did not raise; checking `$LASTEXITCODE` catches missing modules reliably):

```powershell
$missing = @()
foreach ($m in @("requests","numpy","yaml","pandas","jieba")) {
    python -c "import $m" 2>$null
    if ($LASTEXITCODE -eq 0) {
        Write-Host "[OK] $m"
    } else {
        $missing += $m
        Write-Host "[MISSING] $m"
    }
}
Write-Host "LLM_BASE_URL=$([bool]$env:LLM_BASE_URL)"
Write-Host "LLM_API_KEY=$([bool]$env:LLM_API_KEY)"
Write-Host "LLM_MODEL=$([bool]$env:LLM_MODEL)"
```

```bash
# Verify LLM
python scripts/llm_ping.py

# Normalize → Translate → QA → Export
python scripts/normalize_guard.py input.csv normalized.csv map.json workflow/placeholder_schema.yaml
python scripts/translate_llm.py normalized.csv translated.csv workflow/style_guide.md glossary/compiled.yaml
python scripts/qa_hard.py translated.csv qa_report.json map.json
python scripts/rehydrate_export.py translated.csv map.json final.csv

# Full smoke pass with manifest output + issue recording
python scripts/run_smoke_pipeline.py --input "D:\\Dev_Env\\loc-mvr 测试文档\\test_input_200-row.csv" --target-lang en-US

# Optional: preflight checks only
python scripts/run_smoke_pipeline.py --input "D:\\Dev_Env\\loc-mvr 测试文档\\test_input_200-row.csv" --target-lang en-US --verify-mode preflight
```

This command:
- runs `llm_ping -> normalize_guard -> translate_llm -> qa_hard -> rehydrate_export`
- generates a run manifest: `data/smoke_run_<timestamp>/run_manifest.json`
- runs `smoke_verify --manifest ...`
- records issues to `reports/smoke_issues_<run-id>.json` and `.jsonl`
- emits `manifest.stage_artifacts` with: `connectivity_log`, `normalize_log`, `translate_log`, `qa_hard_report`, `final_csv`, `smoke_verify_log`
- `verify_mode` supports `preflight|full`; the default is `full` (includes row counts and QA stats)
After each smoke run, check the following artifacts:

- `D:\Dev_Env\GPT_Codex_Workspace\game-localization-mvr\main_worktree\data\smoke_runs\<run>\run_manifest.json`
- `D:\Dev_Env\GPT_Codex_Workspace\game-localization-mvr\main_worktree\reports\smoke_issues_<run_id>.json`
- `D:\Dev_Env\GPT_Codex_Workspace\game-localization-mvr\main_worktree\reports\smoke_verify_<run_id>.json`
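A quick way to assert a manifest contains every expected artifact entry; the key names follow the `stage_artifacts` list above, while `check_manifest` itself and the manifest's overall layout are assumptions for illustration:

```python
import json
from pathlib import Path

EXPECTED_ARTIFACTS = [
    "connectivity_log", "normalize_log", "translate_log",
    "qa_hard_report", "final_csv", "smoke_verify_log",
]

def check_manifest(manifest_path):
    """Return the list of missing stage_artifacts keys (empty list = OK)."""
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    artifacts = manifest.get("stage_artifacts", {})
    return [k for k in EXPECTED_ARTIFACTS if k not in artifacts]
```

An empty return means every stage produced its artifact entry; any missing key points directly at the stage that failed to report.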
- Row Preservation: empty rows kept with `status=skipped_empty`
- Drift Guard: refresh stage blocks non-placeholder text changes
- Progress Reporting: `--progress_every N` for translation progress
- Router-based Models: configure per-step models in `llm_routing.yaml`
- LLM Tracing: all calls logged to `LLM_TRACE_PATH` for billing
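Since every call lands in the `LLM_TRACE_PATH` JSONL file, a rough cost summary can be derived from it. The record fields (`model`, `prompt_tokens`, `completion_tokens`) and the per-million-token rate structure are assumptions about the trace and `config/pricing.yaml` formats, not their actual schemas:

```python
import json
from pathlib import Path

def summarize_cost(trace_path, pricing):
    """Sum token usage per model from a JSONL trace and price it.

    pricing: {model: {"input": usd_per_1M, "output": usd_per_1M}} (assumed shape).
    """
    totals = {}
    for line in Path(trace_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        model = rec.get("model", "unknown")
        usage = totals.setdefault(model, {"prompt": 0, "completion": 0})
        usage["prompt"] += rec.get("prompt_tokens", 0)
        usage["completion"] += rec.get("completion_tokens", 0)
    cost = 0.0
    for model, usage in totals.items():
        rates = pricing.get(model, {"input": 0.0, "output": 0.0})
        cost += usage["prompt"] / 1e6 * rates["input"]
        cost += usage["completion"] / 1e6 * rates["output"]
    return totals, cost
```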
```bash
# Unit tests
python scripts/test_normalize.py
python scripts/test_qa_hard.py
python scripts/test_rehydrate.py

# E2E test (small dataset)
python scripts/test_e2e_workflow.py

# Dry-run validation
python scripts/translate_llm.py input.csv out.csv style.md glossary.yaml --dry-run
```

MIT License. Built for game localization automation.
- Workspace Rules: docs/WORKSPACE_RULES.md
- Demo Walkthrough: docs/demo.md