Game Localization MVR (Minimum Viable Rules) v2.1

A robust, automated workflow system for game localization with strict validation, AI translation/repair, glossary management, and multi-format export.

Core Principle: Input rows == Output rows ALWAYS. No silent data loss.

🤖 For AI Coding Agents

Quick Commands for Agents:

# 1. Verify LLM connectivity (MUST run first)
python scripts/llm_ping.py

# 2. Validate workflow configuration (dry-run)
python scripts/translate_llm.py input.csv output.csv workflow/style_guide.md glossary/compiled.yaml --dry-run

# 3. Run E2E test
python scripts/test_e2e_workflow.py

Environment Variables (REQUIRED):

LLM_BASE_URL=https://api.example.com/v1
LLM_API_KEY=sk-your-key
LLM_MODEL=gpt-4.1-mini
LLM_TRACE_PATH=data/llm_trace.jsonl
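LLM_TRACE_PATH points at a JSON Lines file (`.jsonl`) where every LLM call is recorded for billing. The sketch below assumes one JSON object per call and shows how such a trace can be appended and summed. The record fields (`step`, `model`, `prompt_tokens`, `completion_tokens`) and both helper functions are hypothetical, not the project's actual trace schema.

```python
import json
import os
import time
from pathlib import Path
from typing import Optional


def append_trace(record: dict, trace_path: Optional[str] = None) -> Path:
    """Append one LLM call record to a JSON Lines trace file.

    Falls back to the LLM_TRACE_PATH environment variable when no path is given.
    The record fields are hypothetical; the project's real schema may differ.
    """
    path = Path(trace_path or os.environ["LLM_TRACE_PATH"])
    path.parent.mkdir(parents=True, exist_ok=True)
    entry = {"ts": time.time(), **record}
    # One JSON object per line keeps the file append-only and easy to aggregate
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return path


def total_tokens(trace_path: str) -> int:
    """Sum the hypothetical prompt_tokens and completion_tokens fields."""
    total = 0
    with open(trace_path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                row = json.loads(line)
                total += row.get("prompt_tokens", 0) + row.get("completion_tokens", 0)
    return total
```

Token totals like this could then be multiplied by per-model prices (the project keeps those in config/pricing.yaml, whose format is not shown here).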

Key Rules for Agents:

Never hardcode API keys - Use environment variables only
Run llm_ping.py first - Fail-fast if LLM unavailable
Check WORKSPACE_RULES.md - See docs/WORKSPACE_RULES.md for hard constraints
Row preservation is P0 - Empty source rows must be preserved with status=skipped_empty
Glossary is mandatory - glossary/compiled.yaml must exist before translation
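The row-preservation rule (input rows == output rows, empty sources marked `status=skipped_empty`) can be checked mechanically. A minimal sketch, assuming rows are dicts in the same order in both files and that the source text lives in a column named `source` (a hypothetical name; only the `status` column and its `skipped_empty` value come from this README):

```python
from typing import Dict, List


def check_row_preservation(input_rows: List[Dict[str, str]],
                           output_rows: List[Dict[str, str]],
                           source_col: str = "source") -> List[str]:
    """Return a list of row-preservation violations; an empty list means OK.

    'source' is a hypothetical default column name for the untranslated text.
    """
    problems = []
    if len(input_rows) != len(output_rows):
        problems.append(
            f"row count mismatch: input={len(input_rows)} output={len(output_rows)}"
        )
    # Compare row by row over the overlapping range
    for i, (src, out) in enumerate(zip(input_rows, output_rows)):
        source_text = (src.get(source_col) or "").strip()
        if not source_text and out.get("status") != "skipped_empty":
            problems.append(f"row {i}: empty source must have status=skipped_empty")
    return problems
```

With real files, the row lists can be built with `list(csv.DictReader(open(path, newline="", encoding="utf-8")))`.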

🚀 Pipeline Overview

Input CSV → Normalize → Translate → QA_Hard → Repair → Export
                ↓
            Glossary (required)

Step	Script	Purpose	Blocking?
0	`llm_ping.py`	🔌 LLM connectivity check	YES
1	`normalize_guard.py`	🧊 Freeze placeholders → tokens	YES
2-4	`extract_terms.py` → `glossary_compile.py`	📖 Build glossary	YES
5	`translate_llm.py`	🤖 AI Translation	YES
6	`qa_hard.py`	🛡️ Validate tokens/patterns	YES
7	`repair_loop.py`	🔧 Auto-repair hard errors	-
8	`soft_qa_llm.py`	🧠 Quality review	-
10	`rehydrate_export.py`	💧 Restore tokens → placeholders	YES
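The Blocking? column means a failing blocking step stops the run, while a failing non-blocking step (repair, soft QA) is recorded and the pipeline continues. A minimal sketch of that control flow, using a hypothetical `Step` type whose `run` callable returns True on success; it illustrates the semantics of the table and is not the project's own orchestration code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    name: str
    run: Callable[[], bool]  # returns True on success
    blocking: bool = True


def run_pipeline(steps: List[Step]) -> List[Tuple[str, str]]:
    """Run steps in order and stop at the first failing blocking step."""
    results: List[Tuple[str, str]] = []
    for step in steps:
        ok = step.run()
        results.append((step.name, "ok" if ok else "failed"))
        # Non-blocking failures are recorded but do not halt the run
        if not ok and step.blocking:
            break
    return results
```

To drive the real scripts, each `run` could wrap a subprocess call, for example `lambda: subprocess.run([sys.executable, "scripts/llm_ping.py"]).returncode == 0`.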

📁 Project Structure

loc-mvr/
├── config/
│   ├── llm_routing.yaml    # Model routing per step
│   └── pricing.yaml        # Cost calculation
├── glossary/
│   ├── compiled.yaml       # Active glossary (generated)
│   └── generic_terms_zh.txt # Blacklist for extraction
├── scripts/
│   ├── llm_ping.py         # ★ Run first - connectivity check
│   ├── normalize_guard.py  # Step 1: Placeholder freezing
│   ├── translate_llm.py    # Step 5: Translation
│   ├── qa_hard.py          # Step 6: Hard validation
│   ├── repair_loop.py      # Step 7: Auto-repair
│   └── runtime_adapter.py  # LLM client with routing
├── workflow/
│   ├── style_guide.md      # Translation style rules
│   ├── forbidden_patterns.txt
│   └── placeholder_schema.yaml
└── docs/
    └── WORKSPACE_RULES.md  # ★ Hard constraints for agents
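config/llm_routing.yaml selects a model per pipeline step. Its schema is not documented in this README, so the sketch below assumes a hypothetical shape, `{"default": <model>, "steps": {<step>: <model>}}`, already loaded with `yaml.safe_load`, and falls back to LLM_MODEL. It is not runtime_adapter.py's actual routing logic.

```python
import os
from typing import Mapping, Optional


def resolve_model(step: str, routing: dict,
                  env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a model for a step: per-step entry, then routing default, then LLM_MODEL.

    The routing dict layout is a hypothetical schema for illustration only.
    """
    env = os.environ if env is None else env
    steps = routing.get("steps") or {}
    model = steps.get(step) or routing.get("default") or env.get("LLM_MODEL")
    if not model:
        raise KeyError(f"no model configured for step {step!r}")
    return model
```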

🔧 Quick Start (Human)

1. Setup

git clone https://github.com/Charpup/game-localization-mvr.git
cd game-localization-mvr
pip install pyyaml requests numpy pandas jieba

2. Configure LLM (persistent setup recommended)

# Windows PowerShell
$env:LLM_BASE_URL="https://api.apiyi.com/v1"
$env:LLM_API_KEY="sk-your-key"
$env:LLM_MODEL="gpt-4.1-mini"

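Alternatively, store the credentials in a local file; it is read automatically and takes precedence over environment variables: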

# Create main_worktree/.llm_credentials
LLM_BASE_URL=https://api.apiyi.com/v1
LLM_API_KEY=sk-your-key

Current lookup order: LLM_API_KEY_FILE, then the local files ./.llm_credentials, ./.llm_env, ./config/llm_credentials.env, and ~/.game-localization-mvr/.llm_credentials, then the LLM_API_KEY environment variable.
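The lookup order can be sketched as a small resolver. This assumes every credentials file, including the one named by LLM_API_KEY_FILE, uses `KEY=VALUE` lines like the example above, that the candidate files are checked in the order listed, and that relative paths resolve against the working directory; `parse_env_file` and `resolve_api_key` are hypothetical helpers, not the project's loader.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

# Candidate files, in the order listed in this README
CANDIDATE_FILES = [
    "./.llm_credentials",
    "./.llm_env",
    "./config/llm_credentials.env",
    "~/.game-localization-mvr/.llm_credentials",
]


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(env: Optional[Mapping[str, str]] = None,
                    base_dir: Optional[Path] = None) -> Optional[str]:
    env = os.environ if env is None else env
    base = base_dir or Path.cwd()
    candidates = []
    if env.get("LLM_API_KEY_FILE"):
        candidates.append(Path(env["LLM_API_KEY_FILE"]).expanduser())
    for name in CANDIDATE_FILES:
        p = Path(name).expanduser()
        candidates.append(p if p.is_absolute() else base / p)
    # First file that exists and defines LLM_API_KEY wins
    for path in candidates:
        if path.is_file():
            key = parse_env_file(path).get("LLM_API_KEY")
            if key:
                return key
    # The environment variable is the last resort
    return env.get("LLM_API_KEY")
```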

2.1 Dependency + Environment Quick Check (before every smoke run)

python - <<'PY'
import importlib
import os

# Same packages as the pip install line in Setup (PyYAML imports as "yaml")
for pkg in ["requests", "numpy", "yaml", "pandas", "jieba"]:
    try:
        importlib.import_module(pkg)
        print(f"[OK] {pkg}")
    except ImportError:
        print(f"[MISSING] {pkg}")

for key in ["LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL"]:
    print(f"{key}={'SET' if os.getenv(key) else 'MISSING'}")
PY

If any dependency or environment variable reports MISSING, do not start the smoke run yet.

PowerShell quick check:

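$missing = @()
foreach ($m in @("requests","numpy","yaml","pandas","jieba")) {
  # A non-zero exit from a native command does not trigger catch, so test $LASTEXITCODE
  python -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec('$m') else 1)"
  if ($LASTEXITCODE -eq 0) {
    Write-Host "[OK] $m"
  } else {
    $missing += $m
    Write-Host "[MISSING] $m"
  }
}
Write-Host "LLM_BASE_URL=$([bool]$env:LLM_BASE_URL)"
Write-Host "LLM_API_KEY=$([bool]$env:LLM_API_KEY)"
Write-Host "LLM_MODEL=$([bool]$env:LLM_MODEL)"
if ($missing.Count -gt 0) { Write-Host "Missing packages: $($missing -join ', ')" }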

3. Run Pipeline

# Verify LLM
python scripts/llm_ping.py

# Normalize → Translate → QA → Export
python scripts/normalize_guard.py input.csv normalized.csv map.json workflow/placeholder_schema.yaml
python scripts/translate_llm.py normalized.csv translated.csv workflow/style_guide.md glossary/compiled.yaml
python scripts/qa_hard.py translated.csv qa_report.json map.json
python scripts/rehydrate_export.py translated.csv map.json final.csv
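The sequence above hinges on map.json: normalize_guard.py freezes placeholders into tokens, qa_hard.py checks that the tokens survived translation, and rehydrate_export.py restores the original placeholders. A minimal sketch of that round trip, assuming a hypothetical token format `__PH_n__` and a simplified placeholder regex (the real rules live in workflow/placeholder_schema.yaml); it is not the scripts' actual implementation.

```python
import re
from typing import Dict, List

# Hypothetical placeholder pattern: {name}, %s / %d, and <tag> markup
PLACEHOLDER_RE = re.compile(r"\{[^{}]+\}|%[sd]|</?[A-Za-z][^<>]*>")
TOKEN_RE = re.compile(r"__PH_\d+__")


def freeze(text: str, mapping: Dict[str, str]) -> str:
    """Replace each placeholder with an opaque token and record it in mapping."""
    def _sub(m: re.Match) -> str:
        token = f"__PH_{len(mapping)}__"
        mapping[token] = m.group(0)
        return token
    return PLACEHOLDER_RE.sub(_sub, text)


def qa_tokens(source_frozen: str, target_frozen: str) -> bool:
    """Hard check: the target must contain exactly the same tokens as the source."""
    return sorted(TOKEN_RE.findall(source_frozen)) == sorted(TOKEN_RE.findall(target_frozen))


def rehydrate(text: str, mapping: Dict[str, str]) -> str:
    """Swap every token back to the placeholder it replaced."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

Because the token text is opaque to the model, the translation step only has to copy tokens verbatim, and the QA step can verify that with a simple multiset comparison.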

3.1 Smoke Pipeline (Manifest + Issue Record)

# Full smoke pass with manifest output + issue recording
python scripts/run_smoke_pipeline.py --input "D:\\Dev_Env\\loc-mvr 测试文档\\test_input_200-row.csv" --target-lang en-US
# Optional: run the preflight check only
python scripts/run_smoke_pipeline.py --input "D:\\Dev_Env\\loc-mvr 测试文档\\test_input_200-row.csv" --target-lang en-US --verify-mode preflight

This command:

runs llm_ping -> normalize_guard -> translate_llm -> qa_hard -> rehydrate_export
generates a run manifest: data/smoke_run_<timestamp>/run_manifest.json
runs smoke_verify --manifest ...
records issues to reports/smoke_issues_<run-id>.json and .jsonl
emits manifest.stage_artifacts with:
- connectivity_log
- normalize_log
- translate_log
- qa_hard_report
- final_csv
- smoke_verify_log
--verify-mode accepts preflight or full; full is the default and adds row-count and QA statistics checks
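A quick way to confirm a smoke run produced everything listed under stage_artifacts is to load run_manifest.json and check each entry. This sketch assumes `stage_artifacts` maps each artifact name to a file path and that relative paths resolve against the manifest's directory; both are assumptions, and `check_manifest` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import List

# Stage artifact names listed in this README
EXPECTED_ARTIFACTS = [
    "connectivity_log", "normalize_log", "translate_log",
    "qa_hard_report", "final_csv", "smoke_verify_log",
]


def check_manifest(manifest_path: str) -> List[str]:
    """Return problems found in a smoke-run manifest; an empty list means OK."""
    manifest_file = Path(manifest_path)
    manifest = json.loads(manifest_file.read_text(encoding="utf-8"))
    artifacts = manifest.get("stage_artifacts") or {}
    problems = []
    for name in EXPECTED_ARTIFACTS:
        value = artifacts.get(name)
        if not value:
            problems.append(f"missing artifact entry: {name}")
            continue
        path = Path(value)
        # Assumption: relative artifact paths are relative to the manifest
        if not path.is_absolute():
            path = manifest_file.parent / path
        if not path.exists():
            problems.append(f"{name}: file not found ({path})")
    return problems
```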

After every smoke run, check the following artifacts:

D:\Dev_Env\GPT_Codex_Workspace\game-localization-mvr\main_worktree\data\smoke_runs\<run>\run_manifest.json
D:\Dev_Env\GPT_Codex_Workspace\game-localization-mvr\main_worktree\reports\smoke_issues_<run_id>.json
D:\Dev_Env\GPT_Codex_Workspace\game-localization-mvr\main_worktree\reports\smoke_verify_<run_id>.json

⚡ Key Features

Row Preservation: Empty rows kept with status=skipped_empty
Drift Guard: Refresh stage blocks non-placeholder text changes
Progress Reporting: --progress_every N for translation progress
Router-based Models: Configure per-step models in llm_routing.yaml
LLM Tracing: All calls logged to LLM_TRACE_PATH for billing
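One way to read the Drift Guard rule is: a refresh may change placeholders, but the text outside them must stay identical. A minimal sketch of that comparison, assuming a simplified, hypothetical placeholder regex rather than the project's placeholder_schema.yaml rules:

```python
import re

# Hypothetical placeholder pattern: {name}, %s / %d, and <tag> markup
PLACEHOLDER_RE = re.compile(r"\{[^{}]+\}|%[sd]|</?[A-Za-z][^<>]*>")


def strip_placeholders(text: str) -> str:
    """Remove every placeholder, leaving only the translatable text."""
    return PLACEHOLDER_RE.sub("", text)


def drift_guard(before: str, after: str) -> bool:
    """Allow a refresh only if the non-placeholder text is unchanged."""
    return strip_placeholders(before) == strip_placeholders(after)
```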

📋 Testing

# Unit tests
python scripts/test_normalize.py
python scripts/test_qa_hard.py
python scripts/test_rehydrate.py

# E2E test (small dataset)
python scripts/test_e2e_workflow.py

# Dry-run validation
python scripts/translate_llm.py input.csv out.csv workflow/style_guide.md glossary/compiled.yaml --dry-run

📄 License

MIT License. Built for game localization automation.

🔗 Links

Workspace Rules: docs/WORKSPACE_RULES.md
Demo Walkthrough: docs/demo.md
