docs: add milestone roadmap (M1 hardening, M2 slides, M3 multi-lang)

jmjava · jmjava · commit 94ff0bae053a · 2026-03-26T18:04:42.000-04:00

Three milestones covering the next phases of docgen development:
- M1: CI hardening, tesseract/ffmpeg, tekton-dag merge
- M2: reveal.js slide deck generation as a visual source type
- M3: Multi-language narration with Chinese, Spanish, Japanese support

Made-with: Cursor
diff --git a/milestones/milestone-1.md b/milestones/milestone-1.md
@@ -0,0 +1,11 @@
+# Milestone 1 — Production Hardening
+
+**Goal:** Make docgen reliable enough to merge into `tekton-dag` and use in real CI.
+
+## Items
+
+- [ ] **Install tesseract in CI** — add `apt-get install -y tesseract-ocr` to `ci.yml` so OCR validation runs on every push (catches "command not found", `.venv` stacking, etc. in terminal recordings)
+- [ ] **Install ffmpeg in CI** — add `apt-get install -y ffmpeg` so compose-guard and integration tests run instead of being skipped
+- [ ] **Tighten Manim animation pacing** — add more animation beats to `DocgenOverviewScene` and `WizardGUIScene` so static holds drop from ~57% to <30% of the video
+- [ ] **Merge `milestone/doc-generator` into `tekton-dag` main** — integrate `docgen.yaml`, wrapper scripts, and the 14-segment demo pipeline into the parent repo
+- [ ] **End-to-end smoke test** — add a CI job that runs `docgen generate-all --dry-run` to verify the full pipeline config is valid without calling OpenAI or rendering video
diff --git a/milestones/milestone-2.md b/milestones/milestone-2.md
@@ -0,0 +1,12 @@
+# Milestone 2 — Slide Deck Generation
+
+**Goal:** Add reveal.js slide deck support as a visual source type alongside Manim and VHS.
+
+## Items
+
+- [ ] **`docgen slides` CLI command** — generate a reveal.js slide deck from a simple YAML/Markdown spec per segment
+- [ ] **Slide visual type in `visual_map`** — `type: slides` alongside `manim` and `vhs`, with `source: slides/01-overview/index.html`
+- [ ] **Auto-screenshot or headless render** — capture slide transitions as MP4 using Playwright or another headless-browser driver, timed to narration via `timing.json`
+- [ ] **Slide templates** — ship 2–3 built-in themes (dark tech, light minimal, branded) selectable in `docgen.yaml`
+- [ ] **Hot-reload preview** — `docgen slides --preview` opens a local server with live reload for editing slides
+- [ ] **Dogfood** — replace one or more Manim segments in docgen's own demos with a slide deck to validate the workflow
diff --git a/milestones/milestone-3.md b/milestones/milestone-3.md
@@ -0,0 +1,25 @@
+# Milestone 3 — Multi-Language Narration
+
+**Goal:** Generate demo videos in multiple languages from a single English narration source.
+
+## Target Languages (initial)
+
+- Chinese (Mandarin, `zh`)
+- Spanish (`es`)
+- Japanese (`ja`)
+- Additional languages should be easy to add via config
+
+## Items
+
+- [ ] **Translation stage** — `docgen translate --lang zh` calls an LLM (GPT-4o) to translate narration Markdown files, preserving technical terms and TTS-friendly phrasing
+- [ ] **Per-language TTS voices** — configure voice per language in `docgen.yaml` under `tts.voices.zh`, `tts.voices.es`, etc.; OpenAI TTS supports Chinese, Spanish, and Japanese natively
+- [ ] **Language-aware pipeline** — `docgen generate-all --lang zh` runs TTS → timestamps → compose for the target language, reusing the same visual assets (Manim/VHS/slides are language-neutral)
+- [ ] **Output structure** — recordings land in `recordings/zh/`, `recordings/es/`, etc., each with its own concat and Pages index
+- [ ] **GitHub Pages language switcher** — add a language dropdown to `index.html` that swaps video sources
+- [ ] **Cost estimation** — `docgen translate --dry-run` shows estimated token count and cost before calling the API (GPT-4o input is ~$2.50/1M tokens, so a 6-segment demo with ~3K words, roughly 4K tokens, costs about $0.01 in input per language, plus output tokens)
+- [ ] **Translation review workflow** — `docgen wizard` supports editing translated narration with side-by-side English reference
+- [ ] **Validation** — `docgen validate` checks translated narrations for length parity with English (±20%) to catch truncated or bloated translations
+
+## Cost Notes
+
+OpenAI TTS pricing is the same regardless of language, so each added language costs about one more TTS pass over the narration. The translation step via GPT-4o adds little on top for typical demo narration (~2–5K words). Chinese (Mandarin) is well supported by both GPT-4o for translation and OpenAI TTS for speech synthesis.
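
Review note on the milestone 3 cost-estimation item: the arithmetic behind a `--dry-run` estimate could be sketched roughly as below. This is a minimal sketch, not docgen's implementation. The function name `estimate_translation_cost` is hypothetical, the tokens-per-word ratio and the output-to-input ratio are rough assumptions, and the output price is an assumed figure that should be checked against current OpenAI pricing. Only the $2.50/1M input price comes from the roadmap text. TTS cost is not included.

```python
# Rough per-language translation cost estimate (sketch, not docgen's real code).
TOKENS_PER_WORD = 4 / 3        # assumption: ~0.75 English words per token
INPUT_USD_PER_MTOK = 2.50      # from the roadmap item (GPT-4o input price)
OUTPUT_USD_PER_MTOK = 10.00    # assumption: verify against current pricing
OUTPUT_TO_INPUT_RATIO = 1.0    # assumption: translation uses about as many tokens as the source


def estimate_translation_cost(word_count: int) -> dict:
    """Estimate token counts and USD cost for translating narration into one language."""
    input_tokens = round(word_count * TOKENS_PER_WORD)
    output_tokens = round(input_tokens * OUTPUT_TO_INPUT_RATIO)
    input_usd = input_tokens * INPUT_USD_PER_MTOK / 1_000_000
    output_usd = output_tokens * OUTPUT_USD_PER_MTOK / 1_000_000
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "input_usd": round(input_usd, 4),
        "output_usd": round(output_usd, 4),
        "total_usd": round(input_usd + output_usd, 4),
    }


if __name__ == "__main__":
    # The ~3K-word, 6-segment demo mentioned in the roadmap.
    print(estimate_translation_cost(3000))
```

Under these assumptions a ~3K-word demo comes to about 4K input tokens and $0.01 of input, with the total landing around a few cents per language once output tokens are counted. Real token counts for Chinese or Japanese output may differ noticeably from the 1:1 ratio assumed here.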