Commit 94ff0ba

docs: add milestone roadmap (M1 hardening, M2 slides, M3 multi-lang)
Three milestones covering the next phases of docgen development:
- M1: CI hardening, tesseract/ffmpeg, tekton-dag merge
- M2: reveal.js slide deck generation as a visual source type
- M3: Multi-language narration with Chinese, Spanish, Japanese support

Made-with: Cursor
1 parent 545e010 commit 94ff0ba

3 files changed

Lines changed: 48 additions & 0 deletions

milestones/milestone-1.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
# Milestone 1 — Production Hardening

**Goal:** Make docgen reliable enough to merge into `tekton-dag` and use in real CI.

## Items

- [ ] **Install tesseract in CI** — add `apt-get install tesseract-ocr` to `ci.yml` so OCR validation runs on every push (catches "command not found", `.venv` stacking, etc. in terminal recordings)
- [ ] **Install ffmpeg in CI** — add `apt-get install ffmpeg` so compose-guard and integration tests run instead of being skipped
- [ ] **Tighten Manim animation pacing** — add more animation beats to `DocgenOverviewScene` and `WizardGUIScene` so static holds drop from ~57% to <30% of the video
- [ ] **Merge `milestone/doc-generator` into `tekton-dag` main** — integrate `docgen.yaml`, wrapper scripts, and the 14-segment demo pipeline into the parent repo
- [ ] **End-to-end smoke test** — add a CI job that runs `docgen generate-all --dry-run` to verify the full pipeline config is valid without calling OpenAI or rendering video
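The first two items exist because a missing binary currently causes checks to be skipped rather than failed. As a rough illustration of how a CI step could fail loudly instead, here is a minimal sketch of a pre-flight check. `missing_tools` is a hypothetical helper and not part of docgen; the only real API it relies on is `shutil.which` from the Python standard library, and the `which` parameter exists only so the lookup can be stubbed.

```python
import shutil
from typing import Callable, Iterable, Optional


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the names in `tools` that cannot be found on PATH.

    Hypothetical helper for illustration; docgen may detect tools differently.
    """
    return [name for name in tools if which(name) is None]


def preflight(tools: Iterable[str] = ("tesseract", "ffmpeg")) -> None:
    """Exit with an error message if any required binary is absent."""
    missing = missing_tools(tools)
    if missing:
        raise SystemExit("missing required tools: " + ", ".join(missing))
```

Calling `preflight()` at the start of a CI job would turn a silently skipped test suite into an explicit failure, assuming the binaries are named `tesseract` and `ffmpeg` on the runner.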

milestones/milestone-2.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
# Milestone 2 — Slide Deck Generation

**Goal:** Add reveal.js slide deck support as a visual source type alongside Manim and VHS.

## Items

- [ ] **`docgen slides` CLI command** — generate a reveal.js slide deck from a simple YAML/Markdown spec per segment
- [ ] **Slide visual type in `visual_map`** — `type: slides` alongside `manim` and `vhs`, with `source: slides/01-overview/index.html`
- [ ] **Auto-screenshot or headless render** — capture slide transitions as MP4 using Playwright or a headless browser, timed to narration via `timing.json`
- [ ] **Slide templates** — ship 2–3 built-in themes (dark tech, light minimal, branded) selectable in `docgen.yaml`
- [ ] **Hot-reload preview** — `docgen slides --preview` opens a local server with live reload for editing slides
- [ ] **Dogfood** — replace one or more Manim segments in docgen's own demos with a slide deck to validate the workflow
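Timing slide transitions to narration comes down to knowing how long each slide should stay on screen. The sketch below shows one way that could be computed. It assumes the slide start times (in seconds) have already been read out of `timing.json`; the roadmap does not specify that file's schema, so the function name `slide_hold_durations` and its inputs are hypothetical.

```python
def slide_hold_durations(slide_starts: list[float], total_duration: float) -> list[float]:
    """Return how many seconds each slide stays visible.

    slide_starts: time (seconds) at which each slide should appear, in order.
    total_duration: length of the narration audio in seconds.
    Both inputs are assumed to come from timing data; the real schema is unknown.
    """
    if not slide_starts:
        return []
    if any(b < a for a, b in zip(slide_starts, slide_starts[1:])):
        raise ValueError("slide start times must be non-decreasing")
    if total_duration < slide_starts[-1]:
        raise ValueError("narration ends before the last slide starts")
    # Each slide ends where the next begins; the last one runs to the end of the audio.
    ends = slide_starts[1:] + [total_duration]
    return [end - start for start, end in zip(slide_starts, ends)]
```

A headless capture step could then advance the deck after each returned duration, so the recorded video lines up with the audio without manual editing.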

milestones/milestone-3.md

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
# Milestone 3 — Multi-Language Narration

**Goal:** Generate demo videos in multiple languages from a single English narration source.

## Target Languages (initial)

- Chinese (Mandarin, `zh`)
- Spanish (`es`)
- Japanese (`ja`)
- Additional languages easy to add via config

## Items

- [ ] **Translation stage** — `docgen translate --lang zh` calls an LLM (GPT-4o) to translate narration Markdown files, preserving technical terms and TTS-friendly phrasing
- [ ] **Per-language TTS voices** — configure voice per language in `docgen.yaml` under `tts.voices.zh`, `tts.voices.es`, etc. OpenAI TTS supports Chinese, Spanish, Japanese natively
- [ ] **Language-aware pipeline** — `docgen generate-all --lang zh` runs TTS → timestamps → compose for the target language, reusing the same visual assets (Manim/VHS/slides are language-neutral)
- [ ] **Output structure** — recordings land in `recordings/zh/`, `recordings/es/` etc., with per-language concat and Pages index
- [ ] **GitHub Pages language switcher** — add a language dropdown to `index.html` that swaps video sources
- [ ] **Cost estimation** — `docgen translate --dry-run` shows estimated token count and cost before calling the API (GPT-4o translation is ~$2.50/1M input tokens, so a 6-segment demo with ~3K words costs roughly $0.01 in input tokens per language, before output tokens)
- [ ] **Translation review workflow** — `docgen wizard` supports editing translated narration with side-by-side English reference
- [ ] **Validation** — `docgen validate` checks translated narrations for length parity with English (±20%) to catch truncated or bloated translations
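The ±20% length-parity rule in the validation item can be expressed as a simple ratio check. The roadmap does not say which unit "length" is measured in, and raw character or word counts are not directly comparable between English and Chinese or Japanese, so this sketch takes two lengths already measured by the caller in a shared unit (estimated speech duration, for example). The function name and signature are hypothetical, not docgen's API.

```python
def within_length_parity(reference_len: float, translated_len: float,
                         tolerance: float = 0.20) -> bool:
    """Return True if translated_len is within ±tolerance of reference_len.

    Both lengths must use the same unit, chosen by the caller (an assumption;
    the roadmap does not define the unit).
    """
    if reference_len <= 0:
        raise ValueError("reference_len must be positive")
    lower = (1.0 - tolerance) * reference_len
    upper = (1.0 + tolerance) * reference_len
    return lower <= translated_len <= upper
```

A translation measured at 60 against an English reference of 100 would be flagged as likely truncated, and one at 130 as likely bloated, while 110 would pass.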

## Cost Notes

OpenAI TTS pricing is the same regardless of language. The main additional cost is the translation step via GPT-4o, which is negligible for typical demo narration (~2–5K words). Chinese (Mandarin) is well supported by both GPT-4o for translation and OpenAI TTS for speech synthesis.
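To make the translation cost concrete, the arithmetic can be sketched as follows. The $2.50 per million input tokens figure is the one quoted in this roadmap; the ratio of 1.3 tokens per English word is an assumed rule of thumb, not a measured value, and output tokens (billed separately) are deliberately left out. `estimate_input_cost_usd` is a hypothetical name.

```python
def estimate_input_cost_usd(word_count: int,
                            usd_per_million_input_tokens: float = 2.50,
                            tokens_per_word: float = 1.3) -> float:
    """Estimate the input-token cost of sending `word_count` English words.

    tokens_per_word is an assumption; real tokenization varies with the text.
    Output tokens are not included in this estimate.
    """
    tokens = word_count * tokens_per_word
    return tokens * usd_per_million_input_tokens / 1_000_000


# Narration sizes taken from the range mentioned above (~2-5K words).
for words in (2_000, 3_000, 5_000):
    print(f"{words} words: about ${estimate_input_cost_usd(words):.4f} input cost")
```

Under these assumptions a 3K-word narration comes to just under $0.01 of input tokens per language, which supports treating translation as a minor cost next to TTS.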
