Problem
docgen automates the full demo video pipeline — except for the visual animations. The current flow is:
narration/*.md → docgen tts → docgen timestamps → ??? → docgen manim → docgen compose
The ??? is a fully manual step: a developer (or AI agent) must write Python Manim scene classes in animations/scenes.py, carefully reading timing.json to place animations at the right timestamps. This is:
- Undocumented — nothing in docgen init, the README, or the CLI output mentions that scene authoring is manual
- The primary cause of blank/frozen video — when narration is regenerated (longer audio), scenes aren't automatically updated, leaving compose to pad with freeze frames
- The most time-consuming step — a 3-minute segment can take 30+ minutes of scene coding
- Fragile — timing changes in TTS silently desync existing scenes
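The freeze-frame failure comes down to one comparison per segment: is the rendered scene shorter than its audio? Below is a minimal sketch of that check. It assumes the audio and rendered-scene durations are already known in seconds. `SegmentDurations`, `freeze_padding`, `desync_warnings` and the 0.25 s tolerance are hypothetical names and values, not existing docgen APIs.

```python
from dataclasses import dataclass


@dataclass
class SegmentDurations:
    # Hypothetical per-segment record; durations are in seconds.
    name: str
    audio_s: float   # length of the TTS audio
    scene_s: float   # length of the rendered Manim scene


def freeze_padding(seg: SegmentDurations, tolerance_s: float = 0.25) -> float:
    """Seconds compose would have to fill with a frozen frame (0.0 if covered)."""
    gap = seg.audio_s - seg.scene_s
    return gap if gap > tolerance_s else 0.0


def desync_warnings(segments: list[SegmentDurations]) -> list[str]:
    """One human-readable warning per segment whose scene is too short."""
    warnings = []
    for seg in segments:
        pad = freeze_padding(seg)
        if pad:
            warnings.append(
                f"{seg.name}: scene is {pad:.1f}s shorter than audio; "
                "compose would freeze the last frame"
            )
    return warnings
```

Running this after every TTS pass would turn the silent desync into an explicit warning. It does not fix anything, though; the scenes still have to be regenerated.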
Desired behavior
docgen should be able to generate a reasonable default scene from the narration and timing data alone, without requiring hand-written Manim code. Manual scenes should be an opt-in upgrade, not a requirement.
Proposed approach
Option A: Auto-scene generation from narration structure
Parse the narration markdown to extract:
- Section headings → title cards
- Bullet points / numbered lists → sequential text reveals
- Key terms (bold, code spans) → highlighted callouts
- Paragraph breaks → visual transitions
Combine with timing.json to place each visual element at the correct timestamp. Output a generated scenes.py (or render directly) using a template-based Manim scene.
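The parsing and placement steps of Option A could look roughly like the sketch below. It rests on several assumptions:

- timing.json can be reduced to a flat list of word timings (`{"word", "start", "end"}` in spoken order); its real schema may differ.
- The spoken words match the narration text one-to-one once `**bold**` and `` `code` `` markers are stripped.
- Headings appear on screen but are not narrated.

`Beat`, `parse_narration` and `place_beats` are hypothetical names. Generating or rendering the Manim scene itself is left out.

```python
import re
from dataclasses import dataclass, field

# **bold** or `code` spans: group 1 = bold text, group 2 = code text
TERM = re.compile(r"\*\*(.+?)\*\*|`([^`]+)`")
LIST_MARKER = re.compile(r"^([-*]|\d+\.)\s+")


@dataclass
class Beat:
    kind: str                      # "title", "bullet" or "paragraph"
    text: str                      # on-screen text with markdown markers removed
    callouts: list[str] = field(default_factory=list)  # bold/code terms to highlight
    at: float = 0.0                # start time in seconds, set by place_beats


def strip_markers(raw: str) -> str:
    return TERM.sub(lambda m: m.group(1) or m.group(2), raw)


def key_terms(raw: str) -> list[str]:
    return [m.group(1) or m.group(2) for m in TERM.finditer(raw)]


def parse_narration(md: str) -> list[Beat]:
    beats: list[Beat] = []
    para: list[str] = []

    def flush() -> None:
        # Each paragraph becomes its own beat, so a visual transition
        # happens wherever the narration has a paragraph break.
        if para:
            raw = " ".join(para)
            beats.append(Beat("paragraph", strip_markers(raw), key_terms(raw)))
            para.clear()

    for line in md.splitlines():
        s = line.strip()
        if not s:
            flush()
        elif s.startswith("#"):
            flush()
            beats.append(Beat("title", s.lstrip("#").strip()))
        elif LIST_MARKER.match(s):
            flush()
            item = LIST_MARKER.sub("", s, count=1)
            beats.append(Beat("bullet", strip_markers(item), key_terms(item)))
        else:
            para.append(s)
    flush()
    return beats


def place_beats(beats: list[Beat], words: list[dict]) -> list[Beat]:
    """Start each beat at the timestamp of its first spoken word."""
    cursor = 0
    for beat in beats:
        if cursor < len(words):
            beat.at = words[cursor]["start"]
        elif words:
            beat.at = words[-1]["end"]
        if beat.kind != "title":   # titles are assumed not to be narrated
            cursor += len(beat.text.split())
    return beats
```

Because every timestamp is derived from the word timings, re-running this after docgen tts + docgen timestamps would move the visuals along with the new audio.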
Option B: Slide-deck style DSL in docgen.yaml
Let users define visual beats declaratively:
```yaml
visual_beats:
  "18":
    - at: 0.0
      type: title
      text: "What's Coming Next"
      subtitle: "Milestone 13"
    - at: 12.0
      type: bullets
      title: "Retry on Transient Failures"
      items:
        - "Spot node eviction mid-build"
        - "Registry push timeout"
    - at: 55.0
      type: bullets
      title: "Precise Build Sizing"
      items: [...]
```

docgen manim would read these beats and generate + render a Manim scene automatically.
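For Option B, the key step before rendering is turning the `at` offsets into on-screen durations. Each beat lasts until the next one starts, and the last one lasts until the segment's audio ends. Below is a minimal sketch. It assumes the YAML above has already been loaded into plain dicts (for example with PyYAML's `yaml.safe_load`) and that the audio duration is known. `TimedBeat` and `schedule_beats` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TimedBeat:
    start: float      # seconds from the start of the segment's audio
    duration: float   # how long the beat stays on screen
    spec: dict        # the original beat entry (type, title, items, ...)


def schedule_beats(beats: list[dict], audio_duration: float) -> list[TimedBeat]:
    """Each beat lasts until the next beat's `at`; the last one until the audio ends."""
    ordered = sorted(beats, key=lambda b: float(b["at"]))
    timed = []
    for i, beat in enumerate(ordered):
        start = float(beat["at"])
        end = float(ordered[i + 1]["at"]) if i + 1 < len(ordered) else audio_duration
        if end <= start:
            # e.g. regenerated audio is now shorter than the last beat's offset
            raise ValueError(f"beat at {start}s gets no screen time (ends at {end}s)")
        timed.append(TimedBeat(start, end - start, beat))
    return timed
```

A renderer could then play each beat's animation and hold the frame with Manim's `self.wait()` for the rest of its duration. Raising on a beat with no screen time turns shorter regenerated audio into a hard error instead of a silent desync.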
Option C: Hybrid — auto-generate with manual override
- docgen scene-gen produces a default scenes.py from narration + timing
- Users can edit the generated file for custom visuals
- docgen compose warns if the scene duration doesn't cover the audio duration (this check already exists as the freeze guard)
- On re-running docgen tts + docgen timestamps, docgen scene-gen --update adjusts timing in existing scenes without overwriting custom animations
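One way `docgen scene-gen --update` could adjust timing without touching custom animations: keep every generated timestamp inside a clearly marked block of scenes.py that the tool owns, and have hand-written scene code read its times from that block. Below is a minimal sketch of this approach. The marker comments, the `TIMING` dict, `render_timing_block` and `update_scenes` are all assumptions, not current docgen behavior.

```python
import json
import re

# Hypothetical markers delimiting the tool-owned region of scenes.py.
BEGIN = "# >>> docgen timing (generated, do not edit) >>>"
END = "# <<< docgen timing <<<"
BLOCK = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)


def render_timing_block(timings: dict[str, float]) -> str:
    """Emit a TIMING dict literal; numeric JSON is also valid Python."""
    body = json.dumps(timings, indent=4, sort_keys=True)
    return f"{BEGIN}\nTIMING = {body}\n{END}"


def update_scenes(source: str, timings: dict[str, float]) -> str:
    """Replace only the generated timing block; leave all other code as-is."""
    if not BLOCK.search(source):
        raise ValueError("no generated timing block found; refusing to edit scenes.py")
    # A function replacement avoids backslash escapes being interpreted.
    return BLOCK.sub(lambda _m: render_timing_block(timings), source, count=1)
```

Refusing to write when the markers are missing keeps the tool from clobbering a fully hand-written scenes.py. The trade-off is that custom animations must look up times via `TIMING[...]` rather than hard-coding numbers.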
Impact
Without this, every narration change requires manual scene re-authoring, which:
- Blocks CI/CD automation of video generation
- Makes the "one command to regenerate everything" promise incomplete
- Causes the freeze-frame / blank-screen issues users keep hitting
Acceptance criteria
- docgen init scaffolds projects that produce watchable videos without hand-written Manim
- Regenerating TTS (longer/shorter audio) automatically adjusts visuals
- Manual Manim scenes remain supported as an upgrade path
- Documentation clearly states the visual authoring workflow