Design flaw: Manim scene authoring is a manual gap in the pipeline #1

@jmjava

Problem

docgen automates the full demo video pipeline — except for the visual animations. The current flow is:

narration/*.md → docgen tts → docgen timestamps → ??? → docgen manim → docgen compose

The ??? is a fully manual step: a developer (or AI agent) must write Python Manim scene classes in animations/scenes.py, carefully reading timing.json to place animations at the right timestamps. This is:

  1. Undocumented — nothing in docgen init, the README, or the CLI output mentions that scene authoring is manual
  2. The primary cause of blank/frozen video — when narration is regenerated (longer audio), scenes aren't automatically updated, leaving compose to pad with freeze frames
  3. The most time-consuming step — a 3-minute segment can take 30+ minutes of scene coding
  4. Fragile — timing changes in TTS silently desync existing scenes

Desired behavior

docgen should be able to generate a reasonable default scene from the narration and timing data alone, without requiring hand-written Manim code. Manual scenes should be an opt-in upgrade, not a requirement.

Proposed approach

Option A: Auto-scene generation from narration structure

Parse the narration markdown to extract:

  • Section headings → title cards
  • Bullet points / numbered lists → sequential text reveals
  • Key terms (bold, code spans) → highlighted callouts
  • Paragraph breaks → visual transitions

Combine with timing.json to place each visual element at the correct timestamp. Output a generated scenes.py (or render directly) using a template-based Manim scene.
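As a rough illustration of the parsing half of Option A, the mapping above could be sketched as follows. This is a minimal sketch, not docgen code: `VisualElement` and `extract_elements` are hypothetical names, the regexes cover only simple markdown, and placement against timing.json is left out because its format is not shown in this issue.

```python
import re
from dataclasses import dataclass


@dataclass
class VisualElement:
    kind: str  # "title", "bullet", "callout", or "transition"
    text: str


# Hypothetical patterns covering only simple markdown constructs.
_HEADING = re.compile(r"^#{1,6}\s+(.*)$")
_BULLET = re.compile(r"^(?:[-*+]|\d+[.)])\s+(.*)$")
_KEY_TERM = re.compile(r"\*\*(.+?)\*\*|`([^`]+)`")


def extract_elements(markdown: str) -> list[VisualElement]:
    """Map narration markdown to an ordered list of visual elements."""
    elements: list[VisualElement] = []
    in_paragraph = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped:
            # A blank line that ends a paragraph becomes a transition.
            if in_paragraph:
                elements.append(VisualElement("transition", ""))
            in_paragraph = False
            continue
        heading = _HEADING.match(stripped)
        bullet = _BULLET.match(stripped)
        if heading:
            elements.append(VisualElement("title", heading.group(1)))
            in_paragraph = False
        elif bullet:
            # Bullet text keeps any inline markup it contains.
            elements.append(VisualElement("bullet", bullet.group(1)))
            in_paragraph = False
        else:
            in_paragraph = True
        # Bold text and code spans on any line become callouts.
        for match in _KEY_TERM.finditer(stripped):
            elements.append(VisualElement("callout", match.group(1) or match.group(2)))
    return elements


sample = "# Intro\n\nThis uses **retries** and `docgen tts`.\n\n- First\n- Second\n"
for element in extract_elements(sample):
    print(element.kind, repr(element.text))
```

A real implementation would probably use a proper markdown parser rather than regexes; the point here is only that the element list can be derived without any hand-written Manim.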

Option B: Slide-deck style DSL in docgen.yaml

Let users define visual beats declaratively:

visual_beats:
  "18":
    - at: 0.0
      type: title
      text: "What's Coming Next"
      subtitle: "Milestone 13"
    - at: 12.0
      type: bullets
      title: "Retry on Transient Failures"
      items:
        - "Spot node eviction mid-build"
        - "Registry push timeout"
    - at: 55.0
      type: bullets
      title: "Precise Build Sizing"
      items: [...]

docgen manim would read these beats and generate + render a Manim scene automatically.
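One way the beats could be turned into on-screen spans is sketched below. It assumes the YAML has already been loaded into plain dicts (loading docgen.yaml is omitted), and `BeatSpan`, `beat_spans`, and the 90-second audio duration are hypothetical. Each beat is held until the next one starts, and the last beat is held to the end of the audio, so the scene always covers the narration.

```python
from dataclasses import dataclass


@dataclass
class BeatSpan:
    start: float  # seconds from the start of the segment
    end: float    # start of the next beat, or the audio duration
    beat: dict    # the original beat mapping from the config


def beat_spans(beats: list[dict], audio_duration: float) -> list[BeatSpan]:
    """Assign each beat the time window during which it stays on screen."""
    ordered = sorted(beats, key=lambda b: b["at"])
    spans: list[BeatSpan] = []
    for i, beat in enumerate(ordered):
        start = float(beat["at"])
        if start < 0 or start >= audio_duration:
            raise ValueError(f"beat at {start}s falls outside the {audio_duration}s audio")
        # Hold until the next beat begins; the final beat runs to the end.
        end = float(ordered[i + 1]["at"]) if i + 1 < len(ordered) else audio_duration
        spans.append(BeatSpan(start, end, beat))
    return spans


# Beats copied from the example config for segment "18"; item lists omitted.
beats = [
    {"at": 0.0, "type": "title", "text": "What's Coming Next"},
    {"at": 12.0, "type": "bullets", "title": "Retry on Transient Failures"},
    {"at": 55.0, "type": "bullets", "title": "Precise Build Sizing"},
]
for span in beat_spans(beats, audio_duration=90.0):
    print(f"{span.start:6.1f} -> {span.end:6.1f}  {span.beat['type']}")
```

Rejecting a beat that starts after the audio ends is one possible policy; a generator could instead drop such beats with a warning.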

Option C: Hybrid — auto-generate with manual override

  1. docgen scene-gen produces a default scenes.py from narration + timing
  2. Users can edit the generated file for custom visuals
  3. docgen compose warns if the scene duration doesn't cover the audio duration (this check already exists as the freeze guard)
  4. On re-running docgen tts + docgen timestamps, docgen scene-gen --update adjusts timing in existing scenes without overwriting custom animations
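Steps 3 and 4 could be approximated as shown below. This is a sketch under stated assumptions, not how docgen's freeze guard or a future `--update` flag works: `coverage_gap` and `rescale_times` are hypothetical helpers, and proportional rescaling is a deliberately simple stand-in for retiming against the per-phrase data in timing.json.

```python
def coverage_gap(scene_duration: float, audio_duration: float, tolerance: float = 0.05) -> float:
    """Return the seconds of audio not covered by the scene, or 0.0 if covered."""
    gap = audio_duration - scene_duration
    return gap if gap > tolerance else 0.0


def rescale_times(times: list[float], old_duration: float, new_duration: float) -> list[float]:
    """Stretch or shrink animation start times in proportion to the new audio length."""
    if old_duration <= 0:
        raise ValueError("old_duration must be positive")
    factor = new_duration / old_duration
    return [round(t * factor, 3) for t in times]


# A scene authored for 90 s of audio, after TTS regenerates 180 s of audio.
gap = coverage_gap(scene_duration=90.0, audio_duration=180.0)
if gap:
    print(f"scene is {gap:.1f}s shorter than the audio; compose would pad with freeze frames")
print(rescale_times([0.0, 12.0, 55.0], old_duration=90.0, new_duration=180.0))
```

Proportional rescaling keeps custom animations intact because only their start times change, but it drifts when the narration grows unevenly, which is why retiming from timing.json would be the more accurate approach.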

Impact

Without this, every narration change requires manual scene re-authoring, which:

  • Blocks CI/CD automation of video generation
  • Makes the "one command to regenerate everything" promise incomplete
  • Causes the freeze-frame / blank-screen issues users keep hitting

Acceptance criteria

  • docgen init scaffolds projects that produce watchable videos without hand-written Manim
  • Regenerating TTS (producing longer or shorter audio) automatically retimes the visuals
  • Manual Manim scenes remain supported as an upgrade path
  • Documentation clearly states the visual authoring workflow
