Manim generation lessons: font, layout, rendering, and sync #3

@jmjava

Context

These are field-tested lessons from regenerating 18 demo segments for tekton-dag using docgen. Every issue here was hit in production and required manual intervention. When docgen absorbs auto-scene generation (#1) and visual validation (#2), these lessons should inform the defaults and guardrails.

Lesson 1: Manim's default font is unusable

Problem: Manim uses Pango's default font which renders text with terrible kerning — characters appear individually placed, like ransom-note typography. At 720p the text is blurry and nearly unreadable.

Fix applied: Text.set_default(font="Lato") at the start of construct(). Any clean sans-serif (Lato, Inter, Roboto, Liberation Sans) is dramatically better.

Recommendation for docgen:

  • Set a default font in the Manim config/template — never ship Pango defaults
  • Add manim.font to docgen.yaml so users can override (default: "Lato" or "Liberation Sans" for maximum availability)
  • Document font requirements in docgen init scaffold output
  • CI smoke test should verify the configured font is installed on the system
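
A minimal sketch of the font check suggested above, assuming the list of installed font families is obtained elsewhere (for example from `manimpango.list_fonts()` or `fc-list`). The helper names `pick_font` and `assert_font_installed` are hypothetical and not part of docgen.

```python
def _normalize(name):
    """Collapse whitespace and ignore case so 'liberation  sans' matches 'Liberation Sans'."""
    return " ".join(name.split()).casefold()


def pick_font(preferred, installed):
    """Return the first font in `preferred` that appears in `installed`, else None."""
    available = {_normalize(name) for name in installed}
    for name in preferred:
        if _normalize(name) in available:
            return name
    return None


def assert_font_installed(font, installed):
    """Raise early (e.g. in a CI smoke test) when the configured font is missing."""
    if pick_font([font], installed) is None:
        raise RuntimeError(f"Configured font {font!r} is not installed")
```

A scene template could then call `Text.set_default(font=pick_font(["Lato", "Liberation Sans"], installed))` once a non-None result is confirmed.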

Lesson 2: Hardcoded coordinates cause overlap — use Manim's layout system

Problem: Positioning text with absolute coordinates (move_to(UP * 1.3 + LEFT * 2.5)) is fragile. Every font, size, and content change shifts bounding boxes, causing text-on-text collisions. We went through 3 full rewrites of a single scene (RoadmapScene) trying to fix overlaps with coordinate adjustments.

Fix applied: Replaced all absolute positioning with Manim's layout primitives:

  • VGroup.arrange(DOWN, buff=0.25, aligned_edge=LEFT) for vertical lists
  • mob.next_to(anchor, DOWN, buff=0.3) to chain elements below each other
  • Never compute Y positions manually — let Manim measure actual text bounds

Recommendation for docgen:

  • Auto-generated scenes must ONLY use arrange() and next_to() for layout — ban absolute coordinates except for the top-level anchor points (strip Y, content area Y)
  • Define 2–3 layout zones (nav strip, title, content area) with fixed Y anchors; everything inside a zone uses relative positioning
  • The layout engine should enforce a maximum content height per section and warn if content would overflow below the visible frame
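
The overflow warning could be sketched roughly as follows. This assumes item heights are read from each mobject's measured `height` after layout, and that the frame uses Manim's default height of 8 scene units (bottom edge at y = -4.0). The function names, the `margin` default, and the logger name are illustrative assumptions.

```python
import logging

log = logging.getLogger("docgen.layout.sketch")  # hypothetical logger name


def stacked_height(item_heights, buff=0.25):
    """Total height of items stacked vertically with `buff` between neighbours."""
    if not item_heights:
        return 0.0
    return sum(item_heights) + buff * (len(item_heights) - 1)


def check_zone_fit(item_heights, zone_top_y, frame_bottom_y=-4.0, buff=0.25, margin=0.3):
    """Return how far the content overflows the visible area (0.0 when it fits)."""
    available = zone_top_y - (frame_bottom_y + margin)
    overflow = max(0.0, stacked_height(item_heights, buff) - available)
    if overflow > 0:
        log.warning("Content overflows the frame by %.2f units", overflow)
    return overflow
```

Because the check works on measured heights rather than guessed coordinates, it stays valid when the font or font size changes.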

Lesson 3: Font sizes must be minimum 14pt for video

Problem: Font sizes of 8–11pt are unreadable in video, even at 1080p. This is not print — viewers watch on screens at normal viewing distance, often in browser video players with compression artifacts.

Fix applied: Minimum body text 16pt, titles 20pt, section headings 36pt, pillar card labels 14pt.

Recommendation for docgen:

  • Enforce minimum font sizes in the layout engine: body ≥ 14pt, subtitle ≥ 16pt, heading ≥ 20pt
  • docgen validate should sample frames and flag text regions where OCR confidence is low (proxy for too-small or blurry text)
  • docgen.yaml should allow manim.min_font_size override
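
One way the size floor might be enforced, as a sketch: the role names and the `overrides` mapping (standing in for values read from `docgen.yaml`) are assumptions, while the default floors are the ones recommended above.

```python
# Floors taken from the recommendation above; roles are illustrative.
MIN_FONT_SIZE = {"body": 14, "subtitle": 16, "heading": 20}


def enforce_min_font_size(role, requested, overrides=None):
    """Clamp `requested` up to the floor for `role`; unknown roles use the body floor."""
    floors = {**MIN_FONT_SIZE, **(overrides or {})}
    floor = floors.get(role, floors["body"])
    return max(requested, floor)
```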

Lesson 4: Render at 1080p minimum — 720p is too blurry for text-heavy content

Problem: 720p30 renders look blurry when the video contains dense text (bullet lists, code snippets, multi-line labels). Compression artifacts at 720p make small text illegible.

Fix applied: Render at 1920×1080 (or 2560×1440 for production quality). The compose step handles resolution normalization.

Recommendation for docgen:

  • Default manim.quality in docgen.yaml should be 1080p30, not 720p30
  • docgen manim should render at the configured quality and warn if below 1080p
  • Document that 720p is only suitable for terminal recordings (VHS), not Manim text scenes
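
A small sketch of the quality check, assuming quality strings follow the `<height>p<fps>` form used in this issue (such as `720p30`). The function names are hypothetical.

```python
import re


def parse_quality(quality):
    """Split a string like '1080p30' into (height, fps)."""
    match = re.fullmatch(r"(\d+)p(\d+)", quality)
    if match is None:
        raise ValueError(f"Unrecognized quality string: {quality!r}")
    return int(match.group(1)), int(match.group(2))


def is_text_safe(quality, min_height=1080):
    """True when the render height meets the minimum suggested for text-heavy scenes."""
    height, _fps = parse_quality(quality)
    return height >= min_height
```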

Lesson 5: _wait_until is essential but easy to miscalculate

Problem: Manim animations must fill the exact audio duration. _wait_until(self, target_t, current_t) is the mechanism, but timing errors accumulate. If any section runs over its allocated window, subsequent _wait_until calls become no-ops and the scene desynchronizes.

Fix applied: Conservative timing — each pillar section ends ~1s before the next starts, with an explicit fade-out transition to absorb timing drift.

Recommendation for docgen:

  • Auto-generated scenes should budget animation time per section from Whisper segments, with 1–2s buffer between sections
  • Add a _wait_until wrapper that logs a warning (rather than crashing) if target_t < current_t; this catches timing overflows during development
  • After rendering, compare actual scene duration to audio duration and warn if drift exceeds 2%
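
The wrapper and the drift check could look roughly like this. It is a sketch, not the existing `_wait_until` implementation: it assumes only that the scene object exposes Manim's `wait(duration)` method, and the logger name and `drift_exceeds` helper are hypothetical.

```python
import logging

log = logging.getLogger("docgen.timing.sketch")  # hypothetical logger name


def wait_until(scene, target_t, current_t):
    """Wait until `target_t` and return the new timeline position.

    If the section has already run past `target_t`, log a warning instead
    of silently doing nothing, and leave the timeline where it is.
    """
    remaining = target_t - current_t
    if remaining > 0:
        scene.wait(remaining)
        return target_t
    if remaining < 0:
        log.warning("Timing overflow: at %.2fs, target was %.2fs", current_t, target_t)
    return current_t


def drift_exceeds(scene_duration, audio_duration, tolerance=0.02):
    """True when the rendered scene length differs from the audio by more than `tolerance`."""
    return abs(scene_duration - audio_duration) > tolerance * audio_duration
```

Returning the updated time lets callers thread it through successive sections, so an overflow in one section is visible in every later call.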

Lesson 6: Pillar/section pattern should be a reusable template

Problem: Every pillar in segment 18 follows the same visual pattern: highlight card → show title → reveal bullet list → clear. We wrote this 7 times with slight variations, which is error-prone.

Fix applied: Extracted _show_pillar(), _add_subtitle(), _reveal_list(), _clear() helper methods.

Recommendation for docgen:

  • Provide a SectionScene base class or mixin with built-in support for: nav strip, title card, bullet reveals, key-value pairs, flow diagrams, and section transitions
  • Auto-scene generation should detect section boundaries from narration paragraphs and apply this pattern automatically
  • Users can customize by overriding specific sections rather than writing the full construct() method
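
The override-one-step idea can be illustrated with a template-method skeleton. This is a library-free sketch: a real `SectionScene` would presumably extend Manim's `Scene` and animate in each hook, whereas the hooks here only record what they would do. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    bullets: list = field(default_factory=list)


class SectionSceneSketch:
    """Fixed flow per section: highlight card, show title, reveal list, clear."""

    def __init__(self):
        self.steps = []  # records hook calls so the flow can be inspected

    def sections(self):
        """Subclasses return the Section objects to present."""
        raise NotImplementedError

    def construct(self):
        for index, section in enumerate(self.sections()):
            self.highlight_card(index)
            self.show_title(section)
            self.reveal_list(section)
            self.clear_section(section)

    # Default hooks; subclasses override only the ones they want to change.
    def highlight_card(self, index):
        self.steps.append(("highlight", index))

    def show_title(self, section):
        self.steps.append(("title", section.title))

    def reveal_list(self, section):
        self.steps.append(("reveal", len(section.bullets)))

    def clear_section(self, section):
        self.steps.append(("clear", section.title))
```

A subclass that overrides only `show_title` still inherits the highlight, reveal, and clear behavior for every section.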

Lesson 7: The overview "strip" of cards must scale dynamically

Problem: We started with 5 pillar cards at width=1.8. When expanding to 7 cards, they overflowed the frame width. Manual resizing to width=1.4 was needed.

Fix applied: Used VGroup.arrange(RIGHT, buff=0.15) to auto-space cards, then positioned the group as a unit.

Recommendation for docgen:

  • Nav strip should auto-calculate card width based on (frame_width - margins) / num_cards
  • If labels are too long, truncate or reduce font size automatically
  • Max 8–10 cards before switching to a two-row layout
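
The width calculation above, sketched with the gaps between cards also subtracted (an assumption beyond the formula as written). The default frame width is Manim's default 16:9 frame at height 8; the `margin`, `buff`, and `max_per_row` defaults are illustrative.

```python
DEFAULT_FRAME_WIDTH = 8.0 * 16 / 9  # Manim's default frame width in scene units


def card_width(num_cards, frame_width=DEFAULT_FRAME_WIDTH, margin=0.5, buff=0.15):
    """Width per card so that `num_cards` plus gaps fit between the side margins."""
    if num_cards < 1:
        raise ValueError("num_cards must be at least 1")
    usable = frame_width - 2 * margin - buff * (num_cards - 1)
    return usable / num_cards


def rows_needed(num_cards, max_per_row=8):
    """Number of rows once the strip exceeds `max_per_row` cards (ceiling division)."""
    return -(-num_cards // max_per_row)
```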

Lesson 8: Contrast rules for dark backgrounds

Problem: Colored text on dark backgrounds (e.g., color=C_WARN on C_BG) has insufficient contrast when the element is dimmed to 0.2 opacity. Inactive elements become invisible.

Fix applied: White text with colored accents (icon, background fill at 0.25 opacity). Inactive opacity floor of 0.4 (not 0.2).

Recommendation for docgen:

  • Default text color should always be WHITE on dark backgrounds; use color only for accents (icons, borders, fills)
  • Minimum opacity for dimmed elements: 0.35–0.40
  • docgen validate should check frame-level contrast ratios (WCAG AA: 4.5:1 for text)
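
For reference, the WCAG 2.x contrast ratio can be computed as below. The `blend` step is a simplifying assumption: it models a dimmed element as a straight alpha mix of foreground over background in sRGB space, which only approximates what a renderer does. Function names are hypothetical.

```python
def _linearize(channel):
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG contrast ratio between two RGB colors, from 1.0 up to 21.0."""
    lum_a, lum_b = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def blend(fg, bg, opacity):
    """Approximate the visible color of `fg` drawn at `opacity` over `bg`."""
    return tuple(opacity * f + (1 - opacity) * b for f, b in zip(fg, bg))


def passes_aa(fg, bg, opacity=1.0, threshold=4.5):
    """True when the (possibly dimmed) text meets the WCAG AA ratio for normal text."""
    return contrast_ratio(blend(fg, bg, opacity), bg) >= threshold
```

Checking the blended color rather than the nominal one is what exposes the dimming problem: a color that passes at full opacity can fail once it is faded toward the background.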

Lesson 9: docgen compose path conventions must match render output

Problem: docgen compose looks for Manim output at animations/media/videos/scenes/720p30/<Scene>.mp4, but programmatic renders (via Python scripts) output to animations/media/videos/720p30/<Scene>.mp4 (no scenes/ subdirectory). This caused "FREEZE GUARD" failures that were actually just file-not-found falling through to stale cached files.

Fix applied: Manual cp to the expected path after each render.

Recommendation for docgen:

  • docgen manim should handle the render and place the output in the canonical path — users should never need to run Manim directly
  • If a stale file is found at the expected path, compare its duration to the audio and warn if they differ by more than 10%
  • Support multiple resolution directories: look for 1080p30/, 1440p60/, 720p30/ in priority order
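
A sketch of the priority-ordered lookup, covering both layouts described above (with and without the `scenes/` subdirectory). The function name and the idea of passing the `videos` directory as a root are assumptions, not docgen's current behavior.

```python
from pathlib import Path

QUALITY_PRIORITY = ("1080p30", "1440p60", "720p30")


def find_render(videos_root, scene, qualities=QUALITY_PRIORITY, subdirs=("scenes", "")):
    """Return the first existing <root>/<subdir>/<quality>/<scene>.mp4, or None.

    Qualities are tried in priority order; for each one, the 'scenes/'
    layout is checked before the flat layout produced by scripted renders.
    """
    root = Path(videos_root)
    for quality in qualities:
        for subdir in subdirs:
            candidate = root / subdir / quality / f"{scene}.mp4"
            if candidate.is_file():
                return candidate
    return None
```

Returning `None` instead of falling back to a cached file makes a missing render an explicit condition the caller has to handle.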

Lesson 10: TTS regeneration invalidates everything downstream

Problem: After updating narration and regenerating TTS, the new audio has a different duration. This silently breaks all existing Manim timing, and nothing warns you. The compose step either pads with freeze frames (if the rendered scene is shorter than the new audio) or clips the scene (if it is longer).

Fix applied: Full pipeline re-run: tts → timestamps → rewrite scene → render → compose → validate.

Recommendation for docgen:

  • docgen tts should emit a duration-change summary: "18-roadmap: 205.0s → 314.6s (+53%)"
  • If duration changed by more than 5%, print a WARNING that scenes and timestamps need regeneration
  • docgen compose should refuse to compose if the scene MP4 was last modified before the audio file (stale visual)
  • Add a docgen rebuild <segment> command that runs the full pipeline: tts → timestamps → manim → compose → validate
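
The duration summary and the two guards could be sketched as follows. The output format mirrors the example string above; the function names and the idea of passing modification times as plain numbers are assumptions.

```python
def duration_summary(segment, old_s, new_s):
    """Format a line like '18-roadmap: 205.0s → 314.6s (+53%)'."""
    if old_s <= 0:
        raise ValueError("old duration must be positive")
    pct = (new_s - old_s) / old_s * 100
    return f"{segment}: {old_s:.1f}s → {new_s:.1f}s ({pct:+.0f}%)"


def needs_regeneration(old_s, new_s, threshold=0.05):
    """True when the audio duration changed by more than `threshold` (5% by default)."""
    return abs(new_s - old_s) / old_s > threshold


def is_stale(video_mtime, audio_mtime):
    """True when the scene video was last modified before the audio it should match."""
    return video_mtime < audio_mtime
```

In practice the modification times would likely come from `os.path.getmtime` or `Path.stat().st_mtime` on the two files.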

Summary

The core theme: docgen should make the easy path the correct path. Good fonts, relative layout, adequate font sizes, proper resolution, and duration-aware validation should all be defaults — not things a user discovers after 5 hours of debugging blurry overlapping text.
