Description
Problem
When Manim scenes are authored (manually or eventually auto-generated), several visual quality issues can slip through undetected:
- Text overlap: Multiple text elements positioned on top of each other, making content unreadable. This is especially common when scenes are extended to fill longer audio durations.
- Readability failures: Font sizes too small, insufficient contrast between text and background, elements running off-screen, or dense content without adequate spacing.
- Layout collisions: Animated elements that fade in over existing elements without first clearing the previous content.
Current state
docgen validate currently checks:
- Trailing freeze ratio (visual ending before audio)
- Blank frame detection
- Audio/video drift
- OCR-based error pattern matching (for terminal recordings)
- Git LFS pointer detection
None of these catch visual quality issues in Manim-rendered segments.
Proposed solution
Phase 1: OCR-based overlap detection
- Sample frames at regular intervals (e.g., every 5 seconds) throughout the video
- Run OCR on each frame and check for garbled/overlapping text regions
- Flag frames where OCR confidence falls below a threshold (low confidence can indicate overlapping text)
- Detect bounding box collisions between recognized text regions
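
The Phase 1 checks could be sketched roughly as follows. This is a minimal illustration, not docgen's implementation: `TextBox`, `sample_times`, `boxes_overlap`, and `check_frame` are hypothetical names, the 60% confidence threshold and 2 px tolerance are placeholder values, and the sketch assumes the OCR engine reports a confidence and a pixel bounding box per recognized region (as Tesseract's per-word data output does). Frame extraction and the OCR call itself are left out.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class TextBox:
    """One OCR-recognized text region, in pixel coordinates (hypothetical type)."""
    text: str
    conf: float  # OCR confidence, 0-100
    left: int
    top: int
    width: int
    height: int


def sample_times(duration_s: float, interval_s: float = 5.0) -> list[float]:
    """Timestamps (seconds) at which to grab frames, all strictly before the end."""
    count = math.ceil(duration_s / interval_s)
    return [i * interval_s for i in range(count)]


def boxes_overlap(a: TextBox, b: TextBox, tolerance_px: int = 2) -> bool:
    """True if the rectangles intersect by more than tolerance_px on both axes."""
    x_overlap = min(a.left + a.width, b.left + b.width) - max(a.left, b.left)
    y_overlap = min(a.top + a.height, b.top + b.height) - max(a.top, b.top)
    return x_overlap > tolerance_px and y_overlap > tolerance_px


def check_frame(boxes: list[TextBox], min_conf: float = 60.0) -> dict:
    """Report low-confidence regions and colliding pairs for a single frame."""
    low_conf = [b.text for b in boxes if b.conf < min_conf]
    collisions = [
        (a.text, b.text)
        for a, b in combinations(boxes, 2)
        if boxes_overlap(a, b)
    ]
    return {"low_confidence": low_conf, "collisions": collisions}
```

A validator would call `check_frame` once per sampled frame and fail the segment if either list is non-empty. The small pixel tolerance is there so that tightly packed but legible lines are not reported as collisions.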
Phase 2: Layout constraints
- Define a "safe zone" margin (e.g., 5% from each edge) and flag content outside it
- Enforce a minimum font size threshold for readability at 720p
- Cap the number of text elements visible simultaneously
- Enforce a minimum vertical spacing between text items
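
One way the Phase 2 constraints might be expressed over OCR bounding boxes is sketched below. All names (`Region`, `check_layout`) and all thresholds other than the 5% margin are assumptions made for illustration; in particular, the sketch uses the pixel height of a text region as a stand-in for font size, which is only an approximation.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Region:
    """A detected text region in pixel coordinates (hypothetical type)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def check_layout(regions, frame_w=1280, frame_h=720, margin=0.05,
                 min_text_height_px=18, max_regions=6, min_gap_px=12):
    """Return a list of human-readable layout issues for one 720p frame."""
    issues = []
    mx, my = frame_w * margin, frame_h * margin  # safe-zone margins in pixels

    for r in regions:
        outside = (r.left < mx or r.top < my
                   or r.left + r.width > frame_w - mx
                   or r.top + r.height > frame_h - my)
        if outside:
            issues.append(f"outside safe zone: {r.text!r}")
        if r.height < min_text_height_px:
            issues.append(f"text too small: {r.text!r}")

    if len(regions) > max_regions:
        issues.append(f"too many text elements: {len(regions)} > {max_regions}")

    # Vertical spacing only matters for regions that share horizontal extent.
    ordered = sorted(regions, key=lambda r: r.top)
    for upper, lower in combinations(ordered, 2):
        shares_columns = (upper.left < lower.left + lower.width
                          and lower.left < upper.left + upper.width)
        gap = lower.top - (upper.top + upper.height)
        if shares_columns and gap < min_gap_px:
            issues.append(
                f"insufficient spacing: {upper.text!r} / {lower.text!r} ({gap}px)"
            )
    return issues
```

An empty list means the frame satisfies all four constraints. A negative gap in the spacing message indicates that the two regions overlap vertically.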
Phase 3: Auto-generated scene layout engine
- When docgen eventually generates Manim scenes automatically (see #1, "Design flaw: Manim scene authoring is a manual gap in the pipeline"), the layout engine should enforce a grid/zone system that prevents overlap by construction
- Section-based layout: topic strip at top, detail items below with fixed vertical spacing
- Auto-clear previous section content before rendering next section
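
The zone idea can be illustrated with a small planner that assigns each item a fixed row and inserts a clear step between sections. This is a sketch under stated assumptions rather than a proposed API: `Slot`, `layout_section`, and `plan_scene` are hypothetical names, the margins and row spacing are placeholder values, and the coordinates assume Manim's default frame height of 8 scene units with the origin at the center. In an actual scene, a "clear" step would typically become a fade-out of everything currently on screen before the next section is shown.

```python
from dataclasses import dataclass

FRAME_HEIGHT = 8.0  # assumed: Manim's default frame height in scene units


@dataclass(frozen=True)
class Slot:
    """A text item and the vertical position assigned to it (hypothetical type)."""
    text: str
    y: float


def layout_section(topic, items, top_margin=0.5, strip_height=1.0,
                   row_spacing=0.8, bottom_margin=0.5):
    """Place the topic in a strip at the top and items in fixed rows below it."""
    top = FRAME_HEIGHT / 2 - top_margin
    lowest_allowed = -FRAME_HEIGHT / 2 + bottom_margin
    first_row_y = top - strip_height - row_spacing / 2

    slots = [Slot(topic, top - strip_height / 2)]
    for i, text in enumerate(items):
        y = first_row_y - i * row_spacing
        if y < lowest_allowed:
            # Refuse to squeeze rows together: the caller must split the section.
            raise ValueError(f"section {topic!r} has too many items for one screen")
        slots.append(Slot(text, y))
    return slots


def plan_scene(sections):
    """Turn [(topic, items), ...] into ordered steps with a clear between sections."""
    steps = []
    for index, (topic, items) in enumerate(sections):
        if index > 0:
            steps.append(("clear", None))  # remove previous section first
        steps.append(("show", layout_section(topic, items)))
    return steps
```

Because every item gets its own row and oversized sections are rejected rather than compressed, overlap cannot occur within a section, and the explicit clear step prevents a new section from animating in over the old one.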
Real-world examples
During the tekton-dag 18-segment regeneration, extending scenes to match 150-210s audio durations led to:
- "Retry on Transient Failures" text overlapping with other pillar descriptions in the RoadmapScene
- Dense bullet points becoming unreadable when 7+ items were displayed simultaneously
- Elements not being removed before new sections animated in
These were caught only by manual review; docgen validate reported all segments as passing.
Labels
Enhancement, validation