
Visual quality validation: overlap detection and readability checks #2

@jmjava


Problem

When Manim scenes are authored (manually or eventually auto-generated), several visual quality issues can slip through undetected:

  1. Text overlap — Multiple text elements positioned on top of each other, making content unreadable. This is especially common when scenes are extended to fill longer audio durations.

  2. Readability failures — Font sizes too small, insufficient contrast between text and background, elements running off-screen, or dense content without adequate spacing.

  3. Layout collisions — Animated elements that fade in over existing elements without first clearing the previous content.

Current state

docgen validate currently checks:

  • Trailing freeze ratio (visual ending before audio)
  • Blank frame detection
  • Audio/video drift
  • OCR-based error pattern matching (for terminal recordings)
  • Git LFS pointer detection

None of these catch visual quality issues in Manim-rendered segments.

Proposed solution

Phase 1: OCR-based overlap detection

  • Sample frames at regular intervals (e.g., every 5 seconds) throughout the video
  • Run OCR on each frame and check for garbled/overlapping text regions
  • Flag frames where OCR confidence falls below a threshold, which may indicate overlapping text
  • Detect bounding box collisions between recognized text regions
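The Phase 1 checks could be sketched roughly as follows. This is a minimal illustration, not docgen's implementation: `TextBox`, `sample_times`, `boxes_overlap`, `check_frame`, and the threshold of 60 are all hypothetical names and values. The box fields mirror the left/top/width/height/conf layout that Tesseract-style OCR output uses, and frame extraction and the OCR call itself are assumed to happen elsewhere.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass(frozen=True)
class TextBox:
    """One recognized text region in a frame, in pixel coordinates (hypothetical type)."""
    text: str
    left: int
    top: int
    width: int
    height: int
    conf: float  # OCR confidence on a 0-100 scale (assumed)


def sample_times(duration_s: float, interval_s: float = 5.0) -> List[float]:
    """Timestamps to sample: 0, interval, 2*interval, ... strictly below duration."""
    count = math.ceil(duration_s / interval_s)
    return [i * interval_s for i in range(count)]


def boxes_overlap(a: TextBox, b: TextBox, min_overlap_px: int = 1) -> bool:
    """True if the two rectangles intersect by at least min_overlap_px on both axes."""
    x_overlap = min(a.left + a.width, b.left + b.width) - max(a.left, b.left)
    y_overlap = min(a.top + a.height, b.top + b.height) - max(a.top, b.top)
    return x_overlap >= min_overlap_px and y_overlap >= min_overlap_px


def check_frame(boxes: List[TextBox], min_conf: float = 60.0) -> List[str]:
    """Return human-readable findings for the text regions of one sampled frame."""
    findings = []
    # Low confidence is treated as a hint of garbled or overlapping text.
    for box in boxes:
        if box.conf < min_conf:
            findings.append(f"low OCR confidence ({box.conf:.0f}) for {box.text!r}")
    # Pairwise rectangle intersection catches regions drawn on top of each other.
    for a, b in combinations(boxes, 2):
        if boxes_overlap(a, b):
            findings.append(f"overlap: {a.text!r} and {b.text!r}")
    return findings
```

A validator would call `check_frame` once per timestamp from `sample_times` and fail the segment if any frame returns findings. The pairwise comparison is quadratic in the number of regions, which should be acceptable for the handful of text elements a single frame contains.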

Phase 2: Layout constraints

  • Define a "safe zone" margin (e.g., 5% from each edge) — flag content outside it
  • Minimum font size threshold for readability at 720p
  • Maximum number of text elements visible simultaneously
  • Minimum vertical spacing between text items
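One way the Phase 2 constraints might be expressed is as a small rule set applied to the text regions found in a frame. Everything here is an assumption made for illustration: the `Region` and `LayoutRules` types, the `check_layout` function, and the numeric defaults other than the 5% margin and the 720p frame. Text height in pixels stands in for font size.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """A text region in pixel coordinates (hypothetical type)."""
    text: str
    left: int
    top: int
    width: int
    height: int


@dataclass(frozen=True)
class LayoutRules:
    frame_width: int = 1280        # 720p frame
    frame_height: int = 720
    safe_margin: float = 0.05      # 5% from each edge
    min_text_height_px: int = 18   # placeholder value, needs tuning
    max_text_elements: int = 6     # placeholder value, needs tuning
    min_vertical_gap_px: int = 8   # placeholder value, needs tuning


def check_layout(regions: List[Region], rules: LayoutRules = LayoutRules()) -> List[str]:
    """Return one finding per violated layout constraint."""
    findings = []
    mx = rules.frame_width * rules.safe_margin
    my = rules.frame_height * rules.safe_margin
    for r in regions:
        outside = (r.left < mx or r.top < my
                   or r.left + r.width > rules.frame_width - mx
                   or r.top + r.height > rules.frame_height - my)
        if outside:
            findings.append(f"outside safe zone: {r.text!r}")
        if r.height < rules.min_text_height_px:
            findings.append(f"text too small ({r.height}px): {r.text!r}")
    if len(regions) > rules.max_text_elements:
        findings.append(f"too many text elements: {len(regions)}")
    # Compare vertically adjacent regions that share some horizontal extent.
    ordered = sorted(regions, key=lambda r: r.top)
    for upper, lower in zip(ordered, ordered[1:]):
        shares_columns = (upper.left < lower.left + lower.width
                          and lower.left < upper.left + upper.width)
        gap = lower.top - (upper.top + upper.height)
        if shares_columns and gap < rules.min_vertical_gap_px:
            findings.append(f"tight spacing ({gap}px): {upper.text!r} / {lower.text!r}")
    return findings
```

Keeping the thresholds in one `LayoutRules` object would let a project override them without touching the checks themselves.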

Phase 3: Auto-generated scene layout engine

  • When docgen eventually generates Manim scenes automatically (see issue #1, "Design flaw: Manim scene authoring is a manual gap in the pipeline"), the layout engine should enforce a grid/zone system that prevents overlap by construction
  • Section-based layout: topic strip at top, detail items below with fixed vertical spacing
  • Auto-clear previous section content before rendering next section
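As a rough sketch of what "prevents overlap by construction" could mean, the planner below assigns every item a fixed slot and refuses sections that do not fit, and it always emits a clear step before a new section. It is library-independent on purpose: `SectionLayout`, `ScenePlan`, the step tuples, and all dimensions are hypothetical, with coordinates normalized so that 0 is the top of the frame and 1 the bottom. A real generator would translate each step into Manim animations.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Slot = Tuple[float, float]  # (top, bottom) in normalized frame coordinates


@dataclass(frozen=True)
class SectionLayout:
    """Fixed zones: a topic strip at the top, item slots below (placeholder sizes)."""
    safe_margin: float = 0.05
    topic_height: float = 0.12
    item_height: float = 0.08
    item_gap: float = 0.03

    def topic_slot(self) -> Slot:
        return (self.safe_margin, self.safe_margin + self.topic_height)

    def item_slots(self, count: int) -> List[Slot]:
        """Evenly spaced slots below the topic strip; raises if they leave the safe zone."""
        first_top = self.safe_margin + self.topic_height + self.item_gap
        pitch = self.item_height + self.item_gap
        slots = [(first_top + i * pitch, first_top + i * pitch + self.item_height)
                 for i in range(count)]
        if slots and slots[-1][1] > 1.0 - self.safe_margin:
            raise ValueError(f"{count} items do not fit in the safe zone")
        return slots


@dataclass
class ScenePlan:
    """Accumulates render steps, clearing the previous section before each new one."""
    layout: SectionLayout = field(default_factory=SectionLayout)
    visible: List[str] = field(default_factory=list)
    steps: List[tuple] = field(default_factory=list)

    def add_section(self, topic: str, items: List[str]) -> None:
        slots = self.layout.item_slots(len(items))  # fails before any step is recorded
        if self.visible:
            self.steps.append(("clear", tuple(self.visible)))
        self.steps.append(("show_topic", topic, self.layout.topic_slot()))
        for text, slot in zip(items, slots):
            self.steps.append(("show_item", text, slot))
        self.visible = [topic, *items]
```

Because slots are computed rather than hand-positioned, two items in the same section cannot share vertical space, and a section with too many items fails at planning time instead of producing an unreadable frame.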

Real-world examples

During the tekton-dag 18-segment regeneration, extending scenes to match 150-210s audio durations led to:

  • "Retry on Transient Failures" text overlapping with other pillar descriptions in the RoadmapScene
  • Dense bullet points becoming unreadable when 7+ items displayed simultaneously
  • Elements not being removed before new sections animated in

These were caught only by manual review — docgen validate reported all segments as passing.

Labels

Enhancement, validation
