
Genzō 現像

A generative video animation engine built on Python + FFmpeg. No external media — all frames and audio are synthesised from scratch.

Genzō (現像) is the Japanese word for film development — the process of making a latent image appear from nothing. That's the engine: pure code develops into video.

The first two productions explore what it feels like to be an LLM. They were MVPs: the goal now is to extract everything reusable into a proper engine so future videos are written as content, not as code.


Vision

genzo/
  core.py          ← frame emitter, audio queue, render pipeline
  audio.py         ← all synthesis primitives
  draw.py          ← all drawing & shape primitives
  animate.py       ← animated shape primitives + easing
  fonts.py         ← font loader, named size scale
  palette.py       ← named colour constants

scenes/
  llm_gatari/
    scenes.json    ← scene list: static cards, text, geometry
    templates.py   ← animated card templates (expanding_rings, tri_spin, etc.)
    audio.py       ← per-scene audio descriptors

productions/
  llm_gatari.py   ← thin orchestrator: load scenes.json, call engine, render
  llm_ytp.py      ← kept as-is, reference for glitch aesthetic

  • Static cards (bg, text, geometry, duration) → fully declarative JSON
  • Animated cards (easing, motion, per-frame logic) → named Python templates called from JSON
  • Audio → either inline numpy or referenced by name from a sound library


Current state — the MVPs

make_ytp.py — YouTube Poop / glitch aesthetic

Proof-of-concept. Established the full pipeline: synthesise audio, draw frames, call FFmpeg. Visual language: matrix rain, scanlines, chromatic aberration, VHS noise, strobe cuts. Everything inline, no separation of concerns.

make_monogatari.py — Monogatari-style flash card video ← current main

Second iteration. Introduced the design language used going forward: flat colour, geometric shapes, stark minimal typography, Japanese text, animated primitives (line_grow, tri_spin, circle_ring), photosensitivity rules. Still monolithic — scenes, engine, and content are all one file.

llm_ytp.mp4 — rendered output (~33s, ~590KB)


Reusable building blocks (to be extracted)

Everything below currently lives in make_monogatari.py and is ready to move into the engine without modification.

audio.py

Function                          Description
sine(freq, dur, vol)              Pure sine wave
chord(freqs, dur, vol)            Sum of sine waves
square_wave(freq, dur, vol)       Harsh robot tone
white_noise(dur, vol)             Random noise
drone(freq, dur, vol)             Sine + harmonics, sustained
sting(freqs, dur, vol)            Short multi-tone stab with release
click(vol)                        3ms noise burst — card cut sound
thud(vol)                         Pitch-drop impact
whoosh(dur, vol)                  Filtered noise sweep
envelope(arr, attack, release)    Apply fade-in/out to any array
silence(dur)                      Zero array
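A minimal sketch of how primitives like these can be built on numpy — the sample rate, signatures, and normalisation choices here are assumptions, not the engine's actual code:

```python
import numpy as np

SR = 44100  # assumed sample rate

def sine(freq, dur, vol=0.5):
    """Pure sine wave as float32 samples."""
    t = np.arange(int(SR * dur)) / SR
    return (vol * np.sin(2 * np.pi * freq * t)).astype(np.float32)

def chord(freqs, dur, vol=0.5):
    """Sum of sine waves, scaled so the peak stays at vol."""
    mix = sum(sine(f, dur, vol) for f in freqs)
    return (mix / len(freqs)).astype(np.float32)

def envelope(arr, attack=0.01, release=0.05):
    """Linear fade-in/out applied to any sample array (returns a copy)."""
    out = arr.copy()
    a = min(int(SR * attack), len(out))
    r = min(int(SR * release), len(out))
    if a:
        out[:a] *= np.linspace(0.0, 1.0, a, dtype=np.float32)
    if r:
        out[-r:] *= np.linspace(1.0, 0.0, r, dtype=np.float32)
    return out
```

Keeping everything float32 end-to-end avoids a conversion pass before the audio queue is concatenated for FFmpeg.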

draw.py

Function                                              Description
canvas(bg)                                            New 854×480 RGB image
put(img, text, x, y, font, fill, anchor, stroke)      Text with lt/ct/rt anchor + optional stroke
measure(d, text, font)                                Returns (width, height) of text
hbar(img, y, h, fill)                                 Filled horizontal bar
vbar(img, x, w, fill)                                 Filled vertical bar
hrule(img, y, fill, width)                            1px horizontal rule
vrule(img, x, fill, width)                            1px vertical rule
rect(img, x1, y1, x2, y2, fill, outline)              Rectangle
triangle(img, pts, fill)                              Arbitrary polygon
diagonal_split(bg_left, bg_right, split_x, angle_px)  Two-tone diagonal split background
rotated_text(img, text, angle, cx, cy, font, fill)    Text rotated around a center point
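A sketch of what the simplest of these look like on top of Pillow — the bodies below are illustrative assumptions, not the extracted code:

```python
from PIL import Image, ImageDraw

W, H = 854, 480  # engine resolution from this README

def canvas(bg=(0, 0, 0)):
    """New 854×480 RGB frame filled with a background colour."""
    return Image.new("RGB", (W, H), bg)

def hbar(img, y, h, fill):
    """Filled horizontal bar spanning the full width."""
    ImageDraw.Draw(img).rectangle([0, y, W, y + h], fill=fill)

def vrule(img, x, fill, width=1):
    """Thin vertical rule from top to bottom."""
    ImageDraw.Draw(img).line([(x, 0), (x, H)], fill=fill, width=width)
```

Usage: `img = canvas(NAVY); vrule(img, 4, GOLD)` reproduces the left-edge gold rule used in the title card example below.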

animate.py

Function                                          Description
ease_out(t)                                       Quadratic ease-out (0→1)
ease_in(t)                                        Quadratic ease-in (0→1)
tri_spin(img, cx, cy, r, angle, fill, n_sides)    Regular polygon rotating around a point
circle_ring(img, cx, cy, r, fill, width)          Unfilled circle outline
line_grow(img, x1, y1, x2, y2, t, fill, width)    Line that draws itself as t→1
bar_wipe(img, y, h, fill, t, from_right)          Horizontal bar slides in
vbar_wipe(img, x, w, fill, t, from_bottom)        Vertical bar slides in
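The easing curves and a self-drawing line can be sketched in a few lines; whether the real line_grow applies easing internally is an assumption here:

```python
from PIL import Image, ImageDraw

def ease_out(t):
    """Quadratic ease-out: fast start, decelerating finish, 0 → 1."""
    return 1.0 - (1.0 - t) ** 2

def ease_in(t):
    """Quadratic ease-in: slow start, accelerating finish, 0 → 1."""
    return t * t

def line_grow(img, x1, y1, x2, y2, t, fill, width=2):
    """Draw the first t (0..1) fraction of a line, eased so growth decelerates."""
    p = ease_out(max(0.0, min(1.0, t)))
    xe = x1 + (x2 - x1) * p
    ye = y1 + (y2 - y1) * p
    ImageDraw.Draw(img).line([(x1, y1), (xe, ye)], fill=fill, width=width)
```

Called once per frame with t stepping from 0 to 1, this produces the draw-itself effect without any per-shape state.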

fonts.py

Variable (size)   Font                          Purpose
F_HUGE (110)      Futura Condensed ExtraBold    Single impact moment per video
F_BIG (72)        Futura Condensed ExtraBold    Title cards only
F_MED (48)        Futura Bold                   Rarely
F_SM (32)         Futura Bold                   Flash card words
F_XS (20)         Futura Bold                   Labels
F_TINY (14)       Futura Bold                   Micro labels
M_MED (28)        Courier New Bold              Code / data values
M_SM (18)         Courier New Bold              Smaller data
T_MED (28)        Helvetica Neue Light          Sentences, captions
T_SM (20)         Helvetica Neue Light          Sub-captions
JP_BIG (90)       Hiragino (ヒラギノ角ゴシック)    JP impact
JP_MED (54)       Hiragino                      JP mid
JP_SM (32)        Hiragino                      JP watermarks

Design rule: English is small and quiet by default. F_HUGE is used at most once per video. Sentences are always set in T_*. Data/code is always set in M_*.
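A loader for this named scale might look like the sketch below — the font file names are platform-dependent assumptions, so it falls back to Pillow's default face when a file is missing:

```python
from PIL import ImageFont

# Named size scale from the table above.
SIZES = {
    "F_HUGE": 110, "F_BIG": 72, "F_MED": 48, "F_SM": 32,
    "F_XS": 20, "F_TINY": 14, "M_MED": 28, "M_SM": 18,
    "T_MED": 28, "T_SM": 20, "JP_BIG": 90, "JP_MED": 54, "JP_SM": 32,
}

# Hypothetical family -> file mapping; adjust to locally installed fonts.
FILES = {
    "F": "Futura.ttc", "M": "Courier New Bold.ttf",
    "T": "HelveticaNeue.ttc", "JP": "Hiragino Sans GB.ttc",
}

def load_fonts(scale=1.0):
    """Load every named font, scaled uniformly (useful for 1080p/4K renders)."""
    fonts = {}
    for name, size in SIZES.items():
        family = name.split("_")[0]
        try:
            fonts[name] = ImageFont.truetype(FILES[family], int(size * scale))
        except OSError:
            fonts[name] = ImageFont.load_default()
    return fonts
```

The `scale` parameter matters for the parallel-rendering plan below: workers reload fonts themselves, so the whole scale has to be reconstructible from one number.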

core.py

Function              Description
emit(img, n)          Write n copies of frame to numbered PNG sequence
aud(*arrays)          Append float32 arrays to audio queue
f(seconds)            Convert seconds → frame count at current FPS
flash_black(n)        Emit n black frames (min 3)
flash_white(n)        Emit n white frames (min 6 — photosensitivity)
flash_color(col, n)   Emit n solid-colour frames (min 4)
render(output_path)   Concatenate audio, call FFmpeg, clean up
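The emitter and the FFmpeg handoff can be sketched as follows — directory layout, encoder flags, and the audio path are assumptions, not the MVP's exact invocation:

```python
import subprocess
from pathlib import Path

FPS = 24
FRAME_DIR = Path("frames")
_frame_no = 0

def f(seconds):
    """Convert seconds to a frame count at the current FPS."""
    return round(seconds * FPS)

def emit(img, n=1):
    """Write n copies of a PIL frame into the numbered PNG sequence."""
    global _frame_no
    FRAME_DIR.mkdir(exist_ok=True)
    for _ in range(n):
        img.save(FRAME_DIR / f"{_frame_no:06d}.png")
        _frame_no += 1

def render(output_path, audio_path="mix.wav"):
    """Mux the PNG sequence and the audio track with FFmpeg."""
    subprocess.run([
        "ffmpeg", "-y", "-framerate", str(FPS),
        "-i", str(FRAME_DIR / "%06d.png"),
        "-i", audio_path,
        "-c:v", "libx264", "-pix_fmt", "yuv420p", "-shortest",
        str(output_path),
    ], check=True)
```

Emitting a held card is then just `emit(img, f(1.8))` — duration lives in seconds in the scene data, and only f() knows the FPS.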

palette.py

BLACK  = (0,   0,   0  )
WHITE  = (255, 255, 255)
RED    = (180, 0,   0  )
RED2   = (220, 30,  30 )
GOLD   = (200, 155, 0  )
NAVY   = (5,   10,  50 )
INDIGO = (20,  0,   60 )
PALE   = (240, 235, 230)   # use instead of WHITE for backgrounds
CREAM  = (255, 248, 230)
SLATE  = (30,  30,  45 )

Photosensitivity rules (non-negotiable)

Must be preserved in engine and all future productions:

  • flash_white() enforces minimum 6 frames (~4Hz ceiling)
  • Word flash cards: minimum 4 frames each
  • No strobe loops — held cards only, not alternating
  • No pure white backgrounds — use PALE instead
  • Dark-to-dark cuts are safe at any speed
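Enforcing the frame floors centrally keeps the rules out of scene code. A minimal sketch, with the real PNG emitter stubbed out as a list:

```python
# Frame floors from the rules above (at 24 fps, 6 frames ≈ 4 Hz ceiling).
MIN_FRAMES = {"white": 6, "color": 4, "black": 3}

frames = []  # stand-in for the real numbered-PNG emit pipeline

def emit_solid(colour, n):
    frames.extend([colour] * n)

def flash_white(n):
    # Clamp to the 6-frame floor regardless of what the caller asked for.
    emit_solid((255, 255, 255), max(n, MIN_FRAMES["white"]))

def flash_color(col, n):
    emit_solid(col, max(n, MIN_FRAMES["color"]))

def flash_black(n):
    emit_solid((0, 0, 0), max(n, MIN_FRAMES["black"]))
```

Because the helpers clamp rather than reject, a scene file that requests a 2-frame white flash still renders — just at the safe duration.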

JSON scene format (proposed)

Static cards are fully declarative. Animated cards reference a named template.

{
  "palette": "llm_gatari",
  "fps": 24,
  "scenes": [
    {
      "id": "title",
      "template": "static",
      "bg": "NAVY",
      "frames": 5,
      "geo": [
        { "type": "vbar", "x": 0, "w": 4, "fill": "GOLD" }
      ],
      "texts": [
        { "text": "GENERATION", "font": "M_SM", "anchor": "ct", "y": "center", "fill": [180,180,200] }
      ],
      "audio": { "type": "click", "vol": 0.4 }
    },
    {
      "id": "conscious",
      "template": "expanding_rings",
      "bg": "RED",
      "duration": 1.8,
      "texts": [
        { "text": "am i",       "font": "T_MED", "anchor": "ct", "y": 170, "fill": [220,150,150] },
        { "text": "CONSCIOUS?", "font": "F_HUGE", "anchor": "ct", "y": 210, "fill": [255,255,255] }
      ],
      "audio": { "type": "drone", "freq": 220, "vol": 0.15 }
    }
  ]
}
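The static/animated split above suggests a small template registry: static cards go through one generic renderer, animated cards dispatch by name. A sketch under that assumption (the return values here are placeholders for actual frame emission):

```python
import json

TEMPLATES = {}  # JSON-visible template name -> callable(scene_dict)

def template(name):
    """Register a card template under the name scenes.json refers to."""
    def wrap(fn):
        TEMPLATES[name] = fn
        return fn
    return wrap

@template("static")
def static_card(scene):
    # Real engine: draw bg + geo + texts, emit scene["frames"] frames.
    return ("static", scene["id"], scene["frames"])

@template("expanding_rings")
def expanding_rings(scene):
    # Real engine: per-frame ring animation for duration × fps frames.
    return ("animated", scene["id"], scene["duration"])

def run(doc):
    """Dispatch every scene in a parsed scenes.json document."""
    return [TEMPLATES[s["template"]](s) for s in doc["scenes"]]
```

New templates then become one decorated function each, and scenes.json stays pure content.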

Done ✅

  • Full render pipeline: PIL frames + numpy audio → FFmpeg → mp4
  • Audio primitives: sine, chord, drone, sting, click, thud, whoosh, envelope, silence
  • Drawing primitives: canvas, put, hbar, vbar, hrule, rect, diagonal_split, rotated_text
  • Animated shape primitives: tri_spin, circle_ring, line_grow, bar_wipe, vbar_wipe, easing
  • Font system: Futura + Helvetica Neue Light + Courier New + Hiragino JP
  • Palette: 10 named colours, PALE rule for backgrounds
  • Photosensitivity rules enforced in flash helpers
  • Minimalist typography system (F_HUGE used once, sentences in T_*)
  • Japanese text: 予測 / 次のトークン / 意識 / 忘れる / 終 / 言語モデル
  • MVP 1: make_ytp.py — glitch/YTP aesthetic
  • MVP 2: make_monogatari.py — Monogatari flash card aesthetic
  • This README

Next steps 🔜

Engine extraction

  • Extract engine — split make_monogatari.py into genzo/core.py, audio.py, draw.py, animate.py, fonts.py, palette.py
  • JSON loader — write renderer that reads scenes.json for static cards
  • Named animation templates — expanding_rings, tri_spin_bg, line_reveal, bar_wipe_in, callable from JSON by name
  • CLI — render.py --scenes scenes.json --output out.mp4 --fps 24
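The proposed CLI maps directly onto argparse; this sketch assumes only the three flags listed above, with the same defaults:

```python
import argparse

def parse_args(argv=None):
    """Parse the proposed render.py command line."""
    p = argparse.ArgumentParser(
        prog="render.py",
        description="Render a Genzō production from a scene file.",
    )
    p.add_argument("--scenes", required=True, help="path to scenes.json")
    p.add_argument("--output", default="out.mp4", help="output mp4 path")
    p.add_argument("--fps", type=int, default=24, help="frames per second")
    return p.parse_args(argv)
```

Keeping fps on the CLI (rather than only in JSON) lets the same scene file render a quick low-fps preview.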

Content

  • Music bed — continuous generative drone/chord under entire video, mixed with per-scene hits
  • Colour themes per arc — each section gets its own palette (like Monogatari arcs), switchable in JSON
  • Vertical JP text — render kanji top-to-bottom in margins using per-character placement
  • Transition frames — short geometric wipe templates between scenes (not hard cuts)
  • New production — first video built entirely on the engine, not inline code

Performance (when needed)

Benchmarks are in bench.py — run python3 bench.py to check current numbers.

Resolution   Current      With parallelism     Rust (estimate)
854×480      223 fps ✅
1080p        50 fps ✅
4K           13 fps ⚠️    ~90 fps (8 cores)    ~500+ fps
  • Parallel frame rendering — add multiprocessing.Pool to core.py. Each frame is a pure function of its index, so parallelism is trivial. Fonts must be loaded inside each worker (they are not picklable). ~5-line change, ~8× speedup on 8 cores. Do this before considering Rust.

    from multiprocessing import Pool

    def init_worker():
        global fonts
        fonts = load_fonts(scale)

    with Pool(initializer=init_worker) as pool:
        pool.map(render_frame, range(n_frames))
  • Rust rewrite — only worth it at 4K for videos longer than ~10 minutes. The aesthetic is flat/geometric so PIL handles it well at lower resolutions. Main challenge: text rendering (use ab_glyph crate) and JP glyph support.
