feat: design binary — real UI mockup generation for gstack skills (v0.13.0.0) by garrytan · Pull Request #551 · garrytan/gstack

garrytan · 2026-03-27T04:31:19Z

Summary

gstack can generate real UI mockups. Not ASCII art, not text descriptions. Real visual designs.

New: design binary ($D) — compiled TypeScript CLI wrapping OpenAI GPT Image API. 13 commands:

Command	What it does
`generate`	Brief → PNG mockup (~40s)
`variants`	N style variations, staggered parallel
`iterate`	Multi-turn with session state
`check`	Vision quality gate (PASS/FAIL)
`compare`	HTML comparison board (Pick, stars, feedback, Submit)
`serve`	HTTP server for comparison board feedback loop
`gallery`	Design history timeline
`extract`	Mockup → DESIGN.md (design memory)
`diff`	Visual diff between two images
`verify`	Live screenshot vs approved mockup
`evolve`	Screenshot + brief → evolved mockup
`prompt`	Mockup → structured implementation prompt
`setup`	Guided API key setup + smoke test

/design-shotgun skill. Standalone design exploration. Generates multiple AI design variants, opens a comparison board in your browser, iterates until you approve a direction. Session awareness, taste memory, screenshot-to-variants, configurable variant count.

Comparison board opens in headed Chrome. User picks favorite, rates variants, leaves feedback, clicks Submit. Feedback flows via HTTP POST, not DOM polling. Grid/Large view toggle, 2-column bottom section (submit + regenerate).

Skill integrations:

/office-hours generates visual explorations by default (skippable)
/plan-design-review generates "what 10/10 looks like" mockups
/design-consultation uses comparison board for Phase 5 review

Auth: ~/.gstack/openai.json (0600) → OPENAI_API_KEY env var → guided setup. Codex OAuth tokens lack image scopes (verified via smoke test).

Design doc

Full plan with all review decisions: docs/designs/DESIGN_TOOLS_V1.md

Reviewed by: /office-hours, /plan-eng-review (CLEARED, 4 outside voices), /plan-ceo-review (EXPANSION, 6/6 accepted), /plan-design-review (2/10 → 8/10).

Test plan

bun test passes (skill validation, gen-skill-docs)
Prototype validated: 3 design briefs → real UI mockups
All 13 commands tested manually against OpenAI API
Comparison board generates self-contained HTML with interactive feedback
HTTP serve + feedback loop (serve.test.ts: 11 tests)
Gallery generation (gallery.test.ts: 7 tests)
Feedback roundtrip browser → file on disk (feedback-roundtrip.test.ts)

Documentation

README.md: added /design-shotgun to skills table, /connect-chrome to power tools, both to install instructions
CLAUDE.md: added design-shotgun/, design/, connect-chrome/, extension/, scripts/resolvers/, lib/, docs/designs/ to project structure; added design/dist/ to compiled binaries section
ARCHITECTURE.md: updated iframe support status (shipped), added DESIGN_SETUP and DESIGN_SHOTGUN_LOOP template placeholders
CHANGELOG.md: moved template resolver entries from Added (user-facing) to For contributors

🤖 Generated with Claude Code

Full design doc covering the `design` binary that wraps OpenAI's GPT Image API to generate real UI mockups from gstack's design skills. Includes comparison board UX spec, auth model, 6 CEO expansions (design memory, mockup diffing, screenshot evolution, design intent verification, responsive variants, design-to-code prompt), and 9-commit implementation plan. Reviewed: /office-hours + /plan-eng-review (CLEARED) + /plan-ceo-review (EXPANSION, 6/6 accepted) + /plan-design-review (2/10 → 8/10). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Prototype script sends 3 design briefs to OpenAI Responses API with image_generation tool. Results: dashboard (47s, 2.1MB), landing page (42s, 1.3MB), settings page (37s, 1.3MB) all produce real, implementable UI mockups with accurate text rendering and clean layouts. Key finding: Codex OAuth tokens lack image generation scopes. Direct API key (sk-proj-*) required, stored in ~/.gstack/openai.json. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Stateless CLI (design/dist/design) wrapping OpenAI Responses API for UI mockup generation. Three working commands: - generate: brief -> PNG mockup via gpt-4o + image_generation tool - check: vision-based quality gate via GPT-4o (text readability, layout completeness, visual coherence) - compare: generates self-contained HTML comparison board with star ratings, radio Pick, per-variant feedback, regenerate controls, and Submit button that writes structured JSON for agent polling Auth reads from ~/.gstack/openai.json (0600), falls back to OPENAI_API_KEY env var. Compiled separately from browse binary (openai added to devDependencies, not runtime deps). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

variants: generates N style variations with staggered parallel (1.5s between launches, exponential backoff on 429). 7 built-in style variations (bold, calm, warm, corporate, dark, playful + default). Tested: 3/3 variants in 41.6s. iterate: multi-turn design iteration using previous_response_id for conversational threading. Falls back to re-generation with accumulated feedback if threading doesn't retain visual context. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add generateDesignSetup() and generateDesignMockup() to the existing design.ts resolver file. Add designDir to HostPaths (claude + codex). Register DESIGN_SETUP and DESIGN_MOCKUP in the resolver index. DESIGN_SETUP: $D binary discovery (mirrors $B browse setup pattern). Falls back to DESIGN_SKETCH if binary not available. DESIGN_MOCKUP: full visual exploration workflow template — construct brief from DESIGN.md context, generate 3 variants, open comparison board in Chrome, poll for user feedback, save approved mockup to docs/designs/, generate HTML wireframe for implementation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Pre-existing mismatch: VERSION was 0.12.2.0 but package.json was 0.12.0.0. Also adds design binary to build script and dev:design convenience command. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add {{DESIGN_MOCKUP}} to office-hours template before the existing {{DESIGN_SKETCH}}. When the design binary is available, /office-hours generates 3 visual mockup variants, opens a comparison board in Chrome, and polls for user feedback. Falls back to HTML wireframes if the design binary isn't built. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add {{DESIGN_SETUP}} to pre-review audit and "show me what 10/10 looks like" mockup generation to the 0-10 rating method. When a design dimension rates below 7/10, the review can generate a mockup showing the improved version. Falls back to text descriptions if the design binary isn't available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…N.md New `$D extract` command: sends approved mockup to GPT-4o vision, extracts color palette, typography, spacing, and layout patterns, writes/updates DESIGN.md with an "Extracted Design Language" section. Progressive constraint: if DESIGN.md exists, future mockup briefs include it as style context. If no DESIGN.md, explorations run wide. readDesignConstraints() reads existing DESIGN.md for brief construction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New commands: - $D diff --before old.png --after new.png: visual diff using GPT-4o vision. Returns differences by area with severity (high/medium/low) and a matchScore (0-100). - $D verify --mockup approved.png --screenshot live.png: compares live site screenshot against approved design mockup. Pass if matchScore >= 70 and no high-severity differences. Used by /design-review to close the design loop: design -> implement -> verify visually. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New command: $D evolve --screenshot current.png --brief "make it calmer" Two-step process: first analyzes the screenshot via GPT-4o vision to produce a detailed description, then generates a new mockup that keeps the existing layout structure but applies the requested changes. Starts from reality, not blank canvas. Bridges the gap between /design-review critique ("the spacing is off") and a visual proposal of the fix. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Responsive variants: $D variants --viewports desktop,tablet,mobile generates mockups at 1536x1024, 1024x1024, and 1024x1536 (portrait) with viewport-appropriate layout instructions. Design-to-code prompt: $D prompt --image approved.png extracts colors, typography, layout, and components via GPT-4o vision, producing a structured implementation prompt. Reads DESIGN.md for additional constraint context. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-27T04:36:06Z

E2E Evals: ✅ PASS

63/63 tests passed | $6.03 total cost | 12 parallel runners

Suite	Result	Status	Cost
e2e-browse	9/9	✅	$0.42
e2e-deploy	6/6	✅	$0.98
e2e-design	3/3	✅	$0.46
e2e-plan	7/7	✅	$1.04
e2e-qa-workflow	3/3	✅	$1.01
e2e-review	6/6	✅	$1.18
e2e-workflow	4/4	✅	$0.44
llm-judge	25/25	✅	$0.5

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

issela-makes · 2026-03-27T05:04:33Z

The comparison board approach is really smart — letting users rate and pick favorites directly in the browser eliminates the usual back-and-forth of screenshot sharing. One thing I'd love to see: does $D extract capture component hierarchy (not just colors/fonts)? In my experience, the spacing and layout relationships between elements are where design-to-code translations break down most. The visual diffing with $D diff is a great addition for catching regressions.

Brand the gstack designer prominently, add Step 0.5 for proactive visual mockup generation before review passes, and update priority hierarchy. When a plan describes new UI, the skill now offers to generate mockups with $D variants, run $D check for quality gating, and present a comparison board via $B goto before any review passes begin. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Thread Step 0.5 mockups through the review workflow: Pass 4 (AI Slop) evaluates generated mockups visually, Pass 7 uses mockups as evidence for unresolved decisions, post-pass offers one-shot regeneration after design changes, and Approved Mockups section records chosen variants with paths for the implementer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add $D generate for target mockups in Phase 8a.5 — before fixing a design finding, generate a mockup showing what it should look like. Add $D verify in Phase 9 to compare fix results against targets. Not plan mode — goes straight to implementation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace HTML preview with $D variants + comparison board when designer is available (Path A). Use $D extract to derive DESIGN.md tokens from the approved mockup. Handles both plan mode (write to plan) and non-plan mode (implement immediately). Falls back to HTML preview (Path B) when designer binary is unavailable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Main shipped v0.12.7.0 (community PRs + security hardening) while this branch had v0.13.0.0 (design tools). Resolution: - VERSION: keep 0.13.0.0 (this branch is newer, includes main's changes) - package.json: take our build script (design binary compilation) with main's semicolon separator fix (gen:skill-docs resilience) - CHANGELOG: keep both entries, 0.13.0.0 on top, 0.12.7.0 below - Regenerated stale codex skills (ship, document-release) for main's co-author trailer changes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Main shipped v0.12.8.0 (Codex cwd fix for multi-workspace environments) while this branch has v0.13.0.0 (design tools). Resolution: - VERSION: keep 0.13.0.0 (includes main's changes via merge) - package.json: keep our version + description with main's em-dash fix - CHANGELOG: keep both entries, 0.13.0.0 on top, 0.12.8.0 below - Regenerated all skill docs to pick up main's codex cwd resolver changes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ional The transcript showed the agent writing 5 text descriptions of homepage variants instead of generating visual mockups, even when the user explicitly asked for design tools. The skill treated mockups as optional ("Want me to generate?") when they should be the default behavior. Changes: - Rename "Your Visual Design Tool" to "YOUR PRIMARY TOOL" with aggressive language: "Don't ask permission. Show it." - Step 0.5 now generates mockups automatically when DESIGN_READY, no AskUserQuestion gatekeeping the default path - Priority hierarchy: mockups are "non-negotiable" not "if available" - Step 0D tells the user mockups are coming next - DESIGN_NOT_AVAILABLE fallback now tells user what they're missing The only valid reasons to skip mockups: no UI scope, or designer not installed. Everything else generates by default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Mockups were going to .context/mockups/ (gitignored, workspace-local). This meant designs disappeared when switching workspaces or conversations, and downstream skills couldn't reference approved mockups from earlier reviews. Now all three design skills save to persistent project-scoped dirs: - /plan-design-review: ~/.gstack/projects/$SLUG/designs/<screen>-<date>/ - /design-consultation: ~/.gstack/projects/$SLUG/designs/design-system-<date>/ - /design-review: ~/.gstack/projects/$SLUG/designs/design-audit-<date>/ Each directory gets an approved.json recording the user's pick, feedback, and branch. This lets /design-review verify against mockups that /plan-design-review approved, and design history is browsable via ls ~/.gstack/projects/$SLUG/designs/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Main shipped v0.12.8.1 (zsh glob compatibility — setopt guards and find-based alternatives for 38 glob patterns across 13 templates). Resolution: keep v0.13.0.0, both CHANGELOG entries preserved. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Picked up setopt +o nomatch guards from main's v0.12.8.1 merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The design setup block now discovers $B alongside $D, so skills can open comparison boards via $B goto and poll feedback via $B eval. Falls back to `open` on macOS when browse binary is unavailable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

After opening the comparison board, the agent now polls #status via $B eval instead of asking a rigid AskUserQuestion. Handles submit (read structured JSON feedback), regenerate (new variants with updated brief), and $B-unavailable fallback (free-form text response). The user interacts with the real board UI, not a constrained option picker. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

16 tests covering the full DOM polling cycle: structure verification, submit with pick/rating/comment, regenerate flows (totally different, more like this, custom text), and the agent polling pattern (empty → submitted → read JSON). Uses real generateCompareHtml() from design/src/compare.ts, served via HTTP. Runs in <1s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The comparison board feedback loop was fundamentally broken: browse blocks file:// URLs (url-validation.ts:71), so $B goto file://board.html always fails. The fallback open + $B eval polls a different browser instance. $D serve fixes this by serving the board over HTTP on localhost. The server is stateful: stays alive across regeneration rounds, exposes /api/progress for the board to poll, and accepts /api/reload from the agent to swap in new board HTML. Stdout carries feedback JSON only; stderr carries telemetry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When __GSTACK_SERVER_URL is set (injected by $D serve), the board POSTs feedback to the server instead of only writing to hidden DOM elements. After submit: disables all inputs, shows "Return to your coding agent." After regenerate: shows spinner, polls /api/progress, auto-refreshes on ready. On POST failure: shows copyable JSON fallback. On progress timeout (5 min): shows error with /design-shotgun prompt. DOM fallback preserved for headed browser mode and tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

11 tests covering: HTML serving with injected server URL, /api/progress state reporting, submit → done lifecycle, regenerate → regenerating state, remix with remixSpec, malformed JSON rejection, /api/reload HTML swapping, missing file validation, and full regenerate → reload → submit round-trip. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds generateDesignShotgunLoop() resolver for the shared comparison board feedback loop (serve via HTTP, handle regenerate/remix, AskUserQuestion fallback, feedback confirmation). Registered as {{DESIGN_SHOTGUN_LOOP}}. Fixes generateDesignMockup() to use ~/.gstack/projects/$SLUG/designs/ instead of /tmp/ and docs/designs/. Replaces broken $B goto file:// + $B eval polling with $D compare --serve (HTTP-based, stdout feedback). Adds CRITICAL PATH RULE guardrail to DESIGN_SETUP: design artifacts must go to ~/.gstack/projects/$SLUG/designs/, never .context/ or /tmp/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New skill for visual brainstorming: generate AI design variants, open a comparison board in the user's browser, collect structured feedback, and iterate. Features: session detection (revisit prior explorations), 5-dimension context gathering (who, job to be done, what exists, user flow, edge cases), taste memory (prior approved designs bias new generations), inline variant preview, configurable variant count, screenshot-to-variants via $D evolve. Uses {{DESIGN_SHOTGUN_LOOP}} resolver for the feedback loop. Saves all artifacts to ~/.gstack/projects/$SLUG/designs/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Per-variant element selectors (Layout, Colors, Typography, Spacing) with radio buttons in a grid. Remix button collects selections into a remixSpec object and sends via the same HTTP POST feedback mechanism. Enabled only when at least one element is selected. Board shows regenerating spinner while agent generates the hybrid variant. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Generates a self-contained HTML page showing all prior design explorations for a project: every variant (approved or not), feedback notes, organized by date (newest first). Images embedded as base64. Handles corrupted approved.json gracefully (skips, still shows the session). Empty state shows "No history yet" with /design-shotgun prompt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

7 tests: empty dir, nonexistent dir, single session with approved variant, multiple sessions sorted newest-first, corrupted approved.json handled gracefully, session without approved.json, self-contained HTML (no external dependencies). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

plan-design-review and design-consultation templates previously used $B goto file:// + $B eval polling for the comparison board feedback loop. This was broken (browse blocks file:// URLs). Both templates now use {{DESIGN_SHOTGUN_LOOP}} which serves via HTTP, handles regeneration in the same browser tab, and falls back to AskUserQuestion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

design-shotgun-path (gate): verify artifacts go to ~/.gstack/, not .context/ design-shotgun-session (gate): verify repeat-run detection + AskUserQuestion design-shotgun-full (periodic): full round-trip with real design binary Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Main shipped v0.12.9.0 (community PRs: faster install, skill namespacing, uninstall). Our branch keeps v0.13.0.0 with both changelog entries. Updated v0.13.0.0 changelog to include design-shotgun, $D serve, $D gallery, remix UI, and {{DESIGN_SHOTGUN_LOOP}} resolver. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Main shipped v0.12.10.0 (Codex filesystem boundary fix: prevents Codex from wandering into skill definition files). Our branch keeps v0.13.0.0. Both changelog entries preserved in order: 0.13.0.0 > 0.12.10.0 > 0.12.9.0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Main shipped v0.12.11.0 (skill prefix is now a persistent user choice). Our branch keeps v0.13.0.0. Both changelog entries preserved in order: 0.13.0.0 > 0.12.11.0 > 0.12.10.0 > 0.12.9.0. Also cleaned up stale conflict markers from a prior merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ation, grid view Three changes to the design comparison board: 1. Pick confirmation: selecting "Pick" on Option A shows "We'll move forward with Option A" in green, plus a status line above the submit button repeating the choice. 2. Clear option headers: each variant now has "Option A" in bold with a subtitle above the image, instead of just the raw image. 3. View toggle: top-right Large/Grid buttons switch between single-column (default) and 3-across grid view. Also restructured the bottom section into a 2-column grid: submit/overall feedback on the left, regenerate controls on the right. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Avoids DNS resolution issues on some systems where localhost may resolve to IPv6 ::1 while Bun listens on IPv4 only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The agent backgrounds $D serve (Claude Code can't block on a subprocess and do other work simultaneously). With stdout-only feedback delivery, the agent never sees regenerate/remix feedback. Fix: write feedback-pending.json (regenerate/remix) and feedback.json (submit) to disk next to the board HTML. Agent polls the filesystem instead of reading stdout. Both channels (stdout + disk) are always active so foreground mode still works. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Update the template resolver to instruct the agent to background $D serve and poll for feedback-pending.json / feedback.json on a 5-second loop. This matches the real-world pattern where Claude Code / Conductor agents can't block on subprocess stdout. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The user's layout restructure renamed .regenerate-bar → .regen-column, .submit-bar → .submit-column, and .overall-section → .bottom-section. The JS still referenced the old class names, causing querySelector to return null and showPostSubmitState() / showRegeneratingState() to silently crash. This meant Submit and Regenerate buttons appeared to work (DOM elements updated, HTTP POST succeeded) but the visual feedback (disabled inputs, spinner, success message) never appeared. Fix: use fallback selectors that check both old and new class names, with null guards so a missing element doesn't crash the function. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The test that proves "changes on the website propagate to Claude Code." Opens the comparison board in a real headless browser with __GSTACK_SERVER_URL injected, simulates user clicks (Submit, Regenerate, More Like This), and verifies that feedback.json / feedback-pending.json land on disk with the correct structured data. 6 tests covering: submit → feedback.json, post-submit UI lockdown, regenerate → feedback-pending.json, more-like-this → feedback-pending.json, regenerate spinner display, and full regen → reload → submit round-trip. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Documents the full browser-to-agent feedback architecture: state machine, file-based polling, port discovery, post-submit lifecycle, and every known edge case (zombie forms, dead servers, stale spinners, file:// bug, double-click races, port coordination, sequential generate rule). Includes ASCII diagrams of the data flow and state transitions, complete step-by-step walkthrough of happy path and regeneration path, test coverage map with gaps, and short/medium/long-term improvement ideas. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Four fixes to prevent agents from reinventing the feedback loop badly: 1. Sequential generate rule: explicit instruction that $D generate calls must run one at a time (API rate-limits concurrent image generation). 2. No-AskUserQuestion-for-feedback rule: agent reads feedback.json instead of re-asking what the user picked. 3. Remove file:// references: $B goto file:// was always rejected by url-validation.ts. The --serve flag handles everything. 4. Remove $B eval polling reference: no longer needed with HTTP POST. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…on, timing estimate Three production UX bugs fixed: 1. Dead air — now shows timing estimate before generation starts 2. Silent variant drop — replaced $D variants batch with individual $D generate calls, each verified for existence and non-zero size with retry 3. No progressive reveal — each variant shown inline via Read tool immediately after generation (~60s increments instead of all at ~180s) Also: /tmp/ then cp as default output pattern (sandbox workaround), screenshot taken once for evolve path (not per-variant). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Step 3 rewritten to concept-first + parallel Agent architecture: - 3a: generate text concepts (free, instant) - 3b: AskUserQuestion to confirm/modify before spending API credits - 3c: launch N Agent subagents in parallel (~60s total regardless of count) - 3d: show all results, dynamic image list for comparison board Adds Agent to allowed-tools. Softens plan-design-review sequential warning to note design-shotgun uses parallel at Tier 2+. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Main added v0.12.12.0 (Security Audit Compliance). This branch has v0.13.0.0 (design tools). Keep both entries with v0.13.0.0 on top. VERSION stays at 0.13.0.0 (higher). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

These files were committed despite .agents/ being in .gitignore. They regenerate from ./setup --host codex on any machine. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…nges Merge from main brought updated preamble resolver (conditional telemetry, local JSONL logging) but design-shotgun/SKILL.md wasn't regenerated. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

garrytan and others added 14 commits March 26, 2026 21:26

fix: sync package.json version with VERSION file (0.12.2.0)

d037e54

Pre-existing mismatch: VERSION was 0.12.2.0 but package.json was 0.12.0.0. Also adds design binary to build script and dev:design convenience command. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

merge: resolve package.json version conflict (take main 0.12.5.0)

6413175

chore: bump version and changelog (v0.13.0.0)

859f948

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

garrytan and others added 14 commits March 26, 2026 23:15

chore: regenerate codex ship skill with zsh glob guards

2d4af18

Picked up setopt +o nomatch guards from main's v0.12.8.1 merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

garrytan and others added 29 commits March 27, 2026 08:14

chore: regenerate SKILL.md files for design-shotgun + resolver changes

fd67195

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: regenerate SKILL.md files for template refactor

1c62a6a

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: use 127.0.0.1 instead of localhost for serve URL

229db44

Avoids DNS resolution issues on some systems where localhost may resolve to IPv6 ::1 while Bun listens on IPv4 only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: regenerate SKILL.md files for file-polling feedback loop

1fbf6c4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: update project documentation for v0.13.0.0

147059f

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: untrack .agents/skills/ — generated at setup, already gitignored

065a6c6

These files were committed despite .agents/ being in .gitignore. They regenerate from ./setup --host codex on any machine. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

garrytan merged commit 78bc1d1 into main Mar 28, 2026
18 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: design binary — real UI mockup generation for gstack skills (v0.13.0.0)#551

feat: design binary — real UI mockup generation for gstack skills (v0.13.0.0)#551
garrytan merged 57 commits intomainfrom
garrytan/agent-design-tools

garrytan commented Mar 27, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 27, 2026 •

edited

Loading

Uh oh!

issela-makes commented Mar 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

garrytan commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Design doc

Test plan

Documentation

Uh oh!

github-actions bot commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Evals: ✅ PASS

Uh oh!

issela-makes commented Mar 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

garrytan commented Mar 27, 2026 •

edited

Loading

github-actions bot commented Mar 27, 2026 •

edited

Loading