Scribe Platform Architecture

Last updated: 2026-03-09

System Overview

graph TD
    User("🌐 User Browser")

    User -->|"Sign in and access dashboard"| AppPlatform

    AppPlatform -->|"Write structured job request"| JobsCollection
    AppPlatform -->|"Read and write site config"| SitesUsers
    AppPlatform -.->|"Poll article status every 2.5s"| ArticlesCollection
    ArticlesCollection -.->|"Serve published content"| ContentSite

    subgraph MongoDB["🗄️ MongoDB Atlas"]
        JobsCollection("Jobs Collection")
        ArticlesCollection("Articles Collection")
        SitesUsers("Sites and Users Collections")
    end

    subgraph OpenClaw["🤖 OpenClaw Engine"]
        ScribeWalker("Scribe Walker, Agent Orchestration")
        ClaudeOpus("🧠 Claude Opus 4.6")
        DallE("🎨 DALL-E 3")
        WebResearch("🔍 Web Research")
    end

    JobsCollection -->|"Poll and pick up pending jobs"| ScribeWalker

    subgraph Vercel["☁️ Vercel Platform (Single Project)"]
        ContentSite("tryscribe.co — Marketing + Blog Subfolders")
        AppPlatform("app.tryscribe.co — Dashboard + API")
    end

    ScribeWalker -->|"Generate SEO articles"| ClaudeOpus
    ScribeWalker -->|"Generate featured images"| DallE
    ScribeWalker -->|"Research trending topics"| WebResearch
    ScribeWalker -->|"Write completed articles"| ArticlesCollection

    style User fill:#b3e5fc,stroke:#333,stroke-width:3px,color:#000
    style Vercel fill:#fff8f0,stroke:#d4920a,stroke-width:2px,color:#000
    style AppPlatform fill:#fff,stroke:#d4920a,stroke-width:3px,color:#000
    style ContentSite fill:#fff,stroke:#7a8c6e,stroke-width:2px,color:#000
    style MongoDB fill:#e8f5e9,stroke:#4caf50,stroke-width:2px,color:#000
    style JobsCollection fill:#c8e6c9,stroke:#333,stroke-width:2px,color:#000
    style ArticlesCollection fill:#c8e6c9,stroke:#333,stroke-width:2px,color:#000
    style SitesUsers fill:#c8e6c9,stroke:#333,stroke-width:2px,color:#000
    style OpenClaw fill:#fff3e0,stroke:#d4920a,stroke-width:2px,color:#000
    style ScribeWalker fill:#ffe0b2,stroke:#d4920a,stroke-width:3px,color:#000
    style ClaudeOpus fill:#f5f5f5,stroke:#333,stroke-width:2px,color:#000
    style DallE fill:#f5f5f5,stroke:#333,stroke-width:2px,color:#000
    style WebResearch fill:#f5f5f5,stroke:#333,stroke-width:2px,color:#000

    linkStyle default interpolate basis

Flow: User signs in → App writes structured job → OpenClaw polls and picks it up → Scribe Walker generates articles autonomously → Dashboard polls (every 2.5s) and displays results in near real time

Color Legend: 🔵 Light blue = User entry point · 🟠 Amber = Vercel platform · 🟢 Green = MongoDB data layer · 🟡 Warm = OpenClaw engine · ⚪ Gray = AI tools

Core Principle: MongoDB is the ONLY bridge between the app and OpenClaw. They never communicate directly. The app writes structured job requests; OpenClaw picks them up and executes autonomously.

Components

1. App (Next.js on Vercel)

Content URL: tryscribe.co — marketing site + blog subfolders
Dashboard URL: app.tryscribe.co — auth, onboarding, dashboard, billing
Deployment: Single Vercel project serves both domains
Role: User-facing platform — auth, onboarding, dashboard, billing, blog content
Responsibilities:
- User authentication (NextAuth: magic link + Google OAuth)
- Onboarding flow (brand name, niche, location)
- Writing job requests to MongoDB
- Polling article status and displaying results
- Serving blog content via subfolders (tryscribe.co/{brand}/{slug})
- Marketing site at root (tryscribe.co/)
- Stripe billing and usage tracking
Does NOT: Generate articles, call AI APIs, run any agent logic

URL Architecture (Subfolder Model — Migrated Mar 9, 2026)

Why subfolders over subdomains:

tryscribe.co is a new domain with zero authority
Every article under tryscribe.co/{brand}/ consolidates keyword and backlink authority on the root domain
Subdomains (brand.tryscribe.co) would scatter SEO value across isolated domains
Sources: Cloudflare, Ahrefs, and Semrush all lean toward subfolders for new domains

URL Structure:

URL	Purpose
`tryscribe.co/`	Marketing site (static HTML served via middleware rewrite)
`tryscribe.co/{brand}/`	Brand blog home (e.g., `tryscribe.co/sallys-spa`)
`tryscribe.co/{brand}/{slug}`	Individual article page
`app.tryscribe.co/dashboard`	User dashboard
`app.tryscribe.co/onboarding`	New user onboarding

Middleware Routing (src/middleware.ts):

Legacy subdomain requests (brand.tryscribe.co) → 301 redirect to tryscribe.co/{brand}/
App routes on content domain (tryscribe.co/dashboard) → 302 redirect to app.tryscribe.co/dashboard
Root path on content domain (tryscribe.co/) → rewrite to /marketing.html
Brand slug detection → rewrite /{brand}/{slug} to internal /blog/{brand}/{slug} route
Reserved paths (api, auth, _next, etc.) pass through unchanged
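
The routing rules above can be sketched as a pure decision function. This is an illustration only, not the contents of src/middleware.ts: the names `decideRoute`, `RESERVED`, and `APP_ROUTES` are hypothetical, the reserved and app route lists are assumed, and any non-reserved first path segment is treated as a brand slug (the real brand detection is not described here). Ports and `www` hosts are ignored.

```typescript
// Hypothetical sketch of the routing decision only; not the actual middleware.
type RouteDecision =
  | { kind: "redirect"; status: 301 | 302; location: string }
  | { kind: "rewrite"; path: string }
  | { kind: "pass" };

const CONTENT_HOST = "tryscribe.co";
const APP_HOST = "app.tryscribe.co";
// Assumed lists; the document only names "api, auth, _next, etc."
const RESERVED = new Set(["api", "auth", "_next"]);
const APP_ROUTES = new Set(["dashboard", "onboarding"]);

function decideRoute(host: string, pathname: string): RouteDecision {
  const segments = pathname.split("/").filter(Boolean);
  const first = segments[0];

  // The app domain is served as-is. Checked first because it also ends in ".tryscribe.co".
  if (host === APP_HOST) return { kind: "pass" };

  // Legacy subdomain: brand.tryscribe.co -> 301 to tryscribe.co/{brand}/
  if (host.endsWith("." + CONTENT_HOST)) {
    const brand = host.slice(0, -(CONTENT_HOST.length + 1));
    return { kind: "redirect", status: 301, location: `https://${CONTENT_HOST}/${brand}/` };
  }

  // Root path on the content domain -> marketing page.
  if (first === undefined) return { kind: "rewrite", path: "/marketing.html" };

  // Reserved paths pass through unchanged.
  if (RESERVED.has(first)) return { kind: "pass" };

  // App routes on the content domain -> 302 to the app domain.
  if (APP_ROUTES.has(first)) {
    return { kind: "redirect", status: 302, location: `https://${APP_HOST}${pathname}` };
  }

  // Everything else is treated as /{brand}/{slug} and mapped to the internal blog route.
  return { kind: "rewrite", path: `/blog/${segments.join("/")}` };
}
```

Keeping the decision separate from the framework response objects makes each rule easy to unit test without a running Next.js server.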

Internal Route Structure: Blog pages live at src/app/blog/[subdomain]/ internally (the subdomain param name is kept for backward compatibility but represents the brand slug in the subfolder URL).

Domain Separation:

tryscribe.co = content site only (marketing + blog articles). Dashboard routes redirect to app.tryscribe.co.
app.tryscribe.co = dashboard + API. Dashboard updates here don't risk breaking the content site.
Both served from the same Vercel project with middleware-based routing.

2. MongoDB Atlas

Cluster: ScribeCluster (currently M0 Free, AWS us-east-1)
Role: Shared data layer and job queue
Collections:
- users — user accounts, plans, referrals
- sites — brand configurations (niche, location, subdomain)
- articles — generated content (status: generating/published/failed)
- jobs — article generation job queue (NEW)
- sessions, accounts — NextAuth session management

3. OpenClaw Instance (Scribe Walker)

Current host: Mac Mini (local development)
Future host: Linux VM (production)
Role: AI orchestration engine — the brains
Responsibilities:
- Polling jobs collection for pending work
- Running Scribe Walker agent sessions for each job
- Research, writing, image generation, quality checks
- Writing completed articles to articles collection
- Updating job status (pending → processing → complete/failed)
Does NOT: Serve web traffic, handle user auth, manage billing

Job Queue Protocol

Job Schema

interface Job {
  _id: ObjectId;
  
  // Who requested it
  userId: ObjectId;
  siteId: ObjectId;
  
  // What to generate
  action: "generate" | "rewrite" | "refresh";  // Typed enum, no freeform actions (see Allowed Actions)
  params: {                    // Shape shown is for "generate" and "refresh"
    brandName: string;         // From site record
    niche: string;             // From site record
    location?: string;         // From site seoConfig
    tone?: string;             // "professional" | "casual" | "authoritative"
    count: number;             // Number of articles (default: 3)
    topicStyles: string[];     // ["how-to", "tips", "why", "listicle", "guide"]
  };

  // Job lifecycle
  status: "pending" | "processing" | "complete" | "failed";
  priority: number;            // Lower = higher priority (default: 10)
  attempts: number;            // Retry count (default: 0)
  maxAttempts: number;         // Max retries (default: 3)
  
  // Results
  articleIds: ObjectId[];      // Populated as articles are created
  error?: string;              // Error message if failed
  
  // Timestamps
  createdAt: Date;
  startedAt?: Date;
  completedAt?: Date;
}

Allowed Actions (Typed Enum)

Only these actions are valid. OpenClaw rejects anything else:

Action	Description	Params
`generate`	Generate new articles for a site	brandName, niche, location, tone, count, topicStyles
`rewrite`	Rewrite an existing article	articleId, instructions (from predefined set)
`refresh`	Generate more articles for existing site	same as generate

No freeform prompts. No shell commands. No tool instructions. The job contains DATA, not INSTRUCTIONS. OpenClaw constructs its own prompts internally using SCRIBE-WALKER-CONTEXT.md and its agent reasoning.

Job Lifecycle Flow

1. User clicks "Summon Your Scribe ✒️"
2. App validates user auth + plan limits
3. App creates Job doc (status: "pending")
4. App creates placeholder Article docs (status: "generating")
5. App returns immediately — dashboard starts polling articles

6. OpenClaw polls jobs collection (every 5-10 seconds)
7. Picks up pending job, sets status: "processing", sets startedAt
8. Spawns Scribe Walker session with structured params
9. Scribe Walker:
   a. Researches relevant topics for the niche/location
   b. Writes SEO-optimized articles (Claude Opus 4.6)
   c. Generates DALL-E 3 featured images
   d. Quality checks (word count, SEO meta, no em dashes, etc.)
10. Updates Article docs: content, SEO meta, images, status: "published"
11. Updates Job doc: status: "complete", completedAt

12. Dashboard polling picks up published articles in real-time
13. User sees articles appear one by one (2-3 second poll interval)
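
The status changes in this lifecycle can be guarded with a small transition table. This is a minimal sketch: `canTransition` and `statusAfterError` are hypothetical helpers, and sending a failed attempt back to "pending" until `maxAttempts` is reached is an assumption, since the retry path is not spelled out above.

```typescript
type JobStatus = "pending" | "processing" | "complete" | "failed";

// Allowed transitions from the lifecycle: pending -> processing -> complete/failed.
// "processing -> pending" is an ASSUMED retry path, not stated in this document.
const ALLOWED: Record<JobStatus, JobStatus[]> = {
  pending: ["processing"],
  processing: ["complete", "failed", "pending"],
  complete: [],
  failed: [],
};

function canTransition(from: JobStatus, to: JobStatus): boolean {
  return ALLOWED[from].includes(to);
}

// Decide where a job goes after an error. `attempts` counts attempts already made,
// including the one that just failed (an assumption about how the counter is used).
function statusAfterError(attempts: number, maxAttempts: number): JobStatus {
  return attempts < maxAttempts ? "pending" : "failed";
}
```

With the schema defaults (maxAttempts = 3), the first two failures re-queue the job and the third marks it failed permanently.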

Security Model

Threat: Compromised MongoDB Credentials

If an attacker gains access to the app's MongoDB connection string, they could write malicious jobs.

Mitigation 1: Strict Schema Validation

OpenClaw validates every job against the typed schema before processing:

action must be in the allowed enum
params must match the expected shape for that action
All string fields have max length limits
No nested objects beyond one level
Any invalid job is rejected and logged as a security event
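
A minimal sketch of these checks, assuming a hypothetical `validateJob` helper. The 200-character limit is a placeholder (no number is given above), and the per-action params shape check is omitted for brevity; only the enum, string-length, and nesting rules are shown.

```typescript
type ValidationResult = { ok: true } | { ok: false; reason: string };

const ALLOWED_ACTIONS = new Set(["generate", "rewrite", "refresh"]);
const MAX_STRING_LENGTH = 200; // ASSUMED limit; the document does not give one.

function validateJob(job: unknown): ValidationResult {
  if (typeof job !== "object" || job === null) {
    return { ok: false, reason: "job is not an object" };
  }
  const { action, params } = job as Record<string, unknown>;

  // action must be in the allowed enum.
  if (typeof action !== "string" || !ALLOWED_ACTIONS.has(action)) {
    return { ok: false, reason: "action not in allowed enum" };
  }
  if (typeof params !== "object" || params === null || Array.isArray(params)) {
    return { ok: false, reason: "params must be an object" };
  }

  for (const [key, value] of Object.entries(params)) {
    // Arrays of primitives (e.g. topicStyles) are allowed; check each element.
    const items = Array.isArray(value) ? value : [value];
    for (const item of items) {
      // No nested objects beyond one level.
      if (typeof item === "object" && item !== null) {
        return { ok: false, reason: `params.${key} is nested too deeply` };
      }
      // All string fields have a max length.
      if (typeof item === "string" && item.length > MAX_STRING_LENGTH) {
        return { ok: false, reason: `params.${key} exceeds max length` };
      }
    }
  }
  return { ok: true };
}
```

A rejected job would then be logged as a security event with the returned reason rather than being retried.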

Mitigation 2: No Prompt Passthrough

The job never contains prompts, instructions, or commands for the agent. OpenClaw uses the structured data fields (brandName, niche, location) to fill in its OWN hardcoded workflow. The agent's behavior is defined by SCRIBE-WALKER-CONTEXT.md, not by job data.

Think of it as: generateArticles(niche="plumbing", location="Salt Lake City") — a function call with typed parameters.

Mitigation 3: Separate DB Users

App DB user: Write access to jobs only. Read access to articles, sites, users. No access to system collections.
OpenClaw DB user: Full access to jobs, articles. Read access to sites, users.
Even with compromised app credentials, attacker cannot modify articles or users directly.

Mitigation 4: Rate Limiting

Per-user: Max 5 jobs per hour (configurable per plan)
Global: Max 20 concurrent processing jobs
Retry cap: Max 3 attempts per job, then permanent failure
Enforced at both app level (before writing) and OpenClaw level (before processing)
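
The two limits above could be expressed as simple predicates that both the app and OpenClaw call. This is a sketch with hypothetical names (`canSubmitJob`, `canStartProcessing`); how recent job timestamps and the processing count are fetched from MongoDB is left out.

```typescript
const WINDOW_MS = 60 * 60 * 1000; // one hour

// Per-user limit: fewer than `maxPerHour` jobs created in the trailing hour.
function canSubmitJob(recentJobTimes: Date[], now: Date, maxPerHour = 5): boolean {
  const cutoff = now.getTime() - WINDOW_MS;
  const inWindow = recentJobTimes.filter((t) => t.getTime() > cutoff).length;
  return inWindow < maxPerHour;
}

// Global limit: cap on jobs currently in the "processing" state.
function canStartProcessing(currentlyProcessing: number, maxConcurrent = 20): boolean {
  return currentlyProcessing < maxConcurrent;
}
```

Passing `maxPerHour` as a parameter keeps the per-plan configuration outside the check itself.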

Mitigation 5: Job Signing (Phase 2)

App signs each job with HMAC-SHA256 using a shared secret
signature = HMAC(jobId + siteId + action + timestamp, SECRET)
OpenClaw verifies signature before processing
Unsigned or invalid-signature jobs are rejected
Protects against direct DB manipulation even with full DB access
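
A possible shape for the Phase 2 signing step, using Node's built-in crypto module. This is a sketch, not the planned implementation: `signJob` and `verifyJob` are hypothetical names, and the "|" delimiter between fields is an addition to the formula above so that adjacent fields cannot run together ambiguously.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

interface SignableJob {
  jobId: string;
  siteId: string;
  action: string;
  timestamp: number;
}

// Join fields with a delimiter (an assumption; the formula above concatenates directly).
function signingPayload(job: SignableJob): string {
  return [job.jobId, job.siteId, job.action, String(job.timestamp)].join("|");
}

// App side: compute the hex-encoded HMAC-SHA256 signature.
function signJob(job: SignableJob, secret: string): string {
  return createHmac("sha256", secret).update(signingPayload(job)).digest("hex");
}

// OpenClaw side: recompute and compare in constant time.
function verifyJob(job: SignableJob, signature: string, secret: string): boolean {
  const expected = Buffer.from(signJob(job, secret), "hex");
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === given.length && timingSafeEqual(expected, given);
}
```

Because the signature covers the action and site, an attacker with database access but no secret cannot retarget a legitimately signed job.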

Additional Security

OpenClaw instance is NOT publicly accessible — no open ports, no API endpoints
Only outbound connections: OpenClaw connects TO MongoDB, Anthropic, OpenAI. Nothing connects TO OpenClaw.
Job expiry: Jobs older than 1 hour auto-expire (prevents queue poisoning)
Audit log: All job state transitions logged with timestamps

Scaling Path

Phase 1: Local Mac Mini (Current — MVP/Beta)

Single OpenClaw instance on Taha's Mac Mini
Handles 10 beta users easily
Scribe Walker already proven (1,285+ articles)
Limitation: tied to local machine uptime

Phase 2: AWS EC2 (Production Launch — In Progress)

AWS EC2 t3.small (us-east-1), Ubuntu 24.04 LTS
Scribe Walker as main OpenClaw agent (not sub-agent)
OpenClaw gateway service (systemd, loopback)
Prompt intelligence in worker/prompts/*.js (version-controlled)
Worker routing (#67) for parallel testing with Mac Mini
Cost: ~$20/mo (t3.small)

Phase 3: Multi-Instance (Scale)

Multiple OpenClaw instances polling the same job queue
MongoDB's findOneAndUpdate with atomic status transitions prevents double-processing
Each instance picks up different jobs — natural load balancing
Can scale horizontally by adding VMs
Trigger: when single instance can't keep up with job volume
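
The atomic claim could look like the following. This sketch only builds the arguments for `findOneAndUpdate`; `buildClaim` is a hypothetical helper, and incrementing `attempts` at claim time is an assumption. Because the filter requires status "pending" and the update sets "processing" in the same operation, two instances cannot claim the same job.

```typescript
// Builds the filter, update, and options for claiming one pending job.
function buildClaim(now: Date) {
  return {
    // Only jobs that nobody has claimed yet.
    filter: { status: "pending" },
    update: {
      $set: { status: "processing", startedAt: now },
      $inc: { attempts: 1 }, // ASSUMED: count the attempt when the job is claimed
    },
    options: {
      // Lower priority number first (per the Job schema), then oldest first.
      sort: { priority: 1, createdAt: 1 },
      returnDocument: "after" as const,
    },
  };
}

// Usage with the MongoDB Node.js driver (not executed here):
//   const claim = buildClaim(new Date());
//   const result = await jobs.findOneAndUpdate(claim.filter, claim.update, claim.options);
// A null result means no pending job was available. Depending on the driver
// version, the job document is the result itself or is nested under `result.value`.
```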

Why Not Mac VM?

Mac VMs are expensive ($100-200+/mo via MacStadium/AWS)
Scribe's article generation doesn't need macOS-specific features
No iMessage, no Apple Contacts, no macOS UI automation needed
Linux gives us everything: Node.js, headless browser, API access
Decision: Linux VM for production

Scribe Walker Integration

What Makes Scribe Walker Output Great

The quality comes from the agentic orchestration, not just the model:

Research phase — Agent browses web, checks trends, finds angles
Topic differentiation — Checks existing articles to avoid duplicates
Writing with reasoning — Claude Opus reasons about structure, SEO, audience
Image matching — Agent crafts DALL-E prompts that specifically match article content
Quality gate — Self-checks word count, SEO meta completeness, no banned patterns
Context awareness — Uses SCRIBE-WALKER-CONTEXT.md for consistent style/rules

Replicating for Multi-Tenant

Each job spawns an isolated Scribe Walker session (sub-agent)
Session receives: brand context (name, niche, location) + SCRIBE-WALKER-CONTEXT.md base rules
Sessions are isolated — one user's generation doesn't affect another's
Single agent, multiple sessions — not multi-agent (simpler, sufficient for MVP)

Context Transfer Checklist (for VM migration)

seo/SCRIBE-WALKER-CONTEXT.md — writing rules, quality gates, image procedures
OpenClaw config (openclaw.json) — agent settings, auth profiles
Anthropic auth (setup-token or API key)
OpenAI API key (for DALL-E)
MongoDB connection string
Any learned patterns from memory/ files relevant to article quality

Open Questions

Polling interval: How often should OpenClaw check for new jobs? 5s? 10s? Webhook-triggered?
Article count per plan: Free tier gets 10+5/mo — do we enforce this at app level, OpenClaw level, or both?
Concurrent generation: Should we limit to 1 job at a time per instance, or allow parallel sessions?
Error handling UX: What does the user see if generation fails? Auto-retry? Manual retry button?
~~Image storage:~~ Resolved — Vercel Blob CDN with sharp JPEG Q85 compression (#51)
~~Research depth:~~ Resolved — Tiered: Free = evergreen only, Pro = seasonal, Scale = web research
~~Subdomain SSL:~~ Resolved — Migrated to subfolder model (Mar 9, 2026). No wildcard certs needed.

Decision Log

Date	Decision	Rationale
2026-03-04	MongoDB as job queue (not REST API)	Decoupled, no direct access to OpenClaw, easier to scale
2026-03-04	Typed job schema, no prompt passthrough	Security — prevents command injection via DB
2026-03-04	Linux VM over Mac VM for production	Cheaper, sufficient features, Scribe doesn't need macOS
2026-03-04	Single agent, multiple sessions	Simpler than multi-agent, sufficient for MVP scale
2026-03-04	Claude Opus 4.6 for all tiers	Quality first, cost modeling later
2026-03-04	DALL-E 3 for featured images	Proven quality from 1,285+ articles on tahaabbasi.com
2026-03-07	Scribe Walker as main agent on EC2	Full OpenClaw lifecycle (compaction, hooks, model updates) without custom plumbing
2026-03-07	Migration + evolution, not 1:1 copy	Mac prototype proven; EC2 must incorporate all quality intelligence patterns
2026-03-07	Prompt modules as centralized intelligence	`worker/prompts/*.js` = single source of truth for quality rules across article gen + image regen
2026-03-07	System-level systemd for gateway	`openclaw gateway install` fails over SSH; manual unit file matches Hetzner docs pattern
2026-03-07	80% evergreen / 20% seasonal-timely	Evergreen is the backbone for local business SEO; trending = "relevant now" not news slop
2026-03-07	Exact/near-exact dedup only (no contextual)	Contextual dedup backfires — businesses WANT multiple articles on same topic from different angles
2026-03-07	Dedup window scales with plan	Free=all(15), Pro=all(50), Scale=last 100, Agency=configurable
2026-03-07	Rename Business tier to Scale (#64)	Better name for the 150 articles/mo tier
2026-03-07	Services field for Pro+ only (#65)	Free stays frictionless; Pro+ gets targeted articles via services list
2026-03-07	Prompt modules stay in JS files	Security > hot-reload. Deploy = git pull + restart. No DB-stored prompts.
2026-03-07	Stateless agent (no persistent memory)	Consistent with proven Mac Mini pattern. Each job independent.
2026-03-07	Worker routing for migration testing (#67)	Default = EC2, `?worker=local` = Mac Mini. Temporary.
2026-03-07	IndexNow with dev mode gate (#66)	Submit for subdomains, defer custom domains, never submit in dev/test
2026-03-09	Subfolder model over subdomains (#80)	New domain needs consolidated SEO authority; every article under tryscribe.co/{brand}/ strengthens root domain
2026-03-09	Separate content site from dashboard	tryscribe.co = content + marketing, app.tryscribe.co = dashboard. Same Vercel project, middleware-separated. Dashboard deploys don't risk content site.
2026-03-09	No dedicated redirect mapping for old subdomains (test data)	Test sites only, so no shared links exist. The generic legacy-subdomain middleware rule still answers any stray hits with a 301.

Scribe Walker Agent Architecture

Added: 2026-03-07 — Documents the evolution from prototype to production

Origins: Mac Mini Prototype

The Scribe Walker concept was proven on Taha Abbasi's Mac Mini, where it operated as a sub-agent within the "Walker Posse" — a family of specialized agents orchestrated by Benny J Walker (the primary OpenClaw agent).

How it worked on Mac Mini:

Benny (main agent) ran cron jobs that spawned ephemeral Scribe Walker sessions
Each session received a task message + the full seo/SCRIBE-WALKER-CONTEXT.md (~700 lines)
The session wrote articles, published them, and terminated
Benny's own agent backing (SOUL.md, MEMORY.md, identity, reliability patterns) provided implicit quality
OpenClaw managed session lifecycle, compaction, error handling

What made it effective (proven over 1,285+ articles on tahaabbasi.com):

Capability	How It Worked	Why It Mattered
Quality gates	Word count enforcement (1000+ min), pre-publish checklist, self-review	Prevented thin/low-quality content from going live
Duplicate prevention	Last 100 titles checked contextually (not just exact slug match)	Avoided writing "Why Microneedling Works" 4 articles apart
Brand SEO integration	Brand in title, first paragraph, 3-5x naturally, CTAs, author bio	Core product value — what makes Scribe different from generic AI
Topic research	Industry awareness, seasonal relevance, niche-specific trends	Timely articles supplement strong evergreen foundation
Image-topic matching	DALL-E prompts crafted to match specific article content, not generic	Featured images that actually represent the article topic
Content restrictions	Configurable no-go list (topics already published, off-brand content)	Prevented brand damage and redundancy
Writing style enforcement	No em dashes, no "crucial"/"utilize", varied sentence length, human voice	Articles read as human-written, not AI-generated
Readability & engagement	Conversational tone, relatable scenarios, questions for flow, white space	Readers actually finish articles, not bounce
Source attribution	All claims linked to credible sources, original synthesis required	SEO authority, no plagiarism risk
CTA structure	Every article ends with warm, varied call-to-action	Drives business for the brand

Evolution: EC2 Production Architecture

The EC2 deployment is NOT a 1:1 migration. It evolves the prototype into a multi-tenant product where Scribe Walker is the main agent on its own dedicated server.

Key Architectural Shift

MAC MINI (Prototype):
  Benny (main) → spawns ephemeral Scribe Walker → single brand (Taha)

EC2 (Production):
  Scribe Walker (main) → spawns article sessions → any brand (multi-tenant)

Scribe Walker on EC2 is equivalent to what Benny is on the Mac Mini — the primary agent with full OpenClaw capabilities: identity, memory, session management, compaction, hooks, model updates.

Why Main Agent (Not Sub-Agent)

Benefit	Description
Full OpenClaw lifecycle	Compaction, session memory, command logging — all built-in
Model updates for free	New Claude/OpenAI models = `openclaw onboard` update, no code changes
Security updates	OpenClaw security patches apply directly
Monitoring	`openclaw health`, `openclaw status`, gateway dashboard
Identity persistence	SOUL.md, AGENTS.md define consistent behavior across all sessions
Hook system	command-logger for diagnostics, session-memory for compaction resilience

Component Architecture

graph TD
    subgraph EC2["🖥️ AWS EC2 (t3.small, us-east-1)"]
        subgraph SystemD["systemd Services"]
            GW["openclaw-gateway.service"]
            WK["scribe-worker.service (Scroll Worker)"]
        end

        subgraph OpenClaw["🤖 OpenClaw Gateway"]
            MainAgent["Scribe Walker (main agent)"]
            SOUL["SOUL.md — Identity & Principles"]
            AGENTS["AGENTS.md — Security & Operations"]
            Hooks["Hooks: command-logger, session-memory"]

            MainAgent --> ArticleSession1["Article Session (Brand A)"]
            MainAgent --> ArticleSession2["Article Session (Brand B)"]
            MainAgent --> ArticleSession3["Article Session (Brand C)"]
        end

        subgraph Worker["📜 Scroll Worker (job-worker.js)"]
            Poller["MongoDB Poller"]
            PromptBuilder["buildScribePrompt()"]
            Modules["Prompt Modules"]
        end

        WK --> Worker
        GW --> OpenClaw
        Poller -->|"openclaw agent --agent main"| MainAgent
        PromptBuilder --> Modules
    end

    subgraph PromptModules["📝 Prompt Intelligence (worker/prompts/)"]
        AW["article-writing.js — Orchestration"]
        QR["quality-rules.js — Quality gates, readability, CTA"]
        DI["dalle-image.js — Image generation rules"]
        TG["tags.js — Standard tag taxonomy"]
    end

    subgraph MongoDB["🗄️ MongoDB Atlas"]
        Jobs["jobs collection"]
        Articles["articles collection"]
        Sites["sites collection — brand config"]
    end

    subgraph External["🌐 External APIs"]
        Claude["Claude Opus 4.6"]
        DallE["DALL-E 3"]
        WebSearch["Web Search (topic research)"]
    end

    Poller -->|"poll pending jobs"| Jobs
    Sites -->|"brand, niche, location, tone, demographics"| PromptBuilder
    PromptBuilder -->|"assembled prompt"| Poller
    ArticleSession1 --> Claude
    ArticleSession1 --> DallE
    ArticleSession1 --> WebSearch
    ArticleSession1 -->|"write completed articles"| Articles
    Modules --> PromptModules

Intelligence Layers

The Scribe Walker's article-writing intelligence is distributed across four layers:

Layer 1: Agent Identity (OpenClaw Workspace)

Files in the agent's workspace directory that define WHO the agent is:

File	Purpose
`SOUL.md`	Core identity, principles, writing philosophy
`AGENTS.md`	Security rules, operational boundaries, allowed/disallowed actions
`IDENTITY.md`	Name, role, platform context
`TOOLS.md`	Environment details, available tools

These are loaded by OpenClaw for every session. They provide the persistent "personality" and guardrails.

Layer 2: Prompt Modules (Code — `worker/prompts/`)

Centralized, version-controlled prompt components assembled per-job:

Module	What It Contains	Used By
`article-writing.js`	Main orchestration prompt, workflow, MongoDB instructions	Article generation
`quality-rules.js`	Word count, readability, engagement rules, CTA format, brand SEO	Article generation, regeneration
`dalle-image.js`	Image style rules, demographic matching, size/format requirements	Article generation, image regeneration
`tags.js`	Standard tag taxonomy	Article generation

Key design: These modules are the single source of truth for quality rules. Both article generation and image regeneration call the same functions, ensuring consistency.

Layer 3: Site Configuration (MongoDB)

Per-customer data that customizes each job:

interface SiteConfig {
  brandName: string;           // "Sally's Spa"
  niche: string;               // "Med Spa"
  location?: string;           // "Daybreak, South Jordan, UT"
  tone?: string;               // "professional" | "casual" | "authoritative"
  topicStyles: string[];       // ["how-to", "tips", "why"]
  website?: string;            // "https://sallysspa.com"
  socials?: {                  // Social media links for CTAs
    facebook?: string;
    instagram?: string;
    x?: string;
  };
  demographicProfile?: {       // For image generation demographic matching
    primaryDemo: string;       // "caucasian women"
    diversity: string;         // "moderate"
    region: string;            // "suburban"
    typicalAge: string;        // "30-55"
    notes?: string;
  };
  contentRestrictions?: {      // Things the brand does NOT offer/want
    excludeTopics?: string[];  // ["botox", "surgery"]
    excludeCompetitors?: string[];
    requiredDisclosures?: string[];
  };
}

Layer 4: Quality Intelligence (Agent Behavior — To Be Enhanced)

These are the proven patterns from the Mac Mini that must be incorporated as agent-level capabilities, not just prompt text:

4a. Duplicate Prevention

Problem: Without dedup, the agent writes "5 Benefits of Microneedling" every few runs.

Mac Mini approach: Fetch last 100 titles + slugs, contextual matching (not just exact), reject topic-level duplicates.

EC2 approach (refined):

Before writing, query MongoDB for the site's existing article titles
Exact/near-exact title match ONLY — "Why Microneedling Works" and "Why Microneedling Works!" = duplicate. But "Why Microneedling Works" and "Benefits of Microneedling for Your Skin" = ALLOWED (different angle, both valuable)
No contextual/semantic dedup — this backfires. Businesses WANT multiple articles covering the same topic from different angles. A med spa should have articles about microneedling benefits, preparation, aftercare, comparisons, etc.
Dedup window scales with plan: Free (15 articles) = check all. Pro (50) = check all. Scale (150) = last 100 cap. Agency = configurable.
Token cost: Titles only, ~500 tokens for 50 titles. Negligible.
Implementation: Title matching done in code (Scroll Worker / job-worker.js), NOT passed to Claude. Avoids Claude being overly conservative.
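
One way to do the in-code title check is to normalize both titles and compare them for equality. This is a sketch under assumptions: `normalizeTitle` and `isDuplicateTitle` are hypothetical names, "near-exact" is interpreted as "equal after lowercasing and stripping punctuation", the normalization is ASCII-only, and the Agency tier (configurable window) is left out.

```typescript
type Plan = "free" | "pro" | "scale";

// Dedup window per plan: null means "check all existing titles".
const DEDUP_WINDOW: Record<Plan, number | null> = { free: null, pro: null, scale: 100 };

// Lowercase, drop punctuation, collapse whitespace (ASCII-only for simplicity).
function normalizeTitle(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// `existingNewestFirst` is the site's article titles, most recent first.
function isDuplicateTitle(candidate: string, existingNewestFirst: string[], plan: Plan): boolean {
  const window = DEDUP_WINDOW[plan];
  const pool = window === null ? existingNewestFirst : existingNewestFirst.slice(0, window);
  const target = normalizeTitle(candidate);
  return pool.some((t) => normalizeTitle(t) === target);
}
```

Under this interpretation "Why Microneedling Works!" matches "Why Microneedling Works", while "Benefits of Microneedling for Your Skin" does not, which is the behavior described above.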

4b. Topic Research & Awareness

Problem: Generic articles are fine, but timely, relevant articles drive more traffic.

Mac Mini approach: Web searches for breaking news in the niche before each run.

EC2 approach (tiered):

Content mix: 80% evergreen / 20% seasonal-timely. Evergreen is the backbone for local business SEO. "How to Choose the Right Roofing Material" has value for years. Trending = "relevant to their customers right now" (e.g., "Spring Roof Maintenance Checklist"), NOT news slop.
Free tier: No web research. Evergreen articles only (cheaper, still high quality).
Pro tier: Light seasonal awareness (time of year, common seasonal topics for niche).
Scale/Agency: Web research enabled for timely content alongside evergreen.
Business-specific (Pro+ with services field, see #65): Only write about services/products the brand actually offers. Free tier writes generically about the niche without claiming the brand offers specific services.

4c. Brand SEO Integration

Problem: Articles without strong brand presence don't build SEO authority.

Mac Mini approach: Brand name in title, first paragraph, 3-5x naturally, backlink CTA, author bio.

EC2 approach (carried forward — already in quality-rules.js):

Brand name in article title (when it fits naturally)
Brand mentioned in first paragraph as the local expert
Brand in SEO meta description
Brand + location combos 2-3x naturally throughout
CTA section at article end with website/social links
NOT over-stuffed — natural and helpful

4d. Writing Quality Enforcement

Mac Mini approach: Extensive checklist, word count verification, style rules.

EC2 approach (carried forward — already in quality-rules.js):

Minimum 1200 words (target 1200-1800)
No em dashes, no "crucial"/"utilize"
Varied sentence length, conversational tone
Relatable scenarios, questions for flow
Subheadings, bullets, white space for readability
Original synthesis — not copied from sources

4e. Image-Topic Matching

Problem: Generic stock-photo-style images that don't match the article topic.

Mac Mini approach: Detailed DALL-E prompts describing the specific subject, never brand names (DALL-E blocks them).

EC2 approach (carried forward — already in dalle-image.js):

Prompts crafted to match specific article content
Describe distinctive visual features instead of brand names
Demographic matching when profile available
1792x1024 landscape, realistic stock photo style
No text, logos, or watermarks

4f. Post-Publish Actions

Mac Mini approach: IndexNow ping, published log, delivery announce.

EC2 approach (see #66):

Update article status in MongoDB (already done)
Email notification to site owner (already done via Resend)
Subdomain articles (*.tryscribe.co): Submit to tryscribe.co Google Search Console, Bing Webmaster, IndexNow
Custom domain articles: Separate workflow, deferred until #7 ships
⚠️ DEV MODE GATE: All search submissions gated behind NODE_ENV=production AND ENABLE_SEARCH_SUBMISSION=true. Both must be true. No test articles in search indices.
Analytics tracking (future)
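
The dev mode gate amounts to a single predicate over the two environment variables named above. `searchSubmissionEnabled` is a hypothetical helper name; the environment is passed in as an argument so the check is easy to test.

```typescript
// True only when BOTH conditions hold: production build AND explicit opt-in.
function searchSubmissionEnabled(env: Record<string, string | undefined>): boolean {
  return env.NODE_ENV === "production" && env.ENABLE_SEARCH_SUBMISSION === "true";
}

// Usage in the worker (not executed here):
//   if (searchSubmissionEnabled(process.env)) { /* submit to IndexNow, etc. */ }
```

Requiring an exact "true" string means a missing or mistyped flag fails closed, so nothing is submitted by accident.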

4g. Quality Check (22-Point SEO Audit)

The 22-point quality check is the product's quality standard. This is what differentiates Scribe from AI slop generators. Every article MUST pass this checklist before publishing.

Source of truth: platform/docs/SEO-QUALITY-CHECKLIST.md (replicated from tryscribe.co/seo-guidelines.html — update both when changing).

The full checklist must be incorporated into quality-rules.js as the authoritative quality gate.

4h. Services & Content Restrictions (Pro+ — #65)

Data model:

// In MongoDB site config
services?: string[];           // From curated niche list + custom approved
pendingServices?: string[];    // Custom entries awaiting admin review
excludeServices?: string[];    // Services to explicitly avoid

Prompt pattern: DB stores DATA (list of services). JS stores RULES (how to use that data).

Prompt: "Only write about services this brand offers: ${services}. Never claim they offer unlisted services."
If no services configured (free tier): "Write generally about the niche without making specific claims about what this brand offers."
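
A sketch of how a prompt module might turn the stored data into the rule text quoted above. `buildServicesBlock` is a hypothetical function name, and removing `excludeServices` entries from the approved list is an assumption about how that field is meant to be used. Only approved `services` are passed in; `pendingServices` never reach this function.

```typescript
const GENERIC_RULE =
  "Write generally about the niche without making specific claims about what this brand offers.";

// services: admin-approved list from the site config (empty or missing on the free tier).
// excludeServices: entries to drop even if present in `services` (ASSUMED behavior).
function buildServicesBlock(services: string[] = [], excludeServices: string[] = []): string {
  const allowed = services.filter((s) => !excludeServices.includes(s));
  if (allowed.length === 0) return GENERIC_RULE;
  return `Only write about services this brand offers: ${allowed.join(", ")}. Never claim they offer unlisted services.`;
}
```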

Security:

Curated services list per niche category, plus "Other" free-text
Automated blocklist on submission (illegal/inappropriate terms)
Niche-mismatch soft flag: custom service doesn't match niche → flag for admin, don't block
Custom "Other" entries go to pendingServices — NOT in prompts until admin-approved
Attack surface: users can TYPE anything, but unapproved entries never affect article output

Prompt Assembly Flow

Job arrives from MongoDB
        │
        ▼
buildScribePrompt(job)
        │
        ├── Brand details (from job.params / site config)
        ├── Article IDs to update (from job.articleIds)
        ├── buildTagsBlock() — standard tag taxonomy
        ├── buildBrandSeoBlock() — brand SEO integration rules
        ├── Article structure template
        ├── buildQualityBlock() — quality rules, readability, engagement
        ├── buildDalleRulesBlock() — image generation rules
        └── Image upload API instructions
        │
        ▼
Complete prompt sent to:
  openclaw agent --agent main --message <prompt>
        │
        ▼
OpenClaw spawns article session with:
  - Agent identity (SOUL.md, AGENTS.md)
  - Assembled prompt (from buildScribePrompt)
  - Tools: MongoDB access, DALL-E API, web search, image upload API
        │
        ▼
Agent executes autonomously:
  1. Research topics for niche/location (depth depends on plan tier, see 4b)
  2. Skip exact/near-exact duplicate titles (see 4a)
  3. Write articles with full quality gates
  4. Generate matched DALL-E images
  5. Upload images via CDN API
  6. Update article docs in MongoDB
  7. Report completion

File Layout on EC2

/home/ubuntu/
├── scribe/                          # Git repo (tryscribeco/scribe)
│   ├── platform/                    # Next.js app (deployed to Vercel)
│   │   └── docs/
│   │       └── ARCHITECTURE.md      # This document
│   └── worker/
│       ├── job-worker.js            # Scroll Worker — job poller + session spawner
│       └── prompts/                 # Prompt intelligence modules
│           ├── article-writing.js   # Main prompt builder
│           ├── quality-rules.js     # Quality, CTA, brand SEO
│           ├── dalle-image.js       # Image generation rules
│           └── tags.js              # Standard tag taxonomy
│
└── .openclaw/
    ├── openclaw.json                # OpenClaw config (main agent = Scribe Walker)
    ├── workspace/                   # Agent workspace
    │   ├── SOUL.md                  # Scribe Walker identity
    │   ├── AGENTS.md                # Security rules, operational boundaries
    │   ├── IDENTITY.md              # Name, role, platform
    │   ├── TOOLS.md                 # Environment details
    │   └── HEARTBEAT.md             # No proactive tasks (headless)
    └── agents/
        └── main/
            ├── agent/
            │   └── auth-profiles.json  # Anthropic auth
            └── sessions/               # Session history

/etc/
├── scribe/
│   └── .env                         # Secrets (root:root, 600)
└── systemd/system/
    ├── openclaw-gateway.service     # OpenClaw gateway (always running)
    └── scribe-worker.service        # Scroll Worker (always running)

Migration Status

Phase	Status	Details
AWS Account Foundation (#63)	✅ Complete	Org, IAM Identity Center, Production account, budget
EC2 Provisioning	✅ Complete	t3.small, Ubuntu 24.04, hardened, Elastic IP
Node.js + Repo	✅ Complete	Node 22, npm install, env file
OpenClaw Install + Onboard	✅ Complete	v2026.3.2, Opus 4.6, hooks enabled
OpenClaw Gateway Service	✅ Complete	System-level systemd, RPC probe OK
Agent Workspace (initial)	✅ Complete	SOUL.md, AGENTS.md deployed
Architecture Review	🔄 In Progress	This document — awaiting approval
Agent Workspace (enhanced)	⬜ Pending	Incorporate quality intelligence from architecture
Scroll Worker Service	✅ Complete	systemd unit for job-worker.js (scribe-worker.service)
Cutover & Testing	⬜ Pending	Test articles, 24h monitor, kill Mac worker
Post-Migration Hardening	⬜ Pending	Structured logging, graceful shutdown, alerting

Resolved Questions (Architecture Review — Mar 7, 2026)

#	Question	Decision
1	Topic research scope	Tiered: Free = none (evergreen only). Pro = seasonal awareness. Scale/Agency = web research.
2	Duplicate prevention window	Scales with plan: Free = all (15). Pro = all (50). Scale = last 100. Agency = configurable. Exact/near-exact match only.
3	Content restrictions storage	DB for data (services list), JS for rules (how to use that data). Custom entries require admin approval.
4	IndexNow	Submit for tryscribe.co subdomains. Custom domains deferred to #7. Dev mode gate required. See #66.
5	Prompt module updates	Keep in JS files. Deploy = git pull + restart (~10 sec). Security > hot-reload convenience.
6	Memory across jobs	Stateless. Each job independent. Mac Mini was stateless too (ephemeral sessions). Consistent.

Testing Strategy (Migration — #67)

Worker routing via job document field:

Default (no field): EC2 picks up the job
worker: "local": Mac Mini picks up the job (triggered via ?worker=local API param)
Both workers run simultaneously during migration validation
Compare article quality side by side
Remove routing code after EC2 validated and Mac worker decommissioned
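
The routing rule can be sketched as a query filter each worker adds to its poll, plus an equivalent in-memory check. `routingFilter` and `shouldPickUp` are hypothetical names; the `worker` field and its "local" value come from the description above.

```typescript
type WorkerName = "ec2" | "local";

// MongoDB filter fragment each worker merges into its pending-job query.
function routingFilter(self: WorkerName): Record<string, unknown> {
  return self === "local"
    ? { worker: "local" }             // Mac Mini: only jobs explicitly routed to it
    : { worker: { $exists: false } }; // EC2: default, jobs with no worker field
}

// Same rule applied to an already-loaded job document.
function shouldPickUp(job: { worker?: string }, self: WorkerName): boolean {
  return self === "local" ? job.worker === "local" : job.worker === undefined;
}
```

Since the two filters never match the same document, both workers can poll the same collection during validation without competing for jobs.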
