ClipPulse Architecture

Overview

ClipPulse is a Google Apps Script-based tool that collects social media data from Instagram and X (Twitter), then outputs structured data to Google Sheets. It uses natural language instructions parsed by an LLM to determine collection parameters and target platforms.

System Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                         Google Apps Script                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────────┐  │
│  │   Web App    │  │ Orchestrator │  │   Platform Collectors    │  │
│  │   (UI.html)  │→ │  (Control)   │→ │ (Instagram/X/TikTok)     │  │
│  └──────────────┘  └──────────────┘  └──────────────────────────┘  │
│         ↓                ↓                       ↓                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────────┐  │
│  │  LLM Planner │  │ State Store  │  │    Sheet Writer          │  │
│  │  (OpenAI)    │  │ (Properties) │  │    Drive Manager         │  │
│  └──────────────┘  └──────────────┘  └──────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
         ↓                                         ↓
┌─────────────────┐                    ┌───────────────────────────┐
│  OpenAI API     │                    │     Google Services       │
│  (GPT-4o)       │                    │  - Google Drive           │
└─────────────────┘                    │  - Google Sheets          │
                                       │  - Script Properties      │
┌─────────────────┐                    └───────────────────────────┘
│ External APIs   │
│  - Instagram    │
│    Graph API    │
│  - TwitterAPI.io│
│    (X/Twitter)  │
│  - TikTok API   │
│    (disabled)   │
└─────────────────┘

Core Components

1. Web UI (`UI.html`)

Single-page HTML interface served via Apps Script HTML Service
Handles user instruction input and execution triggering
Polls backend for run status updates
Displays progress for both Instagram and X platforms
Shows collapsible data field reference for each platform
Dark/Light mode toggle

2. Orchestrator (`Orchestrator.js`)

Controls the entire run lifecycle
Manages state transitions: CREATED → PLANNING → RUNNING_INSTAGRAM → RUNNING_X → FINALIZING → COMPLETED
Handles the 6-minute Apps Script execution limit via continuation triggers
Coordinates platform collectors in sequence
Supports mock mode for testing

3. LLM Planner (`LLMPlanner.js`)

Parses natural language instructions using OpenAI (GPT-4o by default, configurable via OPENAI_MODEL)
Determines target platforms from instruction context
Generates structured collection plans with:
- Target platforms (Instagram, X, or both)
- Target counts per platform
- Keywords and hashtags
- Query strategies (Instagram hashtag search, X query syntax)
- User handles for targeted collection
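A minimal sketch of how the planner's JSON output could be checked before collection starts. The field names (`platforms`, `targetCounts`, `keywords`, `hashtags`, `handles`, `xQueryType`) and the `"Latest"` default are assumptions for illustration, not the actual `LLMPlanner.js` schema. The fallback to both platforms and the default count of 30 follow the platform-detection rule and the MAX_POSTS_PER_PLATFORM_DEFAULT value documented in this file.

```javascript
// Hypothetical plan normalizer: field names are illustrative assumptions,
// not the real LLMPlanner.js schema.
const SUPPORTED_PLATFORMS = ["instagram", "x"];

// Trim entries, optionally strip one leading prefix character ("#" or "@"), drop blanks.
function cleanList(list, stripPrefix) {
  return (Array.isArray(list) ? list : [])
    .map((s) => String(s).trim())
    .map((s) => (stripPrefix && s.startsWith(stripPrefix) ? s.slice(1) : s))
    .filter((s) => s.length > 0);
}

function normalizePlan(rawJson, defaultCount = 30) {
  const raw = (typeof rawJson === "string" ? JSON.parse(rawJson) : rawJson) || {};

  // Keep supported platforms only; no usable platform means "collect from both".
  let platforms = cleanList(raw.platforms)
    .map((p) => p.toLowerCase())
    .filter((p) => SUPPORTED_PLATFORMS.includes(p));
  platforms = Array.from(new Set(platforms));
  if (platforms.length === 0) platforms = SUPPORTED_PLATFORMS.slice();

  // One positive integer target per selected platform, else the default.
  const requested = raw.targetCounts || {};
  const targetCounts = {};
  for (const p of platforms) {
    const n = Number(requested[p]);
    targetCounts[p] = Number.isInteger(n) && n > 0 ? n : defaultCount;
  }

  return {
    platforms,
    targetCounts,
    keywords: cleanList(raw.keywords),
    hashtags: cleanList(raw.hashtags, "#"),
    handles: cleanList(raw.handles, "@"),
    xQueryType: raw.xQueryType === "Top" ? "Top" : "Latest",
  };
}
```

Rejecting unknown platforms here means a model that mentions TikTok (currently disabled) cannot route a run into an unsupported collector.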

4. Instagram Collector (`InstagramCollector.js`)

Implements hashtag search and owned-account retrieval strategies
Normalizes API responses to fixed 23-column schema
Handles pagination and deduplication
Creates Drive artifacts (video or watch.html)
Supports optional RapidAPI enrichment for hashtag search results

4a. Instagram RapidAPI (`InstagramRapidAPI.js`)

Optional third-party data enrichment for hashtag search results
Official Instagram Graph API hashtag search returns limited fields (no media_url, username, etc.)
RapidAPI /media?id=... endpoint provides additional fields
Uses numeric Instagram media IDs (not shortcodes)
Gracefully handles "media not found" responses (returns null, doesn't fail)

Key Functions:

| Function | Description |
|---|---|
| `getPostDetailsByMediaId(mediaId)` | Fetch post details using numeric media ID |
| `enrichPostsWithRapidAPI(posts, hashtag)` | Enrich official API results with RapidAPI data |
| `downloadVideoFromRapidAPI(videoUrl, filename)` | Download video from RapidAPI-provided URL |
| `normalizeRapidAPIPost(post, hashtag)` | Normalize API response to standard schema |

downloadVideoFromRapidAPI:

Input: videoUrl (string), filename (string)
Output: { success: boolean, blob?: Blob, filename?: string, contentType?: string, size?: number, error?: string }

Process:
1. Validate inputs (URL format, filename)
2. Fetch video via UrlFetchApp.fetch() with redirect following
3. Check HTTP status (expect 200)
4. Validate size ≤ 50MB (Apps Script limit)
5. Sanitize filename, set blob name
6. Return blob for Drive storage
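The steps above can be sketched as follows. To keep the logic testable outside Apps Script, the fetch function is injected. It defaults to `UrlFetchApp.fetch` with the `followRedirects` and `muteHttpExceptions` options. The filename sanitization rule and the error messages are assumptions, not the actual `InstagramRapidAPI.js` code.

```javascript
// Sketch of the documented download steps; not the real InstagramRapidAPI.js code.
const MAX_VIDEO_BYTES = 50 * 1024 * 1024; // UrlFetchApp response size limit

// Hypothetical rule: replace anything outside [A-Za-z0-9._-] with "_".
function sanitizeFilename(name) {
  const cleaned = String(name).replace(/[^A-Za-z0-9._-]/g, "_").slice(0, 120);
  return cleaned || "video.mp4";
}

function downloadVideoFromRapidAPI(videoUrl, filename, fetchFn) {
  // 1. Validate inputs.
  if (typeof videoUrl !== "string" || !/^https?:\/\/\S+$/i.test(videoUrl)) {
    return { success: false, error: "Invalid video URL" };
  }
  if (typeof filename !== "string" || filename.trim() === "") {
    return { success: false, error: "Missing filename" };
  }
  const doFetch =
    fetchFn ||
    ((url) => UrlFetchApp.fetch(url, { followRedirects: true, muteHttpExceptions: true }));
  try {
    // 2. Fetch; muteHttpExceptions lets non-2xx responses reach the status check.
    const response = doFetch(videoUrl);
    // 3. Expect HTTP 200.
    const code = response.getResponseCode();
    if (code !== 200) return { success: false, error: "HTTP " + code };
    // 4. Enforce the 50 MB limit.
    const blob = response.getBlob();
    const size = blob.getBytes().length;
    if (size > MAX_VIDEO_BYTES) return { success: false, error: "Too large: " + size + " bytes" };
    // 5. Sanitize the filename and name the blob.
    const safeName = sanitizeFilename(filename);
    blob.setName(safeName);
    // 6. Return the blob; the caller writes it to Drive.
    return { success: true, blob, filename: safeName, contentType: blob.getContentType(), size };
  } catch (err) {
    return { success: false, error: String(err && err.message ? err.message : err) };
  }
}
```

Returning `{ success: false, error }` instead of throwing lets the artifact logic move on to the next download strategy without extra try/catch blocks.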

5. X Collector (`XCollector.js`)

Uses TwitterAPI.io Advanced Search API
Builds search queries from plan keywords, hashtags, and user handles
Supports query modifiers: from:user, #hashtag, lang:, date ranges
Two query types: "Latest" (recent) and "Top" (popular)
Normalizes API responses to fixed 28-column schema
Handles cursor-based pagination
Creates Drive artifacts (watch.html with tweet link)
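Cursor-based pagination can be sketched as a loop that keeps requesting pages until it reaches the target count or the API reports no further page. `fetchPage` is a hypothetical wrapper around the Advanced Search call. It is assumed to map each response into `{ items, nextCursor, hasNextPage }`. The function returns the cursor so that a continuation can resume where this execution stopped.

```javascript
// Generic cursor loop; `fetchPage(cursor)` is a hypothetical wrapper that returns
// { items, nextCursor, hasNextPage } for one page of search results.
function collectWithCursor(fetchPage, target, startCursor = "", maxPages = 50) {
  const items = [];
  let cursor = startCursor;
  for (let page = 0; page < maxPages && items.length < target; page++) {
    const res = fetchPage(cursor);
    items.push(...(res.items || []));
    if (!res.hasNextPage || !res.nextCursor) {
      cursor = null; // no more results for this query
      break;
    }
    cursor = res.nextCursor; // saved in run state so a continuation can resume here
  }
  return { items: items.slice(0, target), cursor };
}
```

The `maxPages` guard is a defensive assumption that prevents an API which keeps returning empty pages with a cursor from consuming the whole execution budget.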

6. State Store (`StateStore.js`)

Persists run state to Script Properties
Tracks progress, cursors, and processed IDs for each platform
Supports xProgress alongside instagramProgress
Enables resume after timeout/continuation
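A minimal sketch of run-state persistence, assuming state is stored as JSON under one key per run. `store` can be any object with `getProperty`/`setProperty`, such as `PropertiesService.getScriptProperties()` in Apps Script. The `RUN_` key prefix and the state fields are assumptions. Apps Script limits each property value to roughly 9 KB, so the sketch checks the serialized size before writing.

```javascript
// Run-state persistence sketch; the "RUN_" prefix and field names are assumptions.
const MAX_PROPERTY_BYTES = 9 * 1024; // approximate Apps Script per-value limit

// UTF-8 byte length without relying on Node's Buffer or Apps Script's Utilities.
function utf8Length(str) {
  let bytes = 0;
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

function saveRunState(store, runId, state) {
  const json = JSON.stringify(state);
  const size = utf8Length(json);
  if (size > MAX_PROPERTY_BYTES) {
    throw new Error("Run state too large for one property: " + size + " bytes");
  }
  store.setProperty("RUN_" + runId, json);
}

function loadRunState(store, runId) {
  const json = store.getProperty("RUN_" + runId);
  return json ? JSON.parse(json) : null; // null when no state was saved
}
```

Long processed-ID lists are the part most likely to hit the size limit. This document does not specify how ClipPulse handles that case.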

7. Sheet Writer (`SheetWriter.js`)

Creates spreadsheets with platform-specific schemas:
- Instagram: 23 columns
- X: 28 columns
Batch writes rows for efficiency
Handles data normalization per platform
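Batch writing usually means building the whole 2D value array first and then writing it in one call. The sketch below shows that flattening step. Serializing arrays and objects (for example `hashtags`) as JSON text is an assumption about how list-valued columns are stored. The Apps Script write appears only as a comment because it needs a live `Sheet`.

```javascript
// Flatten normalized post objects into rows ordered by the sheet's column schema.
// JSON-encoding object/array values is an assumption, not confirmed ClipPulse behavior.
function toRows(posts, columns) {
  return posts.map((post) =>
    columns.map((col) => {
      const v = post[col];
      if (v === undefined || v === null) return "";
      if (v instanceof Date) return v; // Sheets stores Date values natively
      if (typeof v === "object") return JSON.stringify(v);
      return v;
    })
  );
}

// In Apps Script, a single batched write (skipped for zero rows, because
// getRange rejects a range with no rows):
// if (rows.length) {
//   sheet.getRange(sheet.getLastRow() + 1, 1, rows.length, columns.length).setValues(rows);
// }
```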

8. Drive Manager (`DriveManager.js`)

Creates folder structure per run with platform subfolders
Saves artifacts: raw.json, video.mp4, watch.html
Generates shareable Drive URLs
Supports Instagram, X, and TikTok folder creation

9. Auth (`Auth.js`)

Manages Meta/Instagram OAuth2 flow
X (Twitter) uses simple API key auth (no OAuth needed)
Handles TikTok API authentication (disabled)
Token refresh and caching

10. Mocks (`Mocks.js`)

Provides mock data generators for testing
Supports mock mode for Instagram and X collection
Useful for development without API calls

Data Flow

1. User Input
   └─→ Natural language instruction (e.g., "Find 50 posts about skincare from Instagram and X")

2. Planning Phase
   └─→ LLM parses instruction → Structured plan object
   └─→ Determines platforms: Instagram, X, or both

3. Collection Phase
   Instagram:
   └─→ Instagram Graph API calls → Normalized post data (23 columns)
   └─→ Create Drive artifacts → Get Drive URLs
   └─→ Batch write to Instagram sheet

   X (Twitter):
   └─→ TwitterAPI.io Advanced Search → Normalized tweet data (28 columns)
   └─→ Create Drive artifacts → Get Drive URLs
   └─→ Batch write to X sheet

4. Output
   └─→ Spreadsheet with Instagram tab (23 columns) and X tab (28 columns)
   └─→ Drive folder with raw.json + video/watch artifacts per post

Run Lifecycle

| State | Description |
|---|---|
| `CREATED` | Initial state, run ID generated |
| `PLANNING` | LLM parsing instruction, creating resources |
| `RUNNING_INSTAGRAM` | Collecting Instagram data |
| `RUNNING_X` | Collecting X (Twitter) data |
| `RUNNING_TIKTOK` | (Disabled) Collecting TikTok data |
| `FINALIZING` | Optimizing spreadsheet, saving manifest |
| `COMPLETED` | Run finished successfully |
| `FAILED` | Run encountered unrecoverable error |
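The lifecycle in the table above can be sketched as a transition function. Skipping a platform phase that the plan does not target follows the collection order described under Key Design Decisions. The `plan.platforms` field is an assumption, and this sketch leaves the move to `FAILED` on an unrecoverable error to the caller.

```javascript
// Lifecycle sketch; plan.platforms and the exact rules are assumptions.
const PLATFORM_PHASES = [
  ["RUNNING_INSTAGRAM", "instagram"],
  ["RUNNING_X", "x"],
  // ["RUNNING_TIKTOK", "tiktok"], // disabled
];

function nextState(current, plan) {
  if (current === "CREATED") return "PLANNING";
  const phaseIndex = PLATFORM_PHASES.findIndex(([state]) => state === current);
  if (current === "PLANNING" || phaseIndex !== -1) {
    // From PLANNING (index -1) or a platform phase, go to the next targeted platform.
    for (let i = phaseIndex + 1; i < PLATFORM_PHASES.length; i++) {
      const [state, platform] = PLATFORM_PHASES[i];
      if (plan.platforms.includes(platform)) return state;
    }
    return "FINALIZING";
  }
  if (current === "FINALIZING") return "COMPLETED";
  throw new Error("No transition from state: " + current); // COMPLETED, FAILED, unknown
}
```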

Storage Structure

Google Drive:
ClipPulse/
├── runs/
│   └── YYYY/MM/
│       └── YYYYMMDD_HHMMSS_<hash>/
│           ├── spreadsheet/
│           ├── instagram/
│           │   └── <post_id>/
│           │       ├── raw.json
│           │       ├── video.mp4 (if downloadable)
│           │       └── watch.html (fallback)
│           ├── x/
│           │   └── <tweet_id>/
│           │       ├── raw.json
│           │       └── watch.html
│           └── tiktok/ (disabled)
└── manifests/

Google Sheets:
ClipPulse_<runId>
├── Instagram (23 columns)
└── X (28 columns)
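The per-run folder path in the Drive tree above can be derived from the run's start time and a short hash. This sketch assumes UTC and a 6-character hex hash. The document only gives the `YYYY/MM/YYYYMMDD_HHMMSS_<hash>` pattern. In Apps Script, `Utilities.formatDate(date, "UTC", "yyyyMMdd_HHmmss")` could produce the same timestamp.

```javascript
// Build "ClipPulse/runs/YYYY/MM/YYYYMMDD_HHMMSS_<hash>"; UTC is an assumption.
function pad2(n) {
  return String(n).padStart(2, "0");
}

function runFolderPath(date, hash) {
  const year = String(date.getUTCFullYear());
  const month = pad2(date.getUTCMonth() + 1); // getUTCMonth() is 0-based
  const stamp =
    year + month + pad2(date.getUTCDate()) + "_" +
    pad2(date.getUTCHours()) + pad2(date.getUTCMinutes()) + pad2(date.getUTCSeconds());
  return ["ClipPulse", "runs", year, month, stamp + "_" + hash].join("/");
}

// Hypothetical short hash: random hex, not necessarily what ClipPulse uses.
function randomHex(length = 6) {
  let out = "";
  for (let i = 0; i < length; i++) out += Math.floor(Math.random() * 16).toString(16);
  return out;
}
```

For example, `runFolderPath(new Date(Date.UTC(2024, 2, 5, 9, 7, 3)), "a1b2c3")` yields `ClipPulse/runs/2024/03/20240305_090703_a1b2c3`. With the timestamp prefix, folders sort chronologically, and the hash keeps two runs started in the same second apart.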

Spreadsheet Column Schemas

Instagram Tab (23 columns)

platform_post_id
create_username
posted_at
caption_or_description
post_url
like_count
comments_count
media_type
media_url
thumbnail_url
shortcode
media_product_type
is_comment_enabled
is_shared_to_feed
children
edges_comments
edges_insights
edges_collaborators
boost_ads_list
boost_eligibility_info
copyright_check_information_status
ref_url
memo

X Tab (28 columns)

platform_post_id
create_username
posted_at
text
post_url
source
retweet_count
reply_count
like_count
quote_count
view_count
lang
is_reply
in_reply_to_id
conversation_id
author_id
author_name
author_display_name
author_followers
author_following
author_is_blue_verified
author_created_at
hashtags
urls
user_mentions
media
ref_url
memo

Key Design Decisions

6-Minute Timeout Handling

Stops work after about 5 minutes of elapsed time, leaving a safety margin before the 6-minute Apps Script limit
Saves state and schedules continuation trigger
Resumes from saved cursor position
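The timeout handling above can be sketched as a time-budgeted loop that stops before the limit and returns the unprocessed work. The clock is injectable so the loop can be tested. `processItem` and the return shape are hypothetical. In Apps Script, the caller would save `remaining` (or the cursor) through the State Store and schedule a continuation, for example `ScriptApp.newTrigger("continueRun").timeBased().after(60 * 1000).create()`, where `continueRun` is a hypothetical function name.

```javascript
// Time-budgeted work loop; BUDGET_MS mirrors the documented 5-minute cutoff.
const BUDGET_MS = 5 * 60 * 1000;

function runWithinBudget(items, processItem, startMs, now = () => Date.now(), budgetMs = BUDGET_MS) {
  let index = 0;
  while (index < items.length) {
    // Check before each item so no item is started after the budget runs out.
    if (now() - startMs >= budgetMs) {
      return { done: false, remaining: items.slice(index) };
    }
    processItem(items[index]);
    index++;
  }
  return { done: true, remaining: [] };
}
```

This check runs only between items, so a single slow item can still overrun the budget. The one-minute margin before the hard 6-minute limit absorbs that risk.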

Platform Collection Order

Instagram collected first (if targeted)
X collected second (if targeted)
TikTok collected last (currently disabled)

Video/Post Artifact Strategy

Instagram: Prefer an actual video download when feasible; fall back to watch.html. ref_url points to the Drive file.
X: Create watch.html for archival, but set ref_url to the direct tweet URL for easy access.

Instagram Video Download Flow

┌───────────────────────────────────────────────────────────────┐
│                    createInstagramArtifact()                   │
└───────────────────────────┬───────────────────────────────────┘
                            │
                ┌───────────▼───────────┐
                │  Is VIDEO content?    │
                └───────────┬───────────┘
                            │ Yes
            ┌───────────────▼───────────────┐
            │ RapidAPI configured &         │
            │ video_url available?          │
            └───────────────┬───────────────┘
                   Yes │           │ No
                       ▼           │
        ┌──────────────────────┐   │
        │ downloadVideoFrom    │   │
        │ RapidAPI(url, name)  │   │
        └──────────┬───────────┘   │
                   │               │
        ┌──────────▼───────────┐   │
        │ Success?             │   │
        │ → video.mp4 saved    │   │
        │ → ref_url = Drive URL│   │
        └──────────┬───────────┘   │
                   │ Failure       │
                   ▼               ▼
        ┌──────────────────────────────────┐
        │ saveVideoFile(folder, media_url) │
        │ (Standard video download)        │
        └──────────────┬───────────────────┘
                       │
        ┌──────────────▼───────────────────┐
        │ Success?                         │
        │ → ref_url = video.mp4 Drive URL  │
        │ Failure?                         │
        │ → Create watch.html              │
        │ → ref_url = watch.html Drive URL │
        └──────────────────────────────────┘
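The fallback chain in the diagram above can be sketched as an ordered list of attempts followed by a final watch.html step. `steps` holds hypothetical injected functions that return a Drive URL (or null) and may throw. Sending non-video media straight to watch.html is an assumption, because the diagram shows only the VIDEO branch.

```javascript
// Fallback-chain sketch; `steps` functions are hypothetical stand-ins.
function resolveInstagramArtifact(post, steps) {
  const attempts = [];
  if (post.media_type === "VIDEO") {
    if (steps.rapidApiConfigured && post.video_url) {
      attempts.push(["rapidapi", () => steps.downloadViaRapidAPI(post.video_url)]);
    }
    if (post.media_url) {
      attempts.push(["standard", () => steps.saveVideoFile(post.media_url)]);
    }
  }
  for (const [source, attempt] of attempts) {
    try {
      const url = attempt();
      if (url) return { ref_url: url, source };
    } catch (err) {
      // Fall through to the next strategy.
    }
  }
  // Last resort: create watch.html and use its Drive URL.
  return { ref_url: steps.createWatchHtml(post), source: "watch.html" };
}
```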

Batch Processing

Process 10-20 posts per batch (BATCH_SIZE, default 15)
Single batch write to Sheets (no per-cell loops)
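Splitting collected items into fixed-size batches takes only a few lines. This sketch uses the documented BATCH_SIZE default of 15. How each batch is then written and checkpointed is left to the caller.

```javascript
// Split items into batches of `size` (BATCH_SIZE, default 15); the last batch may be shorter.
function chunk(items, size = 15) {
  if (!Number.isInteger(size) || size < 1) throw new RangeError("size must be a positive integer");
  const batches = [];
  for (let i = 0; i < items.length; i += size) batches.push(items.slice(i, i + size));
  return batches;
}
```

For example, 32 collected posts become batches of 15, 15, and 2.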

Deduplication

Track processed IDs in run state per platform
Skip duplicates from pagination
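A minimal deduplication sketch, assuming processed IDs are kept in run state as an array of strings and rebuilt into a `Set` for each page. IDs are compared as strings because tweet IDs exceed JavaScript's safe integer range (2^53). The `idOf` accessor defaulting to `platform_post_id` is an assumption.

```javascript
// Filter a page against IDs already recorded in run state (and within the page itself).
function dedupe(page, processedIds, idOf = (p) => p.platform_post_id) {
  const seen = new Set(processedIds);
  const fresh = [];
  for (const post of page) {
    const id = String(idOf(post)); // string IDs avoid precision loss on 64-bit tweet IDs
    if (seen.has(id)) continue;
    seen.add(id);
    fresh.push(post);
  }
  return { fresh, processedIds: Array.from(seen) }; // array form is JSON-friendly for state
}
```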

Platform Detection from Instructions

Keywords like "Twitter", "X", "tweets", "@username" → include X
Keywords like "Instagram", "IG", "posts" → include Instagram
No specific platform mention → collect from BOTH platforms
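The rules above can be approximated with a keyword heuristic. This is a hypothetical fallback for illustration, not the LLM-based detection that `LLMPlanner.js` performs. It leaves out the generic word "posts" because that word also appears in X-only requests. It matches "X" only as a capitalized standalone word so that the letter inside other words does not trigger a match.

```javascript
// Hypothetical keyword heuristic mirroring the explicit platform cues listed above.
function detectPlatforms(instruction) {
  const text = String(instruction || "");
  const platforms = [];
  if (/\b(instagram|ig)\b/i.test(text)) platforms.push("instagram");
  const mentionsX =
    /\b(twitter|tweets?)\b/i.test(text) ||
    /\bX\b/.test(text) || // case-sensitive standalone "X"
    /(^|\s)@[A-Za-z0-9_]+/.test(text); // @handle, but not e-mail addresses
  if (mentionsX) platforms.push("x");
  // No explicit platform cue: collect from both.
  return platforms.length > 0 ? platforms : ["instagram", "x"];
}
```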

Configuration

Required Script Properties:

OPENAI_API_KEY - OpenAI API key
META_APP_ID / META_APP_SECRET - Meta app credentials (for Instagram)
X_API_KEY - TwitterAPI.io API key (for X)
CLIPPULSE_ROOT_FOLDER_ID - Drive root folder (auto-created)

Optional:

OPENAI_MODEL - Default: gpt-4o
MAX_POSTS_PER_PLATFORM_DEFAULT - Default: 30
BATCH_SIZE - Default: 15
USE_MOCKS - Enable mock mode for testing
INSTAGRAM_RAPIDAPI_KEY - RapidAPI key for Instagram data enrichment
INSTAGRAM_RAPIDAPI_HOST - RapidAPI host (e.g., instagram-api-fast-reliable-data-scraper.p.rapidapi.com)
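The settings above can be read once into a config object with the documented defaults applied. In Apps Script, `getProp` would be `(k) => PropertiesService.getScriptProperties().getProperty(k)`. Treating `USE_MOCKS` values of `"true"` or `"1"` as enabled is an assumption. CLIPPULSE_ROOT_FOLDER_ID is not checked here because it is auto-created.

```javascript
// Config loader sketch; required keys follow the list above (minus the auto-created folder ID).
const REQUIRED_KEYS = ["OPENAI_API_KEY", "META_APP_ID", "META_APP_SECRET", "X_API_KEY"];

function loadConfig(getProp) {
  const missing = REQUIRED_KEYS.filter((k) => !getProp(k));
  if (missing.length) throw new Error("Missing Script Properties: " + missing.join(", "));

  // Positive integer from a string property, else the documented default.
  const intOr = (key, fallback) => {
    const n = parseInt(getProp(key), 10);
    return Number.isInteger(n) && n > 0 ? n : fallback;
  };

  return {
    openaiModel: getProp("OPENAI_MODEL") || "gpt-4o",
    maxPostsPerPlatform: intOr("MAX_POSTS_PER_PLATFORM_DEFAULT", 30),
    batchSize: intOr("BATCH_SIZE", 15),
    useMocks: /^(true|1)$/i.test(getProp("USE_MOCKS") || ""),
    // Enrichment is usable only when both RapidAPI settings are present.
    rapidApiEnabled: Boolean(getProp("INSTAGRAM_RAPIDAPI_KEY") && getProp("INSTAGRAM_RAPIDAPI_HOST")),
  };
}
```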

External Dependencies

OAuth2 Library: googleworkspace/apps-script-oauth2 (v43)
OpenAI API: Chat Completions endpoint with JSON mode
Instagram Graph API: v18.0+ (requires professional account)
Instagram RapidAPI (optional): "Instagram API – Fast & Reliable Data Scraper" for data enrichment
TwitterAPI.io: Advanced Search API (API key authentication)

X API Query Syntax

The X collector supports rich query syntax:

| Syntax | Example | Description |
|---|---|---|
| Keywords | `"AI" OR "machine learning"` | Search for keywords |
| Hashtags | `#tech #AI` | Search by hashtag |
| From user | `from:elonmusk` | Tweets from specific user |
| Date range | `since:2024-01-01_00:00:00_UTC` | Tweets after date |
| Language | `lang:en` | Filter by language |
| Exclude RT | `-is:retweet` | Exclude retweets |

Query type options:

Latest - Most recent tweets matching query
Top - Most popular/engaging tweets matching query
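The operators in the syntax table above can be combined into one query string. This sketch ORs keywords and hashtags into one topic group, ORs `from:` handles into a second group, and appends the language, date, and retweet filters. The grouping strategy and the plan field names are assumptions, not the actual `XCollector.js` logic.

```javascript
// Query builder sketch using the operators in the syntax table; grouping is an assumption.
function orGroup(terms) {
  return terms.length > 1 ? "(" + terms.join(" OR ") + ")" : terms[0];
}

// Format like the documented example: 2024-01-01_00:00:00_UTC
function formatSince(date) {
  return date.toISOString().slice(0, 19).replace("T", "_") + "_UTC";
}

function buildXQuery(plan) {
  const parts = [];
  const topics = [
    ...(plan.keywords || []).map((k) => '"' + String(k).replace(/"/g, "") + '"'),
    ...(plan.hashtags || []).map((h) => "#" + String(h).replace(/^#/, "")),
  ];
  if (topics.length) parts.push(orGroup(topics));
  const users = (plan.handles || []).map((u) => "from:" + String(u).replace(/^@/, ""));
  if (users.length) parts.push(orGroup(users));
  if (plan.lang) parts.push("lang:" + plan.lang);
  if (plan.since instanceof Date) parts.push("since:" + formatSince(plan.since));
  if (plan.excludeRetweets) parts.push("-is:retweet");
  return parts.join(" ");
}
```

For example, keywords `AI` and `machine learning`, hashtag `tech`, handle `@nasa`, `lang: "en"`, a UTC start date of 2024-01-01, and `excludeRetweets: true` produce `("AI" OR "machine learning" OR #tech) from:nasa lang:en since:2024-01-01_00:00:00_UTC -is:retweet`.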

API Mode (n8n Integration)

ClipPulse supports both UI and API modes:

Request Routing

GET/POST /exec
    │
    ├─ action=start → API: Start new run
    ├─ action=status → API: Get run status
    └─ (no action) → UI: Return HTML page

API Handler (`ApiHandler.js`)

Handles HTTP API requests:

validateApiSecret() - Secret-based authentication
handleApiStart() - Start a collection run
handleApiStatus() - Get run status
routeApiRequest() - Route based on action parameter
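The routing and secret check can be sketched as a pure function over the Apps Script `doGet`/`doPost` event object (`e.parameter`, `e.postData.contents`). It returns `null` when the request should fall through to the HTML UI. The `start`/`status` handlers stand in for `handleApiStart()`/`handleApiStatus()`; their signatures and the `run_id` parameter name are assumptions. Apps Script web apps cannot set HTTP status codes through `ContentService`, so errors are reported in the JSON body.

```javascript
// Compare secrets without an early exit on the first mismatching character.
function safeEqual(a, b) {
  a = String(a);
  b = String(b);
  let diff = a.length ^ b.length;
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

function routeApiRequest(e, handlers, expectedSecret) {
  const params = (e && e.parameter) || {};
  if (!params.action) return null; // no action: serve the UI
  if (!expectedSecret || !safeEqual(params.secret || "", expectedSecret)) {
    return { ok: false, error: "Unauthorized" };
  }
  let body = {};
  if (e.postData && e.postData.contents) {
    try {
      body = JSON.parse(e.postData.contents);
    } catch (err) {
      return { ok: false, error: "Invalid JSON body" };
    }
  }
  if (params.action === "start") return handlers.start(body);
  if (params.action === "status") return handlers.status(params.run_id || body.run_id);
  return { ok: false, error: "Unknown action: " + params.action };
}

// Apps Script wiring (not executed here):
// function doPost(e) {
//   const secret = PropertiesService.getScriptProperties().getProperty("CLIPPULSE_API_SECRET");
//   const result = routeApiRequest(e, { start: handleApiStart, status: handleApiStatus }, secret);
//   if (result === null) return HtmlService.createHtmlOutputFromFile("UI");
//   return ContentService.createTextOutput(JSON.stringify(result))
//       .setMimeType(ContentService.MimeType.JSON);
// }
```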

API Request Flow

n8n Workflow
    │
    ▼
POST /exec?action=start&secret=xxx
    │
    ├─ Validate secret
    ├─ Parse JSON body (instruction, external_run_id, target_folder_id)
    ├─ Call startRun() [shared business logic]
    └─ Return JSON response

Folder Structure (API Mode)

When target_folder_id is provided (API mode):

{target_folder_id}/          (e.g., n8n run folder)
└── clippulse_{run_id}/
    ├── spreadsheet/
    │   └── ClipPulse_{run_id}.gsheet
    ├── instagram/
    ├── x/
    └── tiktok/

Configuration

Required for API mode:

CLIPPULSE_API_SECRET - Shared secret for authentication
