ClipPulse is a Google Apps Script-based tool that collects social media data from Instagram and X (Twitter), then outputs structured data to Google Sheets. It uses natural language instructions parsed by an LLM to determine collection parameters and target platforms.
┌─────────────────────────────────────────────────────────────────────┐
│                         Google Apps Script                          │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────────────────┐ │
│  │   Web App    │   │ Orchestrator │   │   Platform Collectors    │ │
│  │  (UI.html)   │ → │  (Control)   │ → │   (Instagram/X/TikTok)   │ │
│  └──────────────┘   └──────────────┘   └──────────────────────────┘ │
│         ↓                  ↓                        ↓               │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────────────────┐ │
│  │ LLM Planner  │   │ State Store  │   │       Sheet Writer       │ │
│  │   (OpenAI)   │   │ (Properties) │   │      Drive Manager       │ │
│  └──────────────┘   └──────────────┘   └──────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
          ↓                                           ↓
   ┌─────────────────┐                  ┌───────────────────────────┐
   │   OpenAI API    │                  │ Google Services           │
   │    (GPT-4o)     │                  │  - Google Drive           │
   └─────────────────┘                  │  - Google Sheets          │
                                        │  - Script Properties      │
   ┌─────────────────┐                  └───────────────────────────┘
   │ External APIs   │
   │ - Instagram     │
   │   Graph API     │
   │ - TwitterAPI.io │
   │   (X/Twitter)   │
   │ - TikTok API    │
   │   (disabled)    │
   └─────────────────┘
- Single-page HTML interface served via Apps Script HTML Service
- Handles user instruction input and execution triggering
- Polls backend for run status updates
- Displays progress for both Instagram and X platforms
- Shows collapsible data field reference for each platform
- Dark/Light mode toggle
- Controls the entire run lifecycle
- Manages state transitions: `CREATED` → `PLANNING` → `RUNNING_INSTAGRAM` → `RUNNING_X` → `FINALIZING` → `COMPLETED`
- Handles 6-minute execution limit via continuation triggers
- Coordinates platform collectors in sequence
- Supports mock mode for testing
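
The run lifecycle can be pictured as a small state machine. The following is a minimal sketch, not the project's actual orchestrator: `nextState`, `STATE_PLATFORM`, and the `plan.platforms` array are hypothetical names introduced for illustration, while the state names come from the transition chain listed above.

```javascript
// Ordered lifecycle states, as listed in the transition chain above.
const RUN_STATES = [
  'CREATED', 'PLANNING', 'RUNNING_INSTAGRAM', 'RUNNING_X', 'FINALIZING', 'COMPLETED',
];

// Hypothetical mapping from a collection state to the platform it needs.
const STATE_PLATFORM = { RUNNING_INSTAGRAM: 'instagram', RUNNING_X: 'x' };

/**
 * Returns the state that follows `current`, skipping collection states
 * whose platform is not listed in plan.platforms (an assumed plan field).
 */
function nextState(current, plan) {
  const index = RUN_STATES.indexOf(current);
  if (index === -1) throw new Error('Unknown state: ' + current);
  for (let i = index + 1; i < RUN_STATES.length; i++) {
    const platform = STATE_PLATFORM[RUN_STATES[i]];
    if (!platform || plan.platforms.includes(platform)) return RUN_STATES[i];
  }
  return 'COMPLETED'; // already at the terminal state
}
```

With this shape, a plan that targets only X moves from `PLANNING` straight to `RUNNING_X`, which matches the "collect in sequence, only if targeted" behavior described for the orchestrator.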
- Parses natural language instructions using OpenAI GPT-4o
- Determines target platforms from instruction context
- Generates structured collection plans with:
- Target platforms (Instagram, X, or both)
- Target counts per platform
- Keywords and hashtags
- Query strategies (Instagram hashtag search, X query syntax)
- User handles for targeted collection
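
To make the plan structure concrete, here is a hedged sketch of what a plan object and a basic sanity check could look like. Every field name (`platforms`, `targetCounts`, `keywords`, `hashtags`, `queryStrategy`, `userHandles`) and the `validatePlan` helper are assumptions for illustration; the actual JSON returned by the planner may differ.

```javascript
// Example plan object. All field names and values here are hypothetical.
const examplePlan = {
  platforms: ['instagram', 'x'],
  targetCounts: { instagram: 25, x: 25 },
  keywords: ['skincare'],
  hashtags: ['skincare', 'skincareroutine'],
  queryStrategy: { instagram: 'hashtag_search', x: 'Latest' },
  userHandles: [],
};

const SUPPORTED_PLATFORMS = ['instagram', 'x'];

/** Returns a list of problems; an empty list means the plan looks usable. */
function validatePlan(plan) {
  const errors = [];
  if (!Array.isArray(plan.platforms) || plan.platforms.length === 0) {
    errors.push('platforms must be a non-empty array');
    return errors;
  }
  for (const platform of plan.platforms) {
    if (!SUPPORTED_PLATFORMS.includes(platform)) {
      errors.push('unsupported platform: ' + platform);
      continue;
    }
    const count = plan.targetCounts && plan.targetCounts[platform];
    if (!Number.isInteger(count) || count <= 0) {
      errors.push('targetCounts.' + platform + ' must be a positive integer');
    }
  }
  const hasTerms = ['keywords', 'hashtags', 'userHandles']
    .some((key) => Array.isArray(plan[key]) && plan[key].length > 0);
  if (!hasTerms) errors.push('plan needs at least one keyword, hashtag, or handle');
  return errors;
}
```

Validating LLM output before acting on it is a design choice rather than something the source prescribes; it simply keeps a malformed plan from reaching the collectors.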
- Implements hashtag search and owned-account retrieval strategies
- Normalizes API responses to fixed 23-column schema
- Handles pagination and deduplication
- Creates Drive artifacts (video or watch.html)
- Supports optional RapidAPI enrichment for hashtag search results
- Optional third-party data enrichment for hashtag search results
- Official Instagram Graph API hashtag search returns limited fields (no `media_url`, `username`, etc.)
- RapidAPI `/media?id=...` endpoint provides additional fields
- Uses numeric Instagram media IDs (not shortcodes)
- Gracefully handles "media not found" responses (returns null, doesn't fail)
Key Functions:
| Function | Description |
|---|---|
| `getPostDetailsByMediaId(mediaId)` | Fetch post details using numeric media ID |
| `enrichPostsWithRapidAPI(posts, hashtag)` | Enrich official API results with RapidAPI data |
| `downloadVideoFromRapidAPI(videoUrl, filename)` | Download video from RapidAPI-provided URL |
| `normalizeRapidAPIPost(post, hashtag)` | Normalize API response to standard schema |
downloadVideoFromRapidAPI:
Input: videoUrl (string), filename (string)
Output: { success: boolean, blob?: Blob, filename?: string, contentType?: string, size?: number, error?: string }
Process:
1. Validate inputs (URL format, filename)
2. Fetch video via UrlFetchApp.fetch() with redirect following
3. Check HTTP status (expect 200)
4. Validate size ≤ 50MB (Apps Script limit)
5. Sanitize filename, set blob name
6. Return blob for Drive storage
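
The six steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the function is named `downloadVideoSketch` to keep it distinct from the real one, and the HTTP call is injected as `fetchFn` (a hypothetical parameter) so the logic can also run outside Apps Script. The filename sanitizing rule is likewise an assumption.

```javascript
const MAX_VIDEO_BYTES = 50 * 1024 * 1024; // 50MB limit from step 4

/**
 * Sketch of the documented download steps. `fetchFn(url, options)` must
 * return an object shaped like an Apps Script HTTPResponse.
 */
function downloadVideoSketch(videoUrl, filename, fetchFn) {
  // 1. Validate inputs
  if (typeof videoUrl !== 'string' || !/^https?:\/\/\S+$/i.test(videoUrl)) {
    return { success: false, error: 'Invalid video URL' };
  }
  if (typeof filename !== 'string' || filename.trim() === '') {
    return { success: false, error: 'Invalid filename' };
  }
  try {
    // 2. Fetch, following redirects; muteHttpExceptions lets us read non-200 codes
    const response = fetchFn(videoUrl, { followRedirects: true, muteHttpExceptions: true });
    // 3. Check HTTP status
    const status = response.getResponseCode();
    if (status !== 200) return { success: false, error: 'HTTP ' + status };
    // 4. Validate size
    const blob = response.getBlob();
    const size = blob.getBytes().length;
    if (size > MAX_VIDEO_BYTES) return { success: false, error: 'Video exceeds 50MB' };
    // 5. Sanitize filename (assumed rule: keep word characters, dots, hyphens)
    const safeName = filename.replace(/[^\w.\-]+/g, '_');
    blob.setName(safeName);
    // 6. Return blob for Drive storage
    return {
      success: true, blob: blob, filename: safeName,
      contentType: blob.getContentType(), size: size,
    };
  } catch (err) {
    return { success: false, error: String(err && err.message ? err.message : err) };
  }
}
```

Inside Apps Script the third argument could be `(url, options) => UrlFetchApp.fetch(url, options)`. Returning a result object instead of throwing mirrors the documented output shape and lets the caller fall back to another artifact type.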
- Uses TwitterAPI.io Advanced Search API
- Builds search queries from plan keywords, hashtags, and user handles
- Supports query modifiers: `from:user`, `#hashtag`, `lang:`, date ranges
- Two query types: "Latest" (recent) and "Top" (popular)
- Normalizes API responses to fixed 28-column schema
- Handles cursor-based pagination
- Creates Drive artifacts (watch.html with tweet link)
- Persists run state to Script Properties
- Tracks progress, cursors, and processed IDs for each platform
- Supports `xProgress` alongside `instagramProgress`
- Enables resume after timeout/continuation
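
As a rough illustration of how run state could be persisted and resumed, the sketch below serializes the state as JSON under one property per run. The key prefix, the helper names, and the inner shape of each progress object (`collected`, `cursor`, `processedIds`) are assumptions; only `xProgress` and `instagramProgress` are names taken from the description above. The `store` argument is anything exposing `getProperty` and `setProperty`, such as the object returned by `PropertiesService.getScriptProperties()`.

```javascript
// Hypothetical key prefix for per-run state entries.
const RUN_STATE_PREFIX = 'RUN_STATE_';

/** Serializes run state to a single string property. */
function saveRunState(store, runId, state) {
  store.setProperty(RUN_STATE_PREFIX + runId, JSON.stringify(state));
}

/** Returns the saved state, or null if nothing was stored for this run. */
function loadRunState(store, runId) {
  const raw = store.getProperty(RUN_STATE_PREFIX + runId);
  return raw === null ? null : JSON.parse(raw);
}

/** Records one processed post and the latest pagination cursor for a platform. */
function recordProgress(state, progressKey, postId, cursor) {
  const progress = state[progressKey] ||
    (state[progressKey] = { collected: 0, cursor: null, processedIds: [] });
  if (!progress.processedIds.includes(postId)) {
    progress.processedIds.push(postId);
    progress.collected += 1;
  }
  progress.cursor = cursor;
  return state;
}
```

A continuation run would call `loadRunState` first and pick up from the stored cursor. Script Properties values have size limits, so a long list of processed IDs may need trimming or splitting in practice.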
- Creates spreadsheets with platform-specific schemas:
- Instagram: 23 columns
- X: 28 columns
- Batch writes rows for efficiency
- Handles data normalization per platform
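
Batch writing can be sketched as below: normalized posts are first turned into a two-dimensional array in column order, then written with one `setValues` call. The helper names `toRows` and `appendBatch` are hypothetical, and storing nested values as JSON text is an assumption about how list-like fields end up in a cell. `sheet` is expected to behave like an Apps Script `Sheet`.

```javascript
/** Converts normalized posts into a 2D array in column order; missing fields become ''. */
function toRows(posts, columns) {
  return posts.map((post) => columns.map((col) => {
    const value = post[col];
    if (value === undefined || value === null) return '';
    // Assumption: nested values (arrays, objects) are stored as JSON text in one cell.
    return typeof value === 'object' ? JSON.stringify(value) : value;
  }));
}

/** Appends all rows below the last used row with a single setValues call. */
function appendBatch(sheet, rows) {
  if (rows.length === 0) return 0;
  const startRow = sheet.getLastRow() + 1;
  sheet.getRange(startRow, 1, rows.length, rows[0].length).setValues(rows);
  return rows.length;
}
```

Because every row is built from the same column list, the rows are rectangular, which `setValues` requires. One range write per batch avoids the per-cell service calls that dominate run time in Apps Script.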
- Creates folder structure per run with platform subfolders
- Saves artifacts: `raw.json`, `video.mp4`, `watch.html`
- Generates shareable Drive URLs
- Supports Instagram, X, and TikTok folder creation
- Manages Meta/Instagram OAuth2 flow
- X (Twitter) uses simple API key auth (no OAuth needed)
- Handles TikTok API authentication (disabled)
- Token refresh and caching
- Provides mock data generators for testing
- Supports mock mode for Instagram and X collection
- Useful for development without API calls
1. User Input
└─→ Natural language instruction (e.g., "Find 50 posts about skincare from Instagram and X")
2. Planning Phase
└─→ LLM parses instruction → Structured plan object
└─→ Determines platforms: Instagram, X, or both
3. Collection Phase
Instagram:
└─→ Instagram Graph API calls → Normalized post data (23 columns)
└─→ Create Drive artifacts → Get Drive URLs
└─→ Batch write to Instagram sheet
X (Twitter):
└─→ TwitterAPI.io Advanced Search → Normalized tweet data (28 columns)
└─→ Create Drive artifacts → Get Drive URLs
└─→ Batch write to X sheet
4. Output
└─→ Spreadsheet with Instagram tab (23 columns) and X tab (28 columns)
└─→ Drive folder with raw.json + video/watch artifacts per post
| State | Description |
|---|---|
| `CREATED` | Initial state, run ID generated |
| `PLANNING` | LLM parsing instruction, creating resources |
| `RUNNING_INSTAGRAM` | Collecting Instagram data |
| `RUNNING_X` | Collecting X (Twitter) data |
| `RUNNING_TIKTOK` | (Disabled) Collecting TikTok data |
| `FINALIZING` | Optimizing spreadsheet, saving manifest |
| `COMPLETED` | Run finished successfully |
| `FAILED` | Run encountered unrecoverable error |
Google Drive:
ClipPulse/
├── runs/
│   └── YYYY/MM/
│       └── YYYYMMDD_HHMMSS_<hash>/
│           ├── spreadsheet/
│           ├── instagram/
│           │   └── <post_id>/
│           │       ├── raw.json
│           │       ├── video.mp4 (if downloadable)
│           │       └── watch.html (fallback)
│           ├── x/
│           │   └── <tweet_id>/
│           │       ├── raw.json
│           │       └── watch.html
│           └── tiktok/ (disabled)
└── manifests/

Google Sheets:
ClipPulse_<runId>
├── Instagram (23 columns)
└── X (28 columns)
- platform_post_id
- create_username
- posted_at
- caption_or_description
- post_url
- like_count
- comments_count
- media_type
- media_url
- thumbnail_url
- shortcode
- media_product_type
- is_comment_enabled
- is_shared_to_feed
- children
- edges_comments
- edges_insights
- edges_collaborators
- boost_ads_list
- boost_eligibility_info
- copyright_check_information_status
- ref_url
- memo
- platform_post_id
- create_username
- posted_at
- text
- post_url
- source
- retweet_count
- reply_count
- like_count
- quote_count
- view_count
- lang
- is_reply
- in_reply_to_id
- conversation_id
- author_id
- author_name
- author_display_name
- author_followers
- author_following
- author_is_blue_verified
- author_created_at
- hashtags
- urls
- user_mentions
- media
- ref_url
- memo
- Detects an approaching timeout at 5 minutes of execution, ahead of the 6-minute limit
- Saves state and schedules continuation trigger
- Resumes from saved cursor position
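
The timeout handling above amounts to a work loop with a time budget. The sketch below shows the idea with injected hooks so it can run anywhere; `runWithBudget`, the `hooks` object, and the use of `state.cursor` as a plain item index are all assumptions made for illustration.

```javascript
const TIME_BUDGET_MS = 5 * 60 * 1000; // stop at 5 minutes, before the 6-minute limit

/**
 * Processes items until finished or until the time budget is used up.
 * hooks.now() returns milliseconds, hooks.processItem(item) does the work,
 * hooks.scheduleContinuation(state) arranges the follow-up execution.
 */
function runWithBudget(items, state, hooks) {
  const startedAt = hooks.now();
  let index = state.cursor || 0; // resume from the saved cursor position
  while (index < items.length) {
    if (hooks.now() - startedAt >= TIME_BUDGET_MS) {
      state.cursor = index;              // save where to resume
      hooks.scheduleContinuation(state); // hand over to a later execution
      return { done: false, state: state };
    }
    hooks.processItem(items[index]);
    index += 1;
  }
  state.cursor = index;
  return { done: true, state: state };
}
```

In Apps Script, `now` could be `Date.now`, and `scheduleContinuation` could persist the state and then create a one-off trigger with `ScriptApp.newTrigger('continueRun').timeBased().after(60 * 1000).create()`, where `continueRun` is a hypothetical handler name.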
- Instagram collected first (if targeted)
- X collected second (if targeted)
- TikTok collected last (currently disabled)
- Instagram: Prefer actual video download when feasible, fallback to `watch.html`. `ref_url` points to Drive file.
- X: Create `watch.html` for archival, but `ref_url` contains direct tweet URL for easy access.
┌───────────────────────────────────────────────────────────────┐
│                   createInstagramArtifact()                   │
└───────────────────────────┬───────────────────────────────────┘
                            │
                ┌───────────▼───────────┐
                │   Is VIDEO content?   │
                └───────────┬───────────┘
                            │ Yes
            ┌───────────────▼───────────────┐
            │     RapidAPI configured &     │
            │     video_url available?      │
            └───┬───────────────────────┬───┘
            Yes │                       │ No
                ▼                       │
     ┌──────────────────────┐           │
     │ downloadVideoFrom    │           │
     │ RapidAPI(url, folder)│           │
     └──────────┬───────────┘           │
                │                       │
     ┌──────────▼───────────┐           │
     │ Success?             │           │
     │ → video.mp4 saved    │           │
     │ → ref_url = Drive URL│           │
     └──────────┬───────────┘           │
                │ Failure               │
                ▼                       ▼
          ┌──────────────────────────────────┐
          │ saveVideoFile(folder, media_url) │
          │ (Standard video download)        │
          └──────────────┬───────────────────┘
                         │
          ┌──────────────▼───────────────────┐
          │ Success?                         │
          │ → ref_url = video.mp4 Drive URL  │
          │ Failure?                         │
          │ → Create watch.html              │
          │ → ref_url = watch.html Drive URL │
          └──────────────────────────────────┘
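
The same decision flow, written as code, might look like the sketch below. The helpers in `deps` are injected stand-ins for the real download and Drive functions, and their return convention (a Drive URL string, or `null` on failure) is an assumption. Two further assumptions go beyond the diagram: non-video posts are sent straight to `watch.html`, and the standard download is skipped when `media_url` is missing.

```javascript
/**
 * Sketch of the artifact decision flow. `deps` bundles hypothetical helpers:
 *   rapidApiConfigured         boolean
 *   downloadFromRapidApi(url)  returns a Drive URL, or null on failure
 *   saveVideoFile(mediaUrl)    returns a Drive URL, or null on failure
 *   createWatchHtml(post)      returns a Drive URL
 */
function chooseInstagramArtifact(post, deps) {
  if (post.media_type === 'VIDEO') {
    // First choice: RapidAPI download, when configured and a video URL exists.
    if (deps.rapidApiConfigured && post.video_url) {
      const url = deps.downloadFromRapidApi(post.video_url);
      if (url) return { artifact: 'video.mp4', ref_url: url };
    }
    // Second choice: standard download (assumption: skipped if media_url is absent).
    if (post.media_url) {
      const url = deps.saveVideoFile(post.media_url);
      if (url) return { artifact: 'video.mp4', ref_url: url };
    }
  }
  // Fallback, also used for non-video posts (assumption; the diagram covers video only).
  return { artifact: 'watch.html', ref_url: deps.createWatchHtml(post) };
}
```

Each step only runs when the previous one did not produce a file, so `ref_url` always ends up pointing at the best artifact that could be created.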
- Process 10-20 posts per batch
- Single batch write to Sheets (no per-cell loops)
- Track processed IDs in run state per platform
- Skip duplicates from pagination
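
Deduplication and batching can be illustrated with two small helpers. This is a sketch under assumptions: `filterNewPosts` and `chunk` are hypothetical names, and the processed IDs are kept in a plain array of strings so they can be stored as JSON in the run state.

```javascript
/**
 * Returns only posts whose ID has not been seen yet and records the new IDs.
 * `processedIds` is mutated in place.
 */
function filterNewPosts(posts, processedIds) {
  const seen = new Set(processedIds);
  const fresh = [];
  for (const post of posts) {
    const id = String(post.platform_post_id);
    if (seen.has(id)) continue; // duplicate from an earlier or overlapping page
    seen.add(id);
    processedIds.push(id);
    fresh.push(post);
  }
  return fresh;
}

/** Splits an array into batches of at most `size` items. */
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

A collector could pass each API page through `filterNewPosts`, then process the result in `chunk(fresh, 15)` batches, with 15 being one value inside the 10-20 range mentioned above.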
- Keywords like "Twitter", "X", "tweets", "@username" → include X
- Keywords like "Instagram", "IG", "posts" → include Instagram
- No specific platform mention → collect from BOTH platforms
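
The three rules above can be approximated with a keyword heuristic, shown here only to make the rules concrete. In ClipPulse the decision is made by the LLM planner, so this `detectPlatforms` function is a hypothetical stand-in (for example for mock mode), and its regular expressions are deliberately rough.

```javascript
/** Keyword heuristic mirroring the three rules above; not the LLM planner itself. */
function detectPlatforms(instruction) {
  const text = String(instruction);
  // "Twitter", "X", "tweet(s)", or an @username mention suggest X.
  const wantsX = /\b(twitter|x|tweets?)\b/i.test(text) || /(^|\s)@\w+/.test(text);
  // "Instagram", "IG", or "post(s)" suggest Instagram.
  const wantsInstagram = /\b(instagram|ig|posts?)\b/i.test(text);
  const platforms = [];
  if (wantsInstagram) platforms.push('instagram');
  if (wantsX) platforms.push('x');
  // No platform mentioned: collect from both.
  return platforms.length > 0 ? platforms : ['instagram', 'x'];
}
```

A plain keyword match misreads phrases such as "Twitter posts" (which would also select Instagram because of "posts"), which is one reason to leave the final call to the LLM.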
Required Script Properties:
- `OPENAI_API_KEY` - OpenAI API key
- `META_APP_ID` / `META_APP_SECRET` - Meta app credentials (for Instagram)
- `X_API_KEY` - TwitterAPI.io API key (for X)
- `CLIPPULSE_ROOT_FOLDER_ID` - Drive root folder (auto-created)
Optional:
- `OPENAI_MODEL` - Default: `gpt-4o`
- `MAX_POSTS_PER_PLATFORM_DEFAULT` - Default: 30
- `BATCH_SIZE` - Default: 15
- `USE_MOCKS` - Enable mock mode for testing
- `INSTAGRAM_RAPIDAPI_KEY` - RapidAPI key for Instagram data enrichment
- `INSTAGRAM_RAPIDAPI_HOST` - RapidAPI host (e.g., `instagram-api-fast-reliable-data-scraper.p.rapidapi.com`)
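
A configuration loader that applies the defaults listed above could be sketched as follows. The function name, the returned field names, and the choice to store `USE_MOCKS` as the string `"true"` are assumptions. `CLIPPULSE_ROOT_FOLDER_ID` is left out of the required check here because it is described as auto-created. `props` is anything with a `getProperty(key)` method that returns `null` for unset keys, such as `PropertiesService.getScriptProperties()`.

```javascript
// Keys that must be set by hand (root folder ID omitted: described as auto-created).
const REQUIRED_KEYS = ['OPENAI_API_KEY', 'META_APP_ID', 'META_APP_SECRET', 'X_API_KEY'];

/** Reads Script Properties and applies the documented defaults. */
function loadConfig(props) {
  const missing = REQUIRED_KEYS.filter((key) => !props.getProperty(key));
  if (missing.length > 0) {
    throw new Error('Missing Script Properties: ' + missing.join(', '));
  }
  // Parses an integer property, falling back when unset or not numeric.
  const intOr = (key, fallback) => {
    const parsed = parseInt(props.getProperty(key), 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };
  return {
    openAiModel: props.getProperty('OPENAI_MODEL') || 'gpt-4o',
    maxPostsPerPlatform: intOr('MAX_POSTS_PER_PLATFORM_DEFAULT', 30),
    batchSize: intOr('BATCH_SIZE', 15),
    // Assumption: the flag is stored as the string "true".
    useMocks: String(props.getProperty('USE_MOCKS')).toLowerCase() === 'true',
  };
}
```

Failing fast on missing keys surfaces setup problems before any API call is made, though a real deployment might relax the check when mock mode is on.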
- OAuth2 Library: `googleworkspace/apps-script-oauth2` (v43)
- OpenAI API: Chat Completions endpoint with JSON mode
- Instagram Graph API: v18.0+ (requires professional account)
- Instagram RapidAPI (optional): "Instagram API – Fast & Reliable Data Scraper" for data enrichment
- TwitterAPI.io: Advanced Search API (API key authentication)
The X collector supports rich query syntax:
| Syntax | Example | Description |
|---|---|---|
| Keywords | `"AI" OR "machine learning"` | Search for keywords |
| Hashtags | `#tech #AI` | Search by hashtag |
| From user | `from:elonmusk` | Tweets from specific user |
| Date range | `since:2024-01-01_00:00:00_UTC` | Tweets after date |
| Language | `lang:en` | Filter by language |
| Exclude RT | `-is:retweet` | Exclude retweets |
Query type options:
- `Latest` - Most recent tweets matching query
- `Top` - Most popular/engaging tweets matching query
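
Assembling a query from plan fields might look like the sketch below. The `buildXQuery` function and its option names are hypothetical, and grouping multiple keywords or handles in parentheses with `OR` is an assumption about how the pieces should combine; the individual operators are the ones from the syntax table.

```javascript
/**
 * Builds a search query string from plan-like inputs.
 * Option names (keywords, hashtags, userHandles, since, lang, excludeRetweets)
 * are hypothetical.
 */
function buildXQuery(options) {
  const parts = [];
  // Keywords are quoted and OR-ed together.
  const keywords = (options.keywords || []).map((k) => '"' + k + '"');
  if (keywords.length > 0) {
    parts.push(keywords.length > 1 ? '(' + keywords.join(' OR ') + ')' : keywords[0]);
  }
  // Hashtags are normalized to a single leading '#'.
  for (const tag of options.hashtags || []) {
    parts.push('#' + tag.replace(/^#/, ''));
  }
  // Handles become from: filters, with any leading '@' removed.
  const handles = (options.userHandles || []).map((h) => 'from:' + h.replace(/^@/, ''));
  if (handles.length > 0) {
    parts.push(handles.length > 1 ? '(' + handles.join(' OR ') + ')' : handles[0]);
  }
  if (options.since) parts.push('since:' + options.since);
  if (options.lang) parts.push('lang:' + options.lang);
  if (options.excludeRetweets) parts.push('-is:retweet');
  return parts.join(' ');
}
```

The resulting string would be sent together with the chosen query type (`Latest` or `Top`) as a separate request parameter.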
ClipPulse supports both UI and API modes:
GET/POST /exec
│
├─ action=start → API: Start new run
├─ action=status → API: Get run status
└─ (no action) → UI: Return HTML page
New component that handles HTTP API requests:
- `validateApiSecret()` - Secret-based authentication
- `handleApiStart()` - Start a collection run
- `handleApiStatus()` - Get run status
- `routeApiRequest()` - Route based on action parameter
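
Routing on the `action` parameter could be sketched as below. This is an illustration, not the component's real code: `routeRequest` is a hypothetical name, `params` stands in for the `e.parameter` object that Apps Script passes to `doGet`/`doPost`, the secret is assumed to arrive as a `secret` query parameter, and the handlers and error shapes are invented for the example.

```javascript
/**
 * Routes a request by its `action` parameter.
 * `handlers` provides ui(), start(params), and status(params).
 */
function routeRequest(params, expectedSecret, handlers) {
  const action = params.action;
  if (!action) return handlers.ui(); // no action: serve the HTML page
  // API actions require the shared secret.
  if (!expectedSecret || params.secret !== expectedSecret) {
    return { ok: false, error: 'unauthorized' };
  }
  switch (action) {
    case 'start': return handlers.start(params);
    case 'status': return handlers.status(params);
    default: return { ok: false, error: 'unknown action: ' + action };
  }
}
```

In a web app entry point, the object returned for API actions could be sent back with `ContentService.createTextOutput(JSON.stringify(result)).setMimeType(ContentService.MimeType.JSON)`. Rejecting requests when no secret is configured avoids accidentally exposing the API on a fresh deployment.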
n8n Workflow
│
▼
POST /exec?action=start&secret=xxx
│
├─ Validate secret
├─ Parse JSON body (instruction, external_run_id, target_folder_id)
├─ Call startRun() [shared business logic]
└─ Return JSON response
When target_folder_id is provided (API mode):
{target_folder_id}/                 (e.g., n8n run folder)
└── clippulse_{run_id}/
    ├── spreadsheet/
    │   └── ClipPulse_{run_id}.gsheet
    ├── instagram/
    ├── x/
    └── tiktok/
Required for API mode:
- `CLIPPULSE_API_SECRET` - Shared secret for authentication