A minimal Node.js starter for running a Bright Data Scraper Studio collector via the Data Collection API: trigger a job with a list of URLs and download the results.
- Overview
- Features
- Prerequisites
- Installation
- Usage
- Configuration
- How it works
- Examples
- Output
- Security
- Support
- License
Bright Data Scraper Studio is a low-code IDE for building custom web scraping collectors on the Bright Data platform. Once a collector is published it exposes two HTTP endpoints:
| Step | Endpoint | Purpose |
|---|---|---|
| 1 | `POST /dca/trigger?collector=<id>` | Queue one or more inputs for the collector |
| 2 | `GET /dca/dataset?id=<snapshot_id>` | Download the collected data when ready |
This repository wraps those two calls in about 150 lines of Node.js so you can copy, paste and ship.
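The two calls can be sketched with the `fetch` built into Node 18+. This is an illustration, not the repository's code: the host `https://api.brightdata.com`, the Bearer-token header, and the helper names `buildTriggerRequest`, `buildDatasetRequest`, and `send` are assumptions that this README does not confirm, so check them against your Bright Data dashboard.

```javascript
// Assumed API host; not stated in this README.
const API_BASE = 'https://api.brightdata.com';

// Step 1: build the request that queues inputs for a collector.
function buildTriggerRequest(collectorId, token, inputs) {
  return {
    url: `${API_BASE}/dca/trigger?collector=${encodeURIComponent(collectorId)}`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`, // assumed auth scheme
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(inputs),
    },
  };
}

// Step 2: build the request that downloads the collected data.
function buildDatasetRequest(snapshotId, token) {
  return {
    url: `${API_BASE}/dca/dataset?id=${encodeURIComponent(snapshotId)}`,
    options: { method: 'GET', headers: { Authorization: `Bearer ${token}` } },
  };
}

// Send either request. fetchImpl is injectable so the helper can be tested offline.
async function send({ url, options }, fetchImpl = fetch) {
  const res = await fetchImpl(url, options);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.json();
}
```

A run would call `send(buildTriggerRequest(...))` first, read the job ID from the response, and pass that ID to `buildDatasetRequest`.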
- Trigger a Scraper Studio collector via the `/dca/trigger` endpoint
- Poll `/dca/dataset` until results are ready
- Env-var config via `.env` (no secrets in code)
- Retry with exponential backoff for transient errors (5xx and network); fails fast on 4xx
- Library helpers: `triggerWithUrl`, `triggerWithUrls`, `runScraper`
- Saves the raw JSON response to a timestamped file
- ES modules, Node 18+
- Node.js v18 or higher
- A Bright Data account with an API token
- A published collector in Scraper Studio; copy its Collector ID (starts with `c_`)
```bash
git clone https://github.com/brightdata/bright-data-scraper-studio-nodejs-project.git
cd bright-data-scraper-studio-nodejs-project
npm install
cp .env.example .env   # then edit .env with your token and collector ID
```

Then run the script:

```bash
npm start
# or
node index.js
```

Results are written to a `scraper_studio_results_<timestamp>.json` file in the project directory.
Two environment variables are required. Set them in `.env` or in your shell (hardcoding them in `index.js` also works, but makes the token easy to commit by accident):
| Variable | Where to find it |
|---|---|
| `BRIGHT_DATA_API_TOKEN` | Bright Data dashboard, Account Settings → API Tokens |
| `BRIGHT_DATA_COLLECTOR_ID` | Scraper Studio: open your collector, copy the ID from the URL (starts with `c_`) |
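A startup check along these lines can catch a missing or mistyped value before any request is sent. It is a sketch only: `loadConfig` is a hypothetical helper, and it assumes the variables are already present in `process.env`, however the repository loads `.env` (this README does not say).

```javascript
// Hypothetical helper: read the two required settings from an environment-like object.
// Defaults to process.env; pass a plain object when testing.
function loadConfig(env = process.env) {
  const token = env.BRIGHT_DATA_API_TOKEN;
  const collectorId = env.BRIGHT_DATA_COLLECTOR_ID;

  // Report every missing variable at once rather than one per run.
  const missing = [];
  if (!token) missing.push('BRIGHT_DATA_API_TOKEN');
  if (!collectorId) missing.push('BRIGHT_DATA_COLLECTOR_ID');
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  // Collector IDs start with "c_", so anything else is likely a copy-paste slip.
  if (!collectorId.startsWith('c_')) {
    throw new Error('BRIGHT_DATA_COLLECTOR_ID should start with "c_"');
  }

  return { token, collectorId };
}
```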
You can also tune the polling and retry behavior at the top of `index.js`:

```javascript
const POLL_INTERVAL_MS = 5000;  // delay between dataset checks
const MAX_POLL_ATTEMPTS = 60;   // give up after ~5 minutes
const MAX_RETRIES = 3;          // for transient HTTP failures
```

The shape of `SAMPLE_URLS` must match the input schema you defined in Scraper Studio. The default sample assumes a single `url` field. If your collector uses different inputs (for example, `keyword`, `zip_code`, `category`), update the objects accordingly.
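Because a mismatched input shape only surfaces after the job is queued, a local check before triggering can save a round trip. The sketch below is not part of the repository; `assertInputShape` is a hypothetical helper, and the list of required fields is something you supply from your own collector's schema.

```javascript
// Hypothetical helper: confirm every input object has the fields the collector expects.
function assertInputShape(inputs, requiredFields) {
  if (!Array.isArray(inputs) || inputs.length === 0) {
    throw new TypeError('inputs must be a non-empty array');
  }
  inputs.forEach((input, index) => {
    if (input === null || typeof input !== 'object') {
      throw new TypeError(`Input #${index} must be an object`);
    }
    // Treat undefined and empty-string values as missing.
    const missing = requiredFields.filter(
      (field) => input[field] === undefined || input[field] === ''
    );
    if (missing.length > 0) {
      throw new Error(`Input #${index} is missing: ${missing.join(', ')}`);
    }
  });
  return inputs;
}
```

For the default sample this would be `assertInputShape(SAMPLE_URLS, ['url'])`; for a keyword-based collector, pass that collector's field names instead.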
```
+-----------------+      POST /dca/trigger      +-------------------+
|   Your script   | --------------------------> |  Scraper Studio   |
|   (index.js)    | <-- { collection_id } ----- |     Collector     |
+-----------------+                             +-------------------+
         |                                                |
         |   GET /dca/dataset?id=<snapshot_id>            |
         |   (poll every 5s, retry 5xx with backoff)      |
         |   <--- [ { ...record... }, ... ] ------------- |
         v
scraper_studio_results_<timestamp>.json
```
The script polls /dca/dataset every five seconds for up to five minutes. A non-empty JSON array is treated as a finished snapshot. Transient errors (5xx and network) are retried with exponential backoff (1s, 2s, 4s); 4xx errors fail immediately so you fix the request rather than retry it.
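The polling and retry rules described above can be sketched as two small functions. This is one possible reading, not the repository's implementation: `getWithRetry` and `pollDataset` are hypothetical names, and the sketch assumes `MAX_RETRIES = 3` means three retries after the first attempt, which yields the 1s, 2s, 4s delays. `fetchImpl` and `sleep` are injectable so the logic can be exercised without network access or real waiting.

```javascript
const POLL_INTERVAL_MS = 5000;
const MAX_POLL_ATTEMPTS = 60;
const MAX_RETRIES = 3;

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// One request, retried on 5xx and network errors with delays of 1s, 2s, 4s.
// Non-2xx responses below 500 (such as 4xx) throw immediately.
async function getWithRetry(url, options, { fetchImpl = fetch, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    let res;
    try {
      res = await fetchImpl(url, options);
    } catch (networkError) {
      if (attempt >= MAX_RETRIES) throw networkError;
      await sleep(1000 * 2 ** attempt);
      continue;
    }
    if (res.ok) return res.json();
    if (res.status < 500 || attempt >= MAX_RETRIES) {
      throw new Error(`HTTP ${res.status} for ${url}`);
    }
    await sleep(1000 * 2 ** attempt);
  }
}

// Poll until the body is a non-empty JSON array, or give up after MAX_POLL_ATTEMPTS.
async function pollDataset(url, options, deps = {}) {
  const sleep = deps.sleep ?? defaultSleep;
  for (let attempt = 1; attempt <= MAX_POLL_ATTEMPTS; attempt++) {
    const body = await getWithRetry(url, options, deps);
    if (Array.isArray(body) && body.length > 0) return body;
    if (attempt < MAX_POLL_ATTEMPTS) await sleep(POLL_INTERVAL_MS);
  }
  throw new Error(`Dataset not ready after ${MAX_POLL_ATTEMPTS} attempts`);
}
```

Keeping the retry loop inside each poll means a brief 5xx blip does not consume one of the 60 polling attempts.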
Replace `SAMPLE_URLS` in `index.js`:

```javascript
const SAMPLE_URLS = [
  { url: 'https://example.com/product/1' },
  { url: 'https://example.com/product/2' }
];
```

If your collector expects something other than `url`, pass whatever fields it defines:
```javascript
const inputs = [
  { keyword: 'wireless headphones', country: 'US' },
  { keyword: 'standing desk', country: 'DE' }
];
await runScraper(inputs);
```

`runScraper`, `triggerWithUrl`, `triggerWithUrls` and `saveResults` are all exported:
```javascript
import { triggerWithUrls, saveResults } from './index.js';

const data = await triggerWithUrls([
  'https://example.com/page-1',
  'https://example.com/page-2'
]);
saveResults(data, 'my_run.json');
```

- Results are saved as JSON files named `scraper_studio_results_<ISO timestamp>.json`.
- The file contains the raw collector output: one record per input URL by default.
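As an illustration of the naming pattern, a timestamped filename could be built as below. The exact timestamp format the script uses is not documented here, so `resultsFilename` and its choice to replace colons and dots (colons are not allowed in Windows filenames) are assumptions.

```javascript
// Hypothetical helper: build a results filename from a Date.
// Colons and dots in the ISO string are swapped for hyphens to keep the name portable.
function resultsFilename(date = new Date()) {
  const stamp = date.toISOString().replace(/[:.]/g, '-');
  return `scraper_studio_results_${stamp}.json`;
}
```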
Never commit your .env file. The shipped .gitignore blocks .env and .env.local.
If you accidentally commit a real BRIGHT_DATA_API_TOKEN:
- Rotate the token immediately at brightdata.com/cp/setting.
- Use `git filter-repo` or BFG Repo-Cleaner to remove the secret from history.
- Force-push and notify anyone who may have cloned the leak.
This project is licensed under the MIT License. See LICENSE for details.
