Bright Data Scraper Studio (Node.js)

A minimal Node.js starter for running a Bright Data Scraper Studio collector via the Data Collection API: trigger a job with a list of URLs and download the results.

Open in CodeSandbox, sign in with GitHub, then fork the repository to begin making changes.

Overview

Bright Data Scraper Studio is a low-code IDE for building custom web scraping collectors on the Bright Data platform. Once a collector is published, it exposes two HTTP endpoints:

Step	Endpoint	Purpose
1	`POST /dca/trigger?collector=<id>`	Queue one or more inputs for the collector
2	`GET /dca/dataset?id=<snapshot_id>`	Download the collected data when ready

This repository wraps those two calls in about 150 lines of Node.js so you can copy, paste and ship.
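
To make the two calls concrete, here is a minimal sketch using Node's built-in fetch. The base URL https://api.brightdata.com, the Bearer authorization header, and the helper names triggerCollector and fetchDataset are assumptions for illustration; they are not taken from index.js, so verify them against your account's API documentation.

```javascript
// Assumed base URL; this README only lists the endpoint paths.
const API_BASE = 'https://api.brightdata.com';

// Step 1: queue one or more inputs for a published collector.
// `fetchImpl` is injectable so the call can be stubbed in tests.
async function triggerCollector(token, collectorId, inputs, fetchImpl = fetch) {
    const url = `${API_BASE}/dca/trigger?collector=${encodeURIComponent(collectorId)}`;
    const res = await fetchImpl(url, {
        method: 'POST',
        headers: {
            Authorization: `Bearer ${token}`, // assumed auth scheme
            'Content-Type': 'application/json'
        },
        body: JSON.stringify(inputs)
    });
    if (!res.ok) throw new Error(`Trigger failed with HTTP ${res.status}`);
    return res.json();
}

// Step 2: request the collected data for the ID returned by step 1.
async function fetchDataset(token, id, fetchImpl = fetch) {
    const url = `${API_BASE}/dca/dataset?id=${encodeURIComponent(id)}`;
    const res = await fetchImpl(url, {
        headers: { Authorization: `Bearer ${token}` } // assumed auth scheme
    });
    if (!res.ok) throw new Error(`Dataset request failed with HTTP ${res.status}`);
    return res.json();
}
```

Passing fetchImpl as a parameter is only a convenience for testing without network access; in normal use the global fetch available in Node 18+ is picked up by default.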

Features

Trigger a Scraper Studio collector via the /dca/trigger endpoint
Poll /dca/dataset until results are ready
Env-var config via .env (no secrets in code)
Retry with exponential backoff for transient errors (5xx and network); fails fast on 4xx
Library helpers: triggerWithUrl, triggerWithUrls, runScraper
Saves the raw JSON response to a timestamped file
ES modules, Node 18+

Prerequisites

Node.js v18 or higher
A Bright Data account with an API token
A published collector in Scraper Studio; copy its Collector ID (starts with c_)

Installation

git clone https://github.com/brightdata/bright-data-scraper-studio-nodejs-project.git
cd bright-data-scraper-studio-nodejs-project
npm install
cp .env.example .env       # then edit .env with your token and collector ID

Usage

npm start
# or
node index.js

Results are written to a scraper_studio_results_<timestamp>.json file in the project directory.

Configuration

Two environment variables are required. Set them in .env or in your shell. You can also hardcode them in index.js for a quick local test, but keep real tokens out of committed code:

Variable	Where to find it
`BRIGHT_DATA_API_TOKEN`	Bright Data dashboard, Account Settings → API Tokens
`BRIGHT_DATA_COLLECTOR_ID`	Scraper Studio: open your collector, copy the ID from the URL (starts with `c_`)
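
A small sketch of how the two variables could be checked at startup so the script fails fast with a readable message. The helper name loadConfig is hypothetical and not part of index.js, and the sketch assumes the values are already present in process.env (for example, loaded from .env by whatever loader the project uses).

```javascript
// Hypothetical helper: read the two required settings and fail fast if any is missing.
function loadConfig(env = process.env) {
    const required = ['BRIGHT_DATA_API_TOKEN', 'BRIGHT_DATA_COLLECTOR_ID'];
    const missing = required.filter((name) => !env[name]);
    if (missing.length > 0) {
        throw new Error(`Missing environment variables: ${missing.join(', ')}`);
    }

    // Collector IDs start with "c_", so catch an obviously wrong value early.
    const collectorId = env.BRIGHT_DATA_COLLECTOR_ID;
    if (!collectorId.startsWith('c_')) {
        throw new Error('BRIGHT_DATA_COLLECTOR_ID should start with "c_"');
    }

    return { apiToken: env.BRIGHT_DATA_API_TOKEN, collectorId };
}

// Usage: const { apiToken, collectorId } = loadConfig();
```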

You can also tune the polling and retry behavior at the top of index.js:

const POLL_INTERVAL_MS  = 5000;  // delay between dataset checks
const MAX_POLL_ATTEMPTS = 60;    // give up after ~5 minutes
const MAX_RETRIES       = 3;     // for transient HTTP failures

The shape of SAMPLE_URLS must match the input schema you defined in Scraper Studio. The default sample assumes a single url field. If your collector uses different inputs (for example, keyword, zip_code, category), update the objects accordingly.

How it works

       +-----------------+      POST /dca/trigger      +-------------------+
       |  Your script    | --------------------------> |  Scraper Studio   |
       |  (index.js)     | <-- { collection_id } ----- |  Collector        |
       +-----------------+                             +-------------------+
                |                                                |
                |  GET /dca/dataset?id=<snapshot_id>             |
                |  (poll every 5s, retry 5xx with backoff)       |
                |  <--- [ { ...record... }, ... ] -------------- |
                v
       scraper_studio_results_<timestamp>.json

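The trigger call returns a collection_id, which the script passes as the id parameter of /dca/dataset (shown as <snapshot_id> above). It then polls /dca/dataset every five seconds for up to five minutes (60 attempts). A non-empty JSON array is treated as a finished snapshot. Transient errors (5xx and network) are retried with exponential backoff (1s, 2s, 4s); 4xx errors fail immediately, so you fix the request rather than retry it.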
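
The polling and retry behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in index.js: HttpError, withRetry, pollUntilReady and sleepMs are hypothetical names, and the defaults simply mirror the constants shown in Configuration.

```javascript
const sleepMs = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical error type that carries the HTTP status of a failed response.
class HttpError extends Error {
    constructor(status) {
        super(`HTTP ${status}`);
        this.status = status;
    }
}

// Retry transient failures: 5xx responses and errors without a status (network).
// 4xx errors are rethrown immediately.
async function withRetry(fn, { maxRetries = 3, baseDelayMs = 1000, sleep = sleepMs } = {}) {
    for (let attempt = 0; ; attempt++) {
        try {
            return await fn();
        } catch (err) {
            const isClientError = err.status >= 400 && err.status < 500;
            if (isClientError || attempt >= maxRetries) throw err;
            await sleep(baseDelayMs * 2 ** attempt); // 1s, 2s, 4s with the defaults
        }
    }
}

// Call `fetchOnce` until it yields a non-empty array, waiting between checks.
async function pollUntilReady(fetchOnce, { intervalMs = 5000, maxAttempts = 60, sleep = sleepMs } = {}) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const data = await withRetry(fetchOnce, { sleep });
        if (Array.isArray(data) && data.length > 0) return data; // finished snapshot
        if (attempt < maxAttempts) await sleep(intervalMs);
    }
    throw new Error(`No results after ${maxAttempts} attempts`);
}
```

Keeping the retry wrapper separate from the polling loop means a single failed HTTP call does not consume one of the polling attempts; whether index.js counts attempts the same way is not something this sketch claims.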

Examples

Run with your own URLs

Replace SAMPLE_URLS in index.js:

const SAMPLE_URLS = [
    { url: 'https://example.com/product/1' },
    { url: 'https://example.com/product/2' }
];

Custom input schema

If your collector expects something other than url, pass whatever fields it defines:

const inputs = [
    { keyword: 'wireless headphones', country: 'US' },
    { keyword: 'standing desk',       country: 'DE' }
];
await runScraper(inputs);
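
Because a mismatch between the inputs and the collector's schema only surfaces after the request is sent, it can help to check the objects locally first. The helper below is a hypothetical sketch (validateInputs is not exported by index.js), and the field names are whatever you defined in Scraper Studio.

```javascript
// Hypothetical helper: verify that every input object carries the expected fields.
function validateInputs(inputs, requiredFields) {
    if (!Array.isArray(inputs) || inputs.length === 0) {
        throw new Error('inputs must be a non-empty array');
    }
    inputs.forEach((input, index) => {
        const missing = requiredFields.filter(
            (field) => input == null || input[field] === undefined || input[field] === ''
        );
        if (missing.length > 0) {
            throw new Error(`Input #${index} is missing: ${missing.join(', ')}`);
        }
    });
    return inputs;
}

// Usage: await runScraper(validateInputs(inputs, ['keyword', 'country']));
```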

Use as a library

runScraper, triggerWithUrl, triggerWithUrls and saveResults are all exported:

import { triggerWithUrls, saveResults } from './index.js';

const data = await triggerWithUrls([
    'https://example.com/page-1',
    'https://example.com/page-2'
]);
saveResults(data, 'my_run.json');

Output

Results are saved as JSON files named scraper_studio_results_<ISO timestamp>.json.
The file contains the raw collector output: one record per input URL by default.
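
For reference, a timestamped file name of this form could be built as in the sketch below. The helper name resultsFilename is hypothetical, and replacing ":" and "." in the ISO timestamp is an assumption made here to keep the name valid on Windows; the exact format index.js produces may differ.

```javascript
// Hypothetical helper: build a results file name from an ISO 8601 timestamp.
function resultsFilename(date = new Date()) {
    // toISOString() yields e.g. 2024-01-02T03:04:05.006Z; swap ":" and "." for "-".
    const stamp = date.toISOString().replace(/[:.]/g, '-');
    return `scraper_studio_results_${stamp}.json`;
}

// resultsFilename(new Date(Date.UTC(2024, 0, 2, 3, 4, 5, 6)))
// → "scraper_studio_results_2024-01-02T03-04-05-006Z.json"
```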

Security

Never commit your .env file. The shipped .gitignore blocks .env and .env.local.

If you accidentally commit a real BRIGHT_DATA_API_TOKEN:

Rotate the token immediately at brightdata.com/cp/setting.
Use git filter-repo or BFG Repo-Cleaner to remove the secret from history.
Force-push and notify anyone who may have cloned the leak.

Support

License

This project is licensed under the MIT License. See LICENSE for details.
