Bright Data Scraper Studio (Python)

A minimal Python starter for running a Bright Data Scraper Studio collector via the Data Collection API: trigger a job with a list of URLs and download the results.

Open in CodeSandbox, sign in with GitHub, then fork the repository to begin making changes.

Overview

Bright Data Scraper Studio is a low-code IDE for building custom web scraping collectors on the Bright Data platform. Once a collector is published it exposes two HTTP endpoints:

Step	Endpoint	Purpose
1	`POST /dca/trigger?collector=<id>`	Queue one or more inputs for the collector
2	`GET /dca/dataset?id=<snapshot_id>`	Download the collected data when ready

This repository wraps those two calls in about 150 lines of Python so you can copy, paste and ship.
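As a rough illustration of those two calls, the sketch below only builds the request URLs and sends nothing. The base URL `https://api.brightdata.com` and the helper names `trigger_url` and `dataset_url` are assumptions made for this example and are not taken from index.py; the IDs are placeholders.

```python
from urllib.parse import urlencode

# Assumed API host (not stated in this README); confirm it in your dashboard.
API_BASE = "https://api.brightdata.com"


def trigger_url(collector_id: str) -> str:
    """URL for step 1: POST the list of inputs here to queue a job."""
    return f"{API_BASE}/dca/trigger?{urlencode({'collector': collector_id})}"


def dataset_url(snapshot_id: str) -> str:
    """URL for step 2: GET this until the collected data is ready."""
    return f"{API_BASE}/dca/dataset?{urlencode({'id': snapshot_id})}"


# Placeholder IDs, shown only to illustrate the URL shapes.
print(trigger_url("c_example"))
print(dataset_url("j_abc123"))
```

Building the query string with `urlencode` rather than string concatenation keeps the URL valid if an ID ever contains characters that need escaping.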

Features

Trigger a Scraper Studio collector via the /dca/trigger endpoint
Poll /dca/dataset until results are ready
Env-var config via .env (no secrets in code)
Retry with exponential backoff for transient errors (5xx and network); fails fast on 4xx
Library helpers: trigger_with_url, trigger_with_urls, run_scraper
Saves the raw JSON response to a timestamped file

Prerequisites

Python 3.8 or higher
A Bright Data account with an API token
A published collector in Scraper Studio; copy its Collector ID (starts with c_)

Installation

git clone https://github.com/brightdata/bright-data-scraper-studio-python-project.git
cd bright-data-scraper-studio-python-project
pip install -r requirements.txt
cp .env.example .env       # then edit .env with your token and collector ID

Dependencies

requests: HTTP client for the Bright Data API
colorama: colored terminal output
python-dotenv: load .env files into os.environ

Usage

python index.py

Results are written to a scraper_studio_results_<timestamp>.json file in the project directory.

Configuration

Two environment variables are required. Set them in .env or in your shell. Hardcoding them in index.py also works for a quick local test, but it puts a secret in code, so prefer the first two options:

Variable	Where to find it
`BRIGHT_DATA_API_TOKEN`	Bright Data dashboard, Account Settings → API Tokens
`BRIGHT_DATA_COLLECTOR_ID`	Scraper Studio: open your collector, copy the ID from the URL (starts with `c_`)
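One way to fail early when either variable is missing is sketched below. The function name `load_config` and its error messages are hypothetical and may not match what index.py does; the sketch uses only the standard library and takes any mapping, so it can be tried without touching the real environment.

```python
import os
from typing import Mapping, Tuple

REQUIRED_VARS = ("BRIGHT_DATA_API_TOKEN", "BRIGHT_DATA_COLLECTOR_ID")


def load_config(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (token, collector_id), or raise if a variable is missing or blank."""
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    token, collector_id = (env[name].strip() for name in REQUIRED_VARS)
    # Collector IDs start with "c_", so catch an obviously wrong paste early.
    if not collector_id.startswith("c_"):
        raise RuntimeError("BRIGHT_DATA_COLLECTOR_ID should start with 'c_'")
    return token, collector_id
```

With python-dotenv installed, calling `load_dotenv()` before `load_config()` would make the values from .env visible in `os.environ`.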

You can also tune the polling and retry behavior at the top of index.py:

POLL_INTERVAL_S   = 5    # delay between dataset checks (seconds)
MAX_POLL_ATTEMPTS = 60   # give up after ~5 minutes
MAX_RETRIES       = 3    # for transient HTTP failures
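To show how the two polling constants interact, here is a minimal sketch of a polling loop. It is not the code from index.py: `poll_until_ready` is a hypothetical helper, and `fetch` and `sleep` are injected so the loop can be exercised without a network call or a real delay. It treats a non-empty list as a finished result.

```python
import time
from typing import Any, Callable, List

POLL_INTERVAL_S = 5
MAX_POLL_ATTEMPTS = 60


def poll_until_ready(
    fetch: Callable[[], Any],
    interval_s: float = POLL_INTERVAL_S,
    max_attempts: int = MAX_POLL_ATTEMPTS,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Call fetch() until it returns a non-empty list, or give up."""
    for attempt in range(1, max_attempts + 1):
        payload = fetch()
        if isinstance(payload, list) and payload:
            return payload
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts:
            sleep(interval_s)
    raise TimeoutError(f"No results after {max_attempts} attempts")
```

With the defaults, the worst-case wait is roughly POLL_INTERVAL_S * MAX_POLL_ATTEMPTS = 300 seconds, so raising either constant lengthens the budget for slow collectors.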

The shape of SAMPLE_URLS must match the input schema you defined in Scraper Studio. The default sample assumes a single url field. If your collector uses different inputs (for example, keyword, zip_code, category), update the dictionaries accordingly.
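A small sketch of that input shape follows. Both helpers are hypothetical and exist only for illustration: `urls_to_inputs` wraps plain URL strings in the single-field dictionaries the default sample assumes, and `missing_fields` reports which inputs lack a field your collector requires.

```python
from typing import Any, Dict, Iterable, List, Sequence


def urls_to_inputs(urls: Iterable[str], field: str = "url") -> List[Dict[str, str]]:
    """Wrap each URL string in a one-field dict, e.g. {"url": "..."}."""
    return [{field: u} for u in urls]


def missing_fields(
    inputs: Sequence[Dict[str, Any]], required: Sequence[str]
) -> List[int]:
    """Return the indexes of inputs that lack any of the required fields."""
    return [
        i for i, item in enumerate(inputs)
        if any(name not in item for name in required)
    ]


inputs = urls_to_inputs(["https://example.com/product/1"])
print(inputs)
print(missing_fields(inputs, ["url"]))
```

Checking the shape locally before triggering a job avoids spending a request on inputs the collector would reject.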

How it works

       +-----------------+      POST /dca/trigger      +-------------------+
       |  Your script    | --------------------------> |  Scraper Studio   |
       |  (index.py)     | <-- { collection_id } ----- |  Collector        |
       +-----------------+                             +-------------------+
                |                                                |
                |  GET /dca/dataset?id=<snapshot_id>             |
                |  (poll every 5s, retry 5xx with backoff)       |
                |  <--- [ { ...record... }, ... ] -------------- |
                v
       scraper_studio_results_<timestamp>.json

The ID returned by /dca/trigger (shown as collection_id in the diagram) is the value passed as id to /dca/dataset. The script polls /dca/dataset every five seconds for up to five minutes. A non-empty JSON array is treated as a finished snapshot. Transient errors (5xx and network) are retried with exponential backoff (1s, 2s, 4s); 4xx errors fail immediately so you fix the request rather than retry it.
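The retry rule can be sketched as follows. This is an illustration under stated assumptions, not the code in index.py: `HttpError`, `is_transient` and `with_retries` are hypothetical names, a status of `None` stands in for a network failure, and MAX_RETRIES is read as "three retries after the first attempt", which is what yields the 1s, 2s, 4s delays.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
MAX_RETRIES = 3


class HttpError(Exception):
    """Hypothetical error carrying an HTTP status; None means a network failure."""

    def __init__(self, status: Optional[int]) -> None:
        super().__init__(f"request failed (status={status})")
        self.status = status


def is_transient(err: HttpError) -> bool:
    """Network failures and 5xx responses are worth retrying; 4xx are not."""
    return err.status is None or 500 <= err.status < 600


def with_retries(
    call: Callable[[], T],
    max_retries: int = MAX_RETRIES,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying transient failures with doubling delays."""
    for retry in range(max_retries + 1):
        try:
            return call()
        except HttpError as err:
            # Fail fast on 4xx, and stop once the retry budget is spent.
            if not is_transient(err) or retry == max_retries:
                raise
            sleep(base_delay_s * 2 ** retry)  # 1s, then 2s, then 4s
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Passing `sleep` as a parameter is a design choice for testability: a test can record the delays instead of waiting for them.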

Examples

Run with your own URLs

Replace SAMPLE_URLS in index.py:

SAMPLE_URLS = [
    {"url": "https://example.com/product/1"},
    {"url": "https://example.com/product/2"},
]

Custom input schema

If your collector expects something other than url, pass whatever fields it defines:

inputs = [
    {"keyword": "wireless headphones", "country": "US"},
    {"keyword": "standing desk",       "country": "DE"},
]
run_scraper(inputs)

Use as a library

run_scraper, trigger_with_url, trigger_with_urls and save_results are top-level functions:

from index import trigger_with_urls, save_results

data = trigger_with_urls([
    "https://example.com/page-1",
    "https://example.com/page-2",
])
save_results(data, "my_run.json")

Output

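Results are saved as JSON files named scraper_studio_results_<timestamp>.json, where the timestamp is an ISO-style date and time with hyphens in place of colons and dots so it is safe in a filename (for example, 2026-05-22T10-30-45-123456).
The file contains the raw collector output: one record per input URL by default.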
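A filename in that pattern could be produced as sketched below. The helper name `results_filename` is hypothetical, and the assumption that colons and dots are replaced with hyphens is inferred from the sample console output rather than confirmed from index.py.

```python
from datetime import datetime
from typing import Optional


def results_filename(now: Optional[datetime] = None) -> str:
    """Build a filename-safe, timestamped name for one run's results."""
    moment = now or datetime.now()
    # Always include microseconds so every filename has the same shape.
    stamp = moment.isoformat(timespec="microseconds")
    stamp = stamp.replace(":", "-").replace(".", "-")
    return f"scraper_studio_results_{stamp}.json"


print(results_filename(datetime(2026, 5, 22, 10, 30, 45, 123456)))
```

Colons are not allowed in Windows filenames, which is the usual reason for swapping them out of an ISO timestamp.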

Sample console output

Bright Data Scraper Studio
==============================
Starting Scraper Studio collector...
Queueing 3 input(s)
Job queued. Snapshot ID: j_abc123
Polling for results...
Attempt 1/60 - building
Attempt 2/60 - building
Attempt 3/60 - building
Results downloaded.
Saved to scraper_studio_results_2026-05-22T10-30-45-123456.json

Done.

Security

Never commit your .env file. The shipped .gitignore blocks .env and .env.local.

If you accidentally commit a real BRIGHT_DATA_API_TOKEN:

Rotate the token immediately at brightdata.com/cp/setting.
Use git filter-repo or BFG Repo-Cleaner to remove the secret from history.
Force-push and notify anyone who may have cloned the leak.

Support

License

This project is licensed under the MIT License. See LICENSE for details.
