A minimal Python starter for running a Bright Data Scraper Studio collector via the Data Collection API: trigger a job with a list of URLs and download the results.
Open in CodeSandbox, sign in with GitHub, then fork the repository to begin making changes.
- Overview
- Features
- Prerequisites
- Installation
- Usage
- Configuration
- How it works
- Examples
- Output
- Security
- Support
- License
Bright Data Scraper Studio is a low-code IDE for building custom web scraping collectors on the Bright Data platform. Once a collector is published it exposes two HTTP endpoints:
| Step | Endpoint | Purpose |
|---|---|---|
| 1 | `POST /dca/trigger?collector=<id>` | Queue one or more inputs for the collector |
| 2 | `GET /dca/dataset?id=<snapshot_id>` | Download the collected data when ready |
This repository wraps those two calls in about 150 lines of Python so you can copy, paste and ship.
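Here is a minimal sketch of the two calls in the table. It is not the code in `index.py`. The base URL `https://api.brightdata.com`, the bearer-token `Authorization` header, the 30-second timeout and the helper names `trigger_collector` and `fetch_dataset` are all assumptions, so check them against Bright Data's API reference. The session is passed in rather than created inside the functions, which lets you test them with a stub.

```python
from typing import Any, Dict, List

# Assumption: Bright Data's public API host; verify against the official docs.
API_BASE = "https://api.brightdata.com"


def _headers(token: str) -> Dict[str, str]:
    # Assumption: bearer-token authentication
    return {"Authorization": f"Bearer {token}"}


def trigger_collector(session: Any, token: str, collector_id: str,
                      inputs: List[Dict[str, Any]]) -> str:
    """Step 1: POST /dca/trigger?collector=<id>; return the job's collection_id."""
    resp = session.post(
        f"{API_BASE}/dca/trigger",
        params={"collector": collector_id},
        headers=_headers(token),
        json=inputs,  # one dict per input, shaped like the collector's input schema
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["collection_id"]


def fetch_dataset(session: Any, token: str, snapshot_id: str) -> Any:
    """Step 2: GET /dca/dataset?id=<snapshot_id>; return the parsed JSON body."""
    resp = session.get(
        f"{API_BASE}/dca/dataset",
        params={"id": snapshot_id},
        headers=_headers(token),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```

In a real run you would pass a `requests.Session()` as `session`, for example `trigger_collector(requests.Session(), token, collector_id, [{"url": "https://example.com/product/1"}])`.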
- Trigger a Scraper Studio collector via the `/dca/trigger` endpoint
- Poll `/dca/dataset` until results are ready
- Env-var config via `.env` (no secrets in code)
- Retry with exponential backoff for transient errors (5xx and network); fails fast on 4xx
- Library helpers: `trigger_with_url`, `trigger_with_urls`, `run_scraper`
- Saves the raw JSON response to a timestamped file
- Python 3.8 or higher
- A Bright Data account with an API token
- A published collector in Scraper Studio; copy its Collector ID (starts with `c_`)
```bash
git clone https://github.com/brightdata/bright-data-scraper-studio-python-project.git
cd bright-data-scraper-studio-python-project
pip install -r requirements.txt
cp .env.example .env  # then edit .env with your token and collector ID
```

Dependencies:

- `requests`: HTTP client for the Bright Data API
- `colorama`: colored terminal output
- `python-dotenv`: loads `.env` files into `os.environ`
```bash
python index.py
```

Results are written to a `scraper_studio_results_<timestamp>.json` file in the project directory.
Two environment variables are required. Set them in .env, in your shell, or hardcode them in index.py:
| Variable | Where to find it |
|---|---|
| `BRIGHT_DATA_API_TOKEN` | Bright Data dashboard, Account Settings → API Tokens |
| `BRIGHT_DATA_COLLECTOR_ID` | Scraper Studio: open your collector and copy the ID from the URL (starts with `c_`) |
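The sketch below shows one way to load and sanity-check these two variables. It assumes `python-dotenv` (one of the project's dependencies) is available and falls back to plain environment variables if it is not. `load_config` and its `c_` prefix check are hypothetical and are not taken from `index.py`. The explicit `env_file` path keeps dotenv from searching parent directories.

```python
import os
from typing import Tuple

try:
    # python-dotenv is optional here; without it only real environment variables are read
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


def load_config(env_file: str = ".env") -> Tuple[str, str]:
    """Return (api_token, collector_id) from the environment and/or an env file."""
    if load_dotenv is not None:
        # Explicit path skips dotenv's directory search; variables already set
        # in the shell take precedence because override defaults to False.
        load_dotenv(env_file)
    token = os.environ.get("BRIGHT_DATA_API_TOKEN", "").strip()
    collector_id = os.environ.get("BRIGHT_DATA_COLLECTOR_ID", "").strip()

    missing = [
        name
        for name, value in (
            ("BRIGHT_DATA_API_TOKEN", token),
            ("BRIGHT_DATA_COLLECTOR_ID", collector_id),
        )
        if not value
    ]
    if missing:
        raise RuntimeError("Missing required variable(s): " + ", ".join(missing))
    # Hypothetical sanity check based on the documented "c_" prefix
    if not collector_id.startswith("c_"):
        raise ValueError(f"Collector ID should start with 'c_', got {collector_id!r}")
    return token, collector_id
```

Failing at startup with a named variable is usually easier to debug than a 401 or 404 from the API later.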
You can also tune the polling and retry behavior at the top of index.py:
```python
POLL_INTERVAL_S = 5     # delay between dataset checks (seconds)
MAX_POLL_ATTEMPTS = 60  # give up after ~5 minutes
MAX_RETRIES = 3         # for transient HTTP failures
```

The shape of `SAMPLE_URLS` must match the input schema you defined in Scraper Studio. The default sample assumes a single `url` field. If your collector uses different inputs (for example, `keyword`, `zip_code`, `category`), update the dictionaries accordingly.
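A quick local check before triggering can catch a mismatch between your inputs and the collector's schema. The `validate_inputs` helper below is hypothetical and not part of `index.py`. Its `required_fields` argument stands in for whatever input fields you defined in Scraper Studio.

```python
from typing import Any, Dict, Iterable, List


def validate_inputs(inputs: List[Dict[str, Any]], required_fields: Iterable[str]) -> None:
    """Raise ValueError if an input is not a dict or lacks a required field.

    `required_fields` should mirror the input schema defined in Scraper Studio,
    e.g. ("url",) for the default SAMPLE_URLS.
    """
    required = set(required_fields)
    for index, item in enumerate(inputs):
        if not isinstance(item, dict):
            raise ValueError(f"Input #{index} must be a dict, got {type(item).__name__}")
        missing = required - set(item)
        if missing:
            raise ValueError(f"Input #{index} is missing field(s): {sorted(missing)}")
```

For the default setup this would be called as `validate_inputs(SAMPLE_URLS, ["url"])` just before triggering the job.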
```text
+-----------------+      POST /dca/trigger       +-------------------+
|   Your script   | ---------------------------> |  Scraper Studio   |
|   (index.py)    | <-- { collection_id } ------ |     Collector     |
+-----------------+                              +-------------------+
         |                                                 |
         |        GET /dca/dataset?id=<snapshot_id>        |
         |     (poll every 5s, retry 5xx with backoff)     |
         | <--- [ { ...record... }, ... ] ---------------- |
         v
scraper_studio_results_<timestamp>.json
```
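The polling leg of the diagram can be sketched as a loop. It calls a fetch function every `interval_s` seconds, with defaults mirroring `POLL_INTERVAL_S` and `MAX_POLL_ATTEMPTS`, and stops as soon as it receives a non-empty list. `poll_until_ready` and its `fetch` and `sleep` parameters are hypothetical. In practice `fetch` would wrap the `GET /dca/dataset` request. `sleep` is injectable only so the loop can be tested without waiting.

```python
import time
from typing import Any, Callable, List


def poll_until_ready(
    fetch: Callable[[], Any],
    interval_s: float = 5,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Call `fetch` until it returns a non-empty list, then return that list.

    Anything else (for example a status object while the job is still running)
    counts as "not ready yet". Raises TimeoutError after `max_attempts` calls.
    """
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if isinstance(result, list) and result:
            return result
        print(f"Attempt {attempt}/{max_attempts} - not ready")
        if attempt < max_attempts:
            sleep(interval_s)  # no pointless wait after the final attempt
    raise TimeoutError(f"No results after {max_attempts} attempts")
```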
The script polls /dca/dataset every five seconds for up to five minutes. A non-empty JSON array is treated as a finished snapshot. Transient errors (5xx and network) are retried with exponential backoff (1s, 2s, 4s); 4xx errors fail immediately so you fix the request rather than retry it.
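The retry policy described above (waits of 1s, 2s and 4s on 5xx or network errors, with no retry on 4xx) can be sketched as a small wrapper. `with_retries` is a hypothetical name. It assumes `call` returns a `requests`-style response without calling `raise_for_status()` first, so the wrapper can inspect `status_code` itself. Network failures are caught as `OSError`. This covers `requests.ConnectionError` and `requests.Timeout`, because the `requests` exception base class derives from `IOError`.

```python
import time
from typing import Any, Callable


def with_retries(
    call: Callable[[], Any],
    max_retries: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Run `call`, retrying 5xx responses and network errors with exponential backoff.

    `call` must return an object with a `status_code` attribute (e.g. a
    requests.Response) and must not raise on HTTP errors itself.
    With the defaults the waits are 1s, 2s, 4s: one initial try plus 3 retries.
    """
    for attempt in range(max_retries + 1):
        try:
            response = call()
        except OSError:
            # requests' ConnectionError and Timeout are OSError subclasses
            if attempt == max_retries:
                raise
        else:
            # Return successes and 4xx at once (a 4xx needs a fixed request, not a retry);
            # on the last attempt, also hand back the 5xx so the caller can raise it.
            if response.status_code < 500 or attempt == max_retries:
                return response
        sleep(base_delay_s * 2 ** attempt)
    raise RuntimeError("max_retries must be >= 0")
```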
Replace `SAMPLE_URLS` in `index.py`:

```python
SAMPLE_URLS = [
    {"url": "https://example.com/product/1"},
    {"url": "https://example.com/product/2"},
]
```

If your collector expects something other than `url`, pass whatever fields it defines:
```python
from index import run_scraper

inputs = [
    {"keyword": "wireless headphones", "country": "US"},
    {"keyword": "standing desk", "country": "DE"},
]
run_scraper(inputs)
```

`run_scraper`, `trigger_with_url`, `trigger_with_urls` and `save_results` are top-level functions in `index.py`, so you can import them into your own code:
```python
from index import trigger_with_urls, save_results

data = trigger_with_urls([
    "https://example.com/page-1",
    "https://example.com/page-2",
])
save_results(data, "my_run.json")
```

- Results are saved as JSON files named `scraper_studio_results_<ISO timestamp>.json`.
- The file contains the raw collector output: one record per input URL by default.
Example console output:

```text
Bright Data Scraper Studio
==============================
Starting Scraper Studio collector...
Queueing 3 input(s)
Job queued. Snapshot ID: j_abc123
Polling for results...
Attempt 1/60 - building
Attempt 2/60 - building
Attempt 3/60 - building
Results downloaded.
Saved to scraper_studio_results_2026-05-22T10-30-45-123456.json
Done.
```
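The filename in the sample output looks like an ISO timestamp with `:` and `.` replaced by `-`. Assuming that is the intended format, the sketch below produces the same pattern. `results_filename` and `save_json` are hypothetical stand-ins and are not the project's `save_results`.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Any, Optional


def results_filename(now: Optional[datetime] = None) -> str:
    """Build e.g. scraper_studio_results_2026-05-22T10-30-45-123456.json."""
    if now is None:
        now = datetime.now()
    # ISO-like timestamp with ':' and '.' replaced by '-', which is filename-safe
    return f"scraper_studio_results_{now.strftime('%Y-%m-%dT%H-%M-%S-%f')}.json"


def save_json(data: Any, path: Optional[str] = None) -> Path:
    """Write `data` as indented UTF-8 JSON and return the path written."""
    target = Path(path) if path else Path(results_filename())
    target.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return target
```

Using `strftime` with `%f` always emits six microsecond digits. `datetime.isoformat()` drops them when the microsecond value is zero.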
Never commit your `.env` file. The shipped `.gitignore` already excludes `.env` and `.env.local`.
If you accidentally commit a real `BRIGHT_DATA_API_TOKEN`:

- Rotate the token immediately at brightdata.com/cp/setting.
- Use `git filter-repo` or BFG Repo-Cleaner to remove the secret from history.
- Force-push and notify anyone who may have cloned the repository while the token was exposed.
This project is licensed under the MIT License. See LICENSE for details.
