Automated CAPTCHA Solving for Distributed Crawling. Integrate CapSolver with Crawlab to build enterprise-grade crawling systems that bypass reCAPTCHA, Cloudflare Turnstile, and more.
- Introduction
- Key Features
- Prerequisites
- Quick Start
- Integration Examples
- Best Practices
- Troubleshooting
- Bonus Code
- License
Managing web crawlers at scale requires robust infrastructure. Crawlab is a powerful distributed web crawler management platform, while CapSolver provides AI-powered CAPTCHA solving services. This repository provides ready-to-use templates to integrate these two powerhouses.
Crawlab is a language-agnostic distributed crawler management platform. It supports Python, Node.js, Go, and more, allowing you to manage spiders across multiple nodes with a beautiful UI.
CapSolver is an AI-driven service that solves various CAPTCHAs including reCAPTCHA (v2/v3/Enterprise), Cloudflare Turnstile, and AWS WAF.
- 🌐 Distributed Support: Works seamlessly with Crawlab's master/worker architecture.
- 🛠️ Multi-Framework: Examples for Selenium, Scrapy, and Puppeteer.
- 🤖 AI-Powered: High success rates for modern anti-bot challenges.
- 📈 Scalable: Handle thousands of CAPTCHAs per minute.
- Crawlab Instance: a running deployment (see the Crawlab Installation Guide)
- CapSolver API Key: available from the CapSolver Dashboard
- Environment: Python 3.8+ or Node.js 16+
# Install Python dependencies
pip install selenium requests scrapy
# Install Node.js dependencies
npm install puppeteer
- Clone this repo:
git clone https://github.com/your-username/crawlab-capsolver-integration.git
cd crawlab-capsolver-integration
- Set your API Key:
export CAPSOLVER_API_KEY="your-api-key-here"
- Run an example:
python examples/selenium_recaptcha_v2.py
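A script can fail fast when the key from the previous step is missing, rather than sending unauthenticated requests. The following is a minimal sketch; `load_api_key` is a hypothetical helper, not something the example scripts are known to define.

```python
import os


def load_api_key(env=os.environ, name="CAPSOLVER_API_KEY"):
    """Return the API key from the environment, or raise if it is missing.

    `env` is injectable so the function can be exercised without
    touching the real process environment.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the examples")
    return key


if __name__ == "__main__":
    # Print only the key length so the secret never reaches logs.
    print(f"API key loaded ({len(load_api_key())} characters)")
```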
Detailed code examples are located in the examples/ directory.
Automate browser interactions and solve reCAPTCHA v2 challenges. 👉 View Python Script
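The general shape of such a script is: create a solving task, poll for the result, then hand the token to the browser. The sketch below illustrates that flow under several assumptions that should be checked against CapSolver's documentation: the `https://api.capsolver.com` base URL, the `createTask` and `getTaskResult` methods, the `ReCaptchaV2TaskProxyLess` task type, and the response field names. The function names are hypothetical, and it uses `urllib` from the standard library to stay self-contained, whereas the repository's scripts may use `requests`.

```python
import json
import time
import urllib.request

API_BASE = "https://api.capsolver.com"  # assumed endpoint; confirm in CapSolver's docs


def post_json(url, payload):
    """POST a JSON body and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def solve_recaptcha_v2(api_key, page_url, site_key, post=post_json,
                       timeout=120.0, interval=3.0, sleep=time.sleep):
    """Create a solving task, then poll until a token is ready or time runs out."""
    task = {
        "type": "ReCaptchaV2TaskProxyLess",  # assumed task type name
        "websiteURL": page_url,
        "websiteKey": site_key,
    }
    created = post(f"{API_BASE}/createTask", {"clientKey": api_key, "task": task})
    if created.get("errorId"):
        raise RuntimeError(created.get("errorCode", "createTask failed"))
    task_id = created["taskId"]

    waited = 0.0
    while waited < timeout:
        sleep(interval)
        waited += interval
        result = post(f"{API_BASE}/getTaskResult",
                      {"clientKey": api_key, "taskId": task_id})
        if result.get("errorId"):
            raise RuntimeError(result.get("errorCode", "getTaskResult failed"))
        if result.get("status") == "ready":
            return result["solution"]["gRecaptchaResponse"]
    raise TimeoutError(f"no solution after {timeout:.0f}s")


def inject_token(driver, token):
    """Write the token into the hidden reCAPTCHA response field via Selenium."""
    driver.execute_script(
        "document.getElementById('g-recaptcha-response').value = arguments[0];",
        token,
    )
```

The `post` and `sleep` parameters are injectable so the polling logic can be tested without network access. Some pages also require invoking a JavaScript callback after the token is set, which this sketch does not attempt.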
Bypass Cloudflare's modern Turnstile challenges with ease. 👉 View Python Script
Integrate CAPTCHA solving directly into your Scrapy pipelines. 👉 View Scrapy Spider
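In Scrapy, one natural place to intercept a challenge page is a downloader middleware, since it sees every response before the spider does. The following is a rough sketch rather than the repository's actual spider: `CaptchaMiddleware`, the `CAPTCHA_MARKERS` tuple, and the `captcha_token` / `captcha_attempts` meta keys are all hypothetical names, and `solver` stands in for whatever function obtains a token.

```python
CAPTCHA_MARKERS = ("g-recaptcha", "cf-turnstile")  # illustrative markers only


class CaptchaMiddleware:
    """Hypothetical downloader middleware: solve only when a marker is found."""

    def __init__(self, solver, max_attempts=2):
        self.solver = solver              # callable(url) -> token string
        self.max_attempts = max_attempts  # cap on re-scheduled attempts per request

    def process_response(self, request, response, spider):
        # Non-text responses have no usable .text, so fall back to "".
        body = getattr(response, "text", "")
        if not any(marker in body for marker in CAPTCHA_MARKERS):
            return response  # no CAPTCHA detected, so no solver cost
        attempts = request.meta.get("captcha_attempts", 0)
        if attempts >= self.max_attempts:
            return response  # give up and let the spider handle the page
        token = self.solver(response.url)
        meta = dict(request.meta, captcha_token=token,
                    captcha_attempts=attempts + 1)
        # Returning a Request re-schedules it; the spider decides how to
        # submit the token (form field, header, or cookie).
        return request.replace(meta=meta, dont_filter=True)
```

To take effect, a class like this would need to be registered under `DOWNLOADER_MIDDLEWARES` in the project settings. Note that a blocking `solver` call stalls Scrapy's event loop while it waits, so a production version would likely need an asynchronous solver.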
Full support for JavaScript-based crawling environments. 👉 View Node.js Script
| Category | Recommendation |
|---|---|
| Error Handling | Implement exponential backoff for API retries. |
| Cost Control | Only trigger the solver when a CAPTCHA is detected. |
| Performance | Hold unused reCAPTCHA tokens only briefly (valid for ~2 mins, single use). |
| Security | Use environment variables for API keys. |
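The error-handling recommendation can be sketched as a small retry wrapper. This is one possible approach, not the repository's implementation: `with_backoff` is a hypothetical helper, and the defaults (five attempts, a 1-second base delay, a 30-second cap, and random jitter) are illustrative values to tune for your workload.

```python
import random
import time


def with_backoff(call, retries=5, base=1.0, cap=30.0,
                 retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Run call(); on a retryable error wait base * 2**attempt, capped and jittered."""
    for attempt in range(retries):
        try:
            return call()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            delay = min(cap, base * 2 ** attempt)
            # Jitter spreads retries out so many workers do not retry in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
```

Only errors listed in `retry_on` are retried; anything else propagates immediately, which keeps non-transient failures such as an empty balance from being retried pointlessly.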
| Error Code | Potential Cause | Solution |
|---|---|---|
| ERROR_ZERO_BALANCE | Insufficient credits. | Top up your CapSolver Dashboard. |
| ERROR_CAPTCHA_UNSOLVABLE | Incorrect site key or URL. | Verify the parameters extracted from the page. |
| TimeoutError | Network latency. | Increase the polling timeout in your script. |
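The table above can be turned into a small dispatch function so a crawler reacts consistently to each failure. This is an illustrative sketch: `next_action` and the action labels are hypothetical, and treating every unlisted error as retryable is an assumption rather than documented CapSolver behavior.

```python
FATAL = {"ERROR_ZERO_BALANCE"}               # retrying cannot help until credits are added
CHECK_PARAMS = {"ERROR_CAPTCHA_UNSOLVABLE"}  # likely a wrong site key or URL


def next_action(error):
    """Map an error code string or exception to a suggested action label."""
    if isinstance(error, TimeoutError):
        return "increase_timeout"
    code = str(error)
    if code in FATAL:
        return "stop_and_top_up"
    if code in CHECK_PARAMS:
        return "verify_site_key_and_url"
    return "retry"  # assumption: unlisted errors are treated as transient


if __name__ == "__main__":
    for err in ("ERROR_ZERO_BALANCE", "ERROR_CAPTCHA_UNSOLVABLE", TimeoutError()):
        print(repr(err), "->", next_action(err))
```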
To celebrate this integration, use the code Crawlab during your next recharge to receive an extra 6% credit!
Distributed under the MIT License. See LICENSE for more information.
Made with ❤️ by the community