DenimEvert/crawlab-capsolver

Crawlab + CapSolver Integration Guide

Automated CAPTCHA Solving for Distributed Crawling. Integrate CapSolver with Crawlab to build enterprise-grade crawling systems that bypass reCAPTCHA, Cloudflare Turnstile, and more.



🚀 Introduction

Managing web crawlers at scale requires robust infrastructure. Crawlab is a powerful distributed web crawler management platform, while CapSolver provides AI-powered CAPTCHA solving services. This repository provides ready-to-use templates to integrate these two powerhouses.

What is Crawlab?

Crawlab is a language-agnostic distributed crawler management platform. It supports Python, Node.js, Go, and more, allowing you to manage spiders across multiple nodes with a beautiful UI.

What is CapSolver?

CapSolver is an AI-driven service that solves various CAPTCHAs including reCAPTCHA (v2/v3/Enterprise), Cloudflare Turnstile, and AWS WAF.


✨ Key Features

  • 🌐 Distributed Support: Works seamlessly with Crawlab's master/worker architecture.
  • 🛠️ Multi-Framework: Examples for Selenium, Scrapy, and Puppeteer.
  • 🤖 AI-Powered: High success rates for modern anti-bot challenges.
  • 📈 Scalable: Handle thousands of CAPTCHAs per minute.

📋 Prerequisites

    # Install Python dependencies
    pip install selenium requests scrapy

    # Install Node.js dependencies
    npm install puppeteer

⚡ Quick Start

  1. Clone this repo:

    git clone https://github.com/DenimEvert/crawlab-capsolver.git
    cd crawlab-capsolver
  2. Set your API Key:

    export CAPSOLVER_API_KEY="your-api-key-here"
  3. Run an example:

    python examples/selenium_recaptcha_v2.py
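As a minimal sketch of how a script might pick up the key exported in step 2, the helper below reads `CAPSOLVER_API_KEY` from the environment and fails early when it is missing. The function name `load_api_key` is illustrative and is not necessarily what the scripts in examples/ use.

```python
import os


def load_api_key(env=os.environ):
    """Return the CapSolver API key from the environment, or raise if unset."""
    key = env.get("CAPSOLVER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CAPSOLVER_API_KEY is not set; export it before running."
        )
    return key
```

Failing at startup keeps a missing key from surfacing later as an opaque API error on a worker node.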

🛠 Integration Examples

Detailed code examples are located in the examples/ directory.

Selenium + reCAPTCHA v2

Automate browser interactions and solve reCAPTCHA v2 challenges. 👉 View Python Script
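The solving step can be sketched roughly as a create-then-poll loop. This is an illustration under assumptions, not the repository's script: the endpoint paths (`createTask`, `getTaskResult`), the task type `ReCaptchaV2TaskProxyLess`, and the response field names are taken to match CapSolver's public API and should be checked against its documentation. The `post` and `sleep` parameters are hypothetical hooks added so the function can be exercised without network access.

```python
import json
import time
import urllib.request

API_BASE = "https://api.capsolver.com"  # assumed API base URL


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def solve_recaptcha_v2(api_key, website_url, website_key,
                       post=post_json, sleep=time.sleep,
                       poll_interval=2.0, max_polls=60):
    """Create a solving task, then poll until a token is ready."""
    created = post(f"{API_BASE}/createTask", {
        "clientKey": api_key,
        "task": {
            "type": "ReCaptchaV2TaskProxyLess",  # assumed task type name
            "websiteURL": website_url,
            "websiteKey": website_key,
        },
    })
    if created.get("errorId"):
        raise RuntimeError(f"createTask failed: {created.get('errorCode')}")
    task_id = created["taskId"]

    for _ in range(max_polls):
        result = post(f"{API_BASE}/getTaskResult",
                      {"clientKey": api_key, "taskId": task_id})
        if result.get("errorId"):
            raise RuntimeError(
                f"getTaskResult failed: {result.get('errorCode')}"
            )
        if result.get("status") == "ready":
            return result["solution"]["gRecaptchaResponse"]
        sleep(poll_interval)  # task still processing, wait and poll again
    raise TimeoutError("CAPTCHA was not solved within the polling window")
```

In a Selenium script, the returned token is typically written into the page's hidden `g-recaptcha-response` field with `driver.execute_script(...)` before the form is submitted.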

Cloudflare Turnstile

Bypass Cloudflare's modern Turnstile challenges with ease. 👉 View Python Script
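For Turnstile, a hedged sketch of the two site-specific pieces: the task object sent to the solver and the script that places the returned token in the page. The task type `AntiTurnstileTaskProxyLess` is assumed to match CapSolver's API; both function names are illustrative. The hidden input name `cf-turnstile-response` is the field the Turnstile widget itself populates.

```python
import json


def build_turnstile_task(website_url, website_key):
    """Return a createTask 'task' object for a Turnstile challenge."""
    return {
        "type": "AntiTurnstileTaskProxyLess",  # assumed task type name
        "websiteURL": website_url,
        "websiteKey": website_key,
    }


def turnstile_injection_script(token):
    """Return JavaScript that writes the token into Turnstile's hidden input."""
    # json.dumps yields a safely quoted JavaScript string literal.
    return (
        "document.querySelectorAll('input[name=\"cf-turnstile-response\"]')"
        ".forEach(function (el) { el.value = " + json.dumps(token) + "; });"
    )
```

With Selenium, the generated script could be run as `driver.execute_script(turnstile_injection_script(token))` before submitting the form.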

Scrapy Middleware

Integrate CAPTCHA solving directly into your Scrapy request/response flow through a downloader middleware. 👉 View Scrapy Spider
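One way such a middleware might look is sketched here. It is a plain class following Scrapy's downloader-middleware `process_response` contract; the class name, the `solver` callable, the page markers, and the `captcha_token` / `captcha_attempts` meta keys are all hypothetical choices for illustration.

```python
class CaptchaSolverMiddleware:
    """Downloader middleware sketch: solve only when a CAPTCHA marker is seen."""

    MARKERS = ("g-recaptcha", "cf-turnstile")  # heuristic page markers
    MAX_ATTEMPTS = 2

    def __init__(self, solver):
        # solver: hypothetical callable taking a page URL, returning a token
        self.solver = solver

    def process_response(self, request, response, spider):
        # Non-text responses have no usable .text, so fall back to "".
        body = getattr(response, "text", "")
        if not any(marker in body for marker in self.MARKERS):
            return response  # no CAPTCHA detected, pass through untouched

        attempts = request.meta.get("captcha_attempts", 0)
        if attempts >= self.MAX_ATTEMPTS:
            return response  # give up and let the spider handle the page

        token = self.solver(response.url)
        meta = dict(request.meta, captcha_token=token,
                    captcha_attempts=attempts + 1)
        # Returning a Request makes Scrapy reschedule it instead of
        # delivering the blocked response to the spider.
        return request.replace(meta=meta, dont_filter=True)
```

This sketch only attaches the token to `request.meta`; how the token is actually submitted (form field, header, cookie) depends on the target site and is left out. In a real project the class would be registered in the `DOWNLOADER_MIDDLEWARES` setting and constructed through a `from_crawler` classmethod.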

Node.js + Puppeteer

Full support for JavaScript-based crawling environments. 👉 View Node.js Script


💡 Best Practices

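| Category | Recommendation |
|---|---|
| Error Handling | Implement exponential backoff for API retries. |
| Cost Control | Only trigger the solver when a CAPTCHA is detected. |
| Performance | Reuse a reCAPTCHA token only within its validity window (about 2 minutes). Tokens are single-use, so solve shortly before submitting. |
| Security | Use environment variables for API keys. |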
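The exponential backoff recommendation can be sketched as a small retry helper. This is one common formulation (doubling delay with full jitter), not code from this repository; the function name and defaults are illustrative, and the `sleep` parameter exists only so the helper can be tested without waiting.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on failure wait up to base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter spreads retries from many workers over time.
            sleep(random.uniform(0, delay))
```

Jitter matters in a distributed setup: without it, workers that fail together would also retry together.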

🔍 Troubleshooting

| Error Code | Potential Cause | Solution |
|---|---|---|
| ERROR_ZERO_BALANCE | Insufficient credits. | Top up your CapSolver Dashboard. |
| ERROR_CAPTCHA_UNSOLVABLE | Incorrect site key or URL. | Verify the parameters extracted from the page. |
| TimeoutError | Network latency. | Increase the polling timeout in your script. |
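A script could turn these error codes into actionable exceptions along the following lines. This is a sketch: the response fields `errorId`, `errorCode`, and `errorDescription` are assumed to match the API's error format, and `CapSolverError`, `HINTS`, and `check_response` are hypothetical names.

```python
class CapSolverError(Exception):
    """Raised when an API response reports an error."""

    def __init__(self, code, description=""):
        super().__init__(f"{code}: {description}")
        self.code = code


# Hints taken from the troubleshooting table.
HINTS = {
    "ERROR_ZERO_BALANCE": "Top up the account balance.",
    "ERROR_CAPTCHA_UNSOLVABLE": "Verify the site key and page URL.",
}


def check_response(data):
    """Return data unchanged, or raise CapSolverError if it reports an error."""
    if data.get("errorId"):
        code = data.get("errorCode", "UNKNOWN_ERROR")
        hint = HINTS.get(code, data.get("errorDescription", ""))
        raise CapSolverError(code, hint)
    return data
```

Keeping the code on the exception lets callers decide what is worth retrying: a zero balance will not fix itself, while a timeout might.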

🎁 Bonus Code

To celebrate this integration, use the code Crawlab during your next recharge to receive an extra 6% credit!

Register on CapSolver Now


📄 License

Distributed under the MIT License. See LICENSE for more information.


Made with ❤️ by the community
