Open
Labels
🐞 Bug (something isn't working), 📌 Root caused (identified the root cause of bug)
Description
crawl4ai version
0.7.4
Expected Behavior
arun_many should deep crawl every URL it is given.
Current Behavior
Instead of deep crawling, every URL comes back as failed (result.success is False).
Is this reproducible?
Yes
Inputs Causing the Bug
import asyncio

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy
from crawl4ai.content_scraping_strategy import LXMLWebScrapingStrategy
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator


async def main():
    config = CrawlerRunConfig(
        deep_crawl_strategy=BFSDeepCrawlStrategy(
            max_depth=20,
            max_pages=5000,
            include_external=False,  # stay inside site
        ),
        cache_mode=CacheMode.ENABLED,
        exclude_external_links=True,
        exclude_social_media_links=True,
        stream=True,  # important for streaming results
        page_timeout=240000,
        scraping_strategy=LXMLWebScrapingStrategy(),
        wait_until="domcontentloaded",
        semaphore_count=3,
        markdown_generator=DefaultMarkdownGenerator(content_source="raw_html"),
    )

    async with AsyncWebCrawler() as crawler:
        # Step 1: await arun_many to get async iterator
        result_iterator = await crawler.arun_many(
            urls=["https://leclairfoundation.org", "https://jottful.com"],
            config=config,
        )

        # Step 2: iterate over results as they complete
        async for result in result_iterator:
            if result.success:
                print(f"Just completed: {result.url}")
                # process_result(result)
            else:
                print(f"Failed: {result.url} – {result.error}")


asyncio.run(main())
Steps to Reproduce
Code snippets
OS
macOS
Python version
3.12
Browser
Chrome
Browser version
No response
Error logs & Screenshots (if applicable)
python3 scrapping_crawl4ai_current_solution.py
[INIT].... → Crawl4AI 0.7.4
Failed: https://jottful.com
Failed: https://leclairfoundation.org
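One way to narrow this down is to check whether the failure is specific to arun_many by running the same config one URL at a time. The helper below is only a sketch of that idea: the names `crawl_one_by_one` and `start_crawl` are hypothetical and not part of crawl4ai, and nothing here has been confirmed as a fix or workaround for this issue.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


async def crawl_one_by_one(
    urls: Iterable[str],
    start_crawl: Callable[[str], Awaitable[AsyncIterator[T]]],
) -> AsyncIterator[Tuple[str, T]]:
    """Start one crawl per seed URL, sequentially, and yield (seed_url, result).

    `start_crawl` is a hypothetical callable supplied by the caller: awaiting
    start_crawl(url) must return an async iterator of results for that seed.
    """
    for url in urls:
        stream = await start_crawl(url)  # one independent crawl per seed URL
        async for result in stream:
            yield url, result
```

With the script above, `start_crawl` could be `lambda url: crawler.arun(url, config=config)`, assuming that awaiting arun with stream=True and a deep crawl strategy returns an async iterator of results. If the per-URL runs succeed while arun_many still reports failures, that would point at the arun_many code path rather than at the sites or the config.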