Closed
Labels
🐞 Bug (Something isn't working), 🩺 Needs Triage (Needs attention of maintainers)
Description
crawl4ai version
0.7.4
Expected Behavior
When crawling http://aifs-content.eastus.azurecontainer.io/, Crawl4AI should resolve the relative links in the page header to absolute links and then crawl those pages without raising errors.
Current Behavior
When crawling http://aifs-content.eastus.azurecontainer.io/ with this configuration:
browser_conf = BrowserConfig(headless=True)
run_conf = CrawlerRunConfig(
    deep_crawl_strategy=BFSDeepCrawlStrategy(
        max_depth=depth, include_external=False
    ),
    check_robots_txt=True,
)
Crawl4AI throws the error below for relative links.
Some digging showed that the error is raised after the URL fails validation in the function can_process_url.
The generated raw_markdown, on the other hand, resolves the relative URLs to their absolute counterparts.
I have not yet been able to find out how it does that, but I was wondering whether the same could be done in the crawl logic.
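To illustrate the idea, here is a minimal sketch of resolving a link against the page it was found on before validating it. It uses only the standard library's urllib.parse and is not Crawl4AI's actual implementation. The helpers resolve_link and looks_crawlable and the sample hrefs are hypothetical; looks_crawlable is only a stand-in for whatever can_process_url really checks.

```python
from urllib.parse import urljoin, urlparse


def resolve_link(base_url: str, href: str) -> str:
    """Resolve a possibly relative href against the URL of the page it came from."""
    return urljoin(base_url, href)


def looks_crawlable(url: str) -> bool:
    """Hypothetical validity check: require an http(s) scheme and a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


base = "http://aifs-content.eastus.azurecontainer.io/"

# Sample hrefs are assumptions for illustration, not the site's real links.
for href in ["/about", "docs/intro.html"]:
    absolute = resolve_link(base, href)
    # The raw relative href fails the check; the resolved URL passes it.
    print(href, looks_crawlable(href), absolute, looks_crawlable(absolute))
```

If the deep-crawl link discovery step applied a resolution like this before validation, relative header links would reach the validator as absolute URLs.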
Is this reproducible?
Yes
Inputs Causing the Bug
- URL: http://aifs-content.eastus.azurecontainer.io/
- Settings Used:
browser_conf = BrowserConfig(headless=True)
run_conf = CrawlerRunConfig(
    deep_crawl_strategy=BFSDeepCrawlStrategy(
        max_depth=depth, include_external=False
    ),
    check_robots_txt=True,
)
Steps to Reproduce
Code snippets
import asyncio

from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy

url = "http://aifs-content.eastus.azurecontainer.io/"
depth = 3

# Configure browser and crawler settings
browser_conf = BrowserConfig(headless=True)
run_conf = CrawlerRunConfig(
    deep_crawl_strategy=BFSDeepCrawlStrategy(
        max_depth=depth, include_external=False
    ),
    check_robots_txt=True,
)

async def main():
    # Perform crawling
    async with AsyncWebCrawler(config=browser_conf) as crawler:
        results = await crawler.arun(url=url, config=run_conf)

asyncio.run(main())
OS
Linux
Python version
3.11.9
Browser
Github Codespace
Browser version
No response
Error logs & Screenshots (if applicable)
