Summary
fetch_robots uses reqwest::get(), which creates a default HTTP client with no timeout and ignores all FetcherConfig settings (proxy, TLS, custom headers). A hung robots.txt endpoint can therefore stall the entire crawl setup indefinitely.
Location
- File: src/spiders/robots.rs
- Line(s): 25
Severity
High
Details
let rules = match reqwest::get(&url).await {
reqwest::get builds a brand-new default client on every call, and that client has no request timeout. If the target's robots.txt endpoint hangs (for example, it accepts the connection but never responds), the entire crawl setup blocks forever. The client also ignores the proxy, TLS certificate validation, and custom header settings configured in FetcherConfig.
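The hazard can be reproduced without reqwest. The std-only sketch below starts a local server that accepts connections but never answers, the same shape as a hung robots.txt endpoint, then fetches from it with socket timeouts set. The helpers fetch_robots_with_timeout and spawn_silent_server are hypothetical and not part of the crawler. With the timeouts in place the fetch returns an error; without them the read would block indefinitely. Note that std's read/write timeouts apply per call, whereas reqwest's ClientBuilder::timeout bounds the whole request, so this only approximates the suggested fix.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: fetch /robots.txt over plain HTTP/1.0 with every
/// blocking step bounded by `timeout`.
fn fetch_robots_with_timeout(addr: SocketAddr, timeout: Duration) -> io::Result<String> {
    // Bound the TCP handshake.
    let mut stream = TcpStream::connect_timeout(&addr, timeout)?;
    // Bound each individual read and write. These are per-call limits,
    // so a server that drips bytes slowly could still exceed `timeout` in total.
    stream.set_read_timeout(Some(timeout))?;
    stream.set_write_timeout(Some(timeout))?;
    stream.write_all(b"GET /robots.txt HTTP/1.0\r\nHost: localhost\r\n\r\n")?;
    let mut response = String::new();
    // Without set_read_timeout, this read would wait forever on a silent server.
    stream.read_to_string(&mut response)?;
    Ok(response)
}

/// Hypothetical helper: start a server that accepts connections but never
/// writes a response, mimicking a hung robots.txt endpoint.
fn spawn_silent_server() -> io::Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        // Keep accepted sockets open so the client sees no EOF.
        let mut held = Vec::new();
        for stream in listener.incoming().flatten() {
            held.push(stream);
        }
    });
    Ok(addr)
}
```

Calling fetch_robots_with_timeout(spawn_silent_server()?, Duration::from_millis(200)) returns an error after roughly 200 ms instead of hanging. Per the std documentation, the error kind is typically WouldBlock on Unix and may be TimedOut on Windows, so callers should treat both as a timeout.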
Suggested Fix
Use a purpose-built client with a short timeout:
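use std::time::Duration;
// Bound the whole request (connect, headers, and body) to 10 seconds.
// Avoid unwrap_or_default(): on a build error it silently falls back to a
// default client, which again has no timeout.
let client = reqwest::Client::builder()
    .timeout(Duration::from_secs(10))
    .build()
    .expect("failed to build robots.txt HTTP client");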
let rules = match client.get(&url).send().await {
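Alternatively, reuse the spider's existing Fetcher instance so the robots.txt request honors the same FetcherConfig settings (proxy, TLS, custom headers).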
Automated finding by repo-monitor