[repo-monitor] High: allowed_domains whitelist bypassed when URL has no parseable host #1

@Liohtml

Description

Summary

When request.domain() returns None (malformed URL, data: scheme, file://), the allowed_domains whitelist check is silently skipped and the request is allowed through.

Location

  • File: src/spiders/engine.rs
  • Line(s): 270–278

Severity

High

Details

if !allowed.is_empty() {
    if let Some(domain) = request.domain() {
        if !allowed.contains(&domain) { /* rejected */ }
    }
    // If domain() is None, request is silently ALLOWED!
}

A spider configured with allowed_domains could still fetch local files (file://) or internal network resources if the URLs it follows include non-HTTP schemes.

Suggested Fix

Reject requests when the domain cannot be parsed:

if !allowed.is_empty() {
    match request.domain() {
        Some(domain) if allowed.contains(&domain) => {} // allowed
        _ => {
            stats.lock().await.offsite_requests_count += 1;
            return; // reject unknown or unparseable domains
        }
    }
}
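
The fail-closed rule in the fix above can be checked in isolation. Below is a minimal sketch, assuming the allow-list is a `HashSet<String>` and the parsed host is passed in as an `Option<&str>`. The helper name `is_allowed` is hypothetical and does not exist in the repository; the real engine.rs types may differ.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not from the repository): decides whether a request
/// may proceed, given the host extracted from its URL, if any.
fn is_allowed(allowed: &HashSet<String>, domain: Option<&str>) -> bool {
    // An empty allow-list means no domain restriction is configured.
    if allowed.is_empty() {
        return true;
    }
    match domain {
        // Host parsed: allow only if it is on the list.
        Some(d) => allowed.contains(d),
        // No parseable host (file://, data:, malformed URL): fail closed.
        None => false,
    }
}
```

Keeping the decision in a pure function like this would make the `None` case straightforward to cover with a unit test, separate from the async stats bookkeeping.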

Automated finding by repo-monitor
