
Blasthttp request_batch_stream #3080

Merged: liquidsec merged 2 commits into blasthttp-integration-clean from blasthttp-streaming on May 5, 2026


@TheTechromancer (Collaborator) commented on May 5, 2026

Summary

Migrate BBOT's batch HTTP path from request_batch to the new request_batch_stream async-iterator API (blasthttp#17). Results now arrive in completion order — a slow request no longer blocks faster peers behind it, and Python processing overlaps with in-flight HTTP I/O.

Changes

Core helper (bbot/core/helpers/web/web.py)

  • Replace WebHelper.request_batch (returns list) with WebHelper.request_batch_stream (async generator). Same entry shapes (url / (url, kwargs) / (url, kwargs, tracker)); trackers are now correlated by a per-URL deque since completion order ≠ input order.
  • Add iter_batch_results adapter. The native blasthttp 0.4.0 iterator yields lists of BatchResult (drained across the Python↔Rust boundary in chunks of 1000 results or 200 ms), while the upstream Python wrapper from the PR will yield individual items. The adapter handles both shapes so callers can write a single async for.
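The tracker correlation and the adapter described above can be sketched roughly as follows. This is a minimal illustration, not BBOT's actual code: BatchResult is a hypothetical dataclass with only url and status, stream_factory and fake_native_stream are hypothetical stand-ins for the blasthttp client, and per-request kwargs are ignored. It shows the adapter flattening list-shaped drains and a per-URL deque handing each result its tracker even when results complete out of input order.

```python
import asyncio
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class BatchResult:
    # Hypothetical stand-in for blasthttp's BatchResult; only .url is relied on here.
    url: str
    status: int


async def iter_batch_results(stream):
    """Yield one result per iteration whether `stream` yields items or lists of items."""
    async for chunk in stream:
        if isinstance(chunk, list):
            # Native shape: one list per drain across the Rust boundary.
            for item in chunk:
                yield item
        else:
            # Wrapper shape: one result per iteration.
            yield chunk


async def request_batch_stream(entries, stream_factory):
    """Sketch: pair each completed result with its tracker via a per-URL deque.

    `stream_factory` is a hypothetical callable returning the raw result stream;
    per-request kwargs are ignored to keep the sketch short.
    """
    urls = []
    trackers = defaultdict(deque)
    for entry in entries:
        if isinstance(entry, str):
            entry = (entry,)
        url = entry[0]
        urls.append(url)
        # A URL may appear more than once, so queue its trackers in input order.
        trackers[url].append(entry[2] if len(entry) > 2 else None)
    async for result in iter_batch_results(stream_factory(urls)):
        # Completion order differs from input order, so look the tracker up by URL.
        yield result, trackers[result.url].popleft()


async def fake_native_stream(urls):
    # Simulate two chunked drains that complete in reverse input order.
    done = [BatchResult(url, 200) for url in reversed(urls)]
    yield done[:2]
    yield done[2:]


async def main():
    entries = ["http://a/", ("http://b/", {}), ("http://c/", {}, "tracker-c")]
    return [(r.url, t) async for r, t in request_batch_stream(entries, fake_native_stream)]


print(asyncio.run(main()))
# [('http://c/', 'tracker-c'), ('http://b/', None), ('http://a/', None)]
```

Because each URL's trackers sit in a FIFO deque, duplicate URLs in one batch still each receive exactly one tracker; which duplicate gets which tracker depends on completion order.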

Module call sites migrated to helpers.request_batch_stream

  • pgp, git, telerik, iis_shortnames (×2), templates/bucket, ntlm

Module call sites migrated to client.request_batch_stream via iter_batch_results

  • http: true streaming. URL/URL_UNVERIFIED results are emitted immediately on arrival. The only buffering is for OPEN_TCP_PORT events probed with both http and https on the same host:port: the https result is held until the matching http result resolves the suppression decision. Single-scheme OPEN_TCP_PORT events (rare) and paired https results still unmatched at end of stream pass through normally. Per-result processing is extracted into _process_result.
  • web_brute: fuzz dispatch streams. The yara WAF filter, redirect filter, and hit collection run inline as results arrive. The final canary_found/hits decision still happens after the stream ends (it has to), but that work is now interleaved with HTTP I/O instead of running serially after a full drain. Canary baseline and mid-scan validation also stream.
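A rough sketch of the http pairing logic follows. It assumes a hypothetical Probe record and a placeholder should_suppress_https rule; the PR does not spell out the real suppression criterion, so this one simply drops the https twin when the http probe succeeded. Unpaired results pass straight through, the https half of a pair waits only until its http twin arrives, and leftover https probes are flushed at end of stream.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Probe:
    # Hypothetical result record, not BBOT's real event or result type.
    scheme: str   # "http" or "https"
    key: str      # "host:port" of the originating OPEN_TCP_PORT event
    paired: bool  # True when both schemes were probed for this host:port
    ok: bool      # whether the request succeeded


def should_suppress_https(http_probe, https_probe):
    # Hypothetical rule for illustration only; the real criterion is not given in the PR.
    return http_probe.ok


async def stream_with_pairing(results):
    seen_http = {}      # key -> http probe that has already arrived
    pending_https = {}  # key -> https probe buffered until its http twin arrives
    async for probe in results:
        if not probe.paired:
            # URL/URL_UNVERIFIED and single-scheme probes are emitted immediately.
            yield probe
        elif probe.scheme == "http":
            seen_http[probe.key] = probe
            yield probe
            https_probe = pending_https.pop(probe.key, None)
            if https_probe is not None and not should_suppress_https(probe, https_probe):
                yield https_probe
        elif probe.key in seen_http:
            # The http twin already arrived, so decide right away.
            if not should_suppress_https(seen_http[probe.key], probe):
                yield probe
        else:
            pending_https[probe.key] = probe
    # End of stream: https probes whose http twin never arrived pass through.
    for https_probe in pending_https.values():
        yield https_probe


async def fake_results():
    # Positional fields: scheme, key, paired, ok.
    for p in [
        Probe("https", "a:443", True, True),   # arrives before its http twin: buffered
        Probe("http", "b:80", False, True),    # single-scheme: emitted immediately
        Probe("http", "a:443", True, False),   # http failed: buffered https is released
        Probe("http", "d:80", True, True),     # http succeeded...
        Probe("https", "d:80", True, True),    # ...so this https twin is suppressed
        Probe("https", "c:8443", True, True),  # never matched: flushed at end
    ]:
        yield p


async def main():
    return [(p.scheme, p.key) async for p in stream_with_pairing(fake_results())]


print(asyncio.run(main()))
```

The point of the design is that only the https half of a pair is ever delayed, and only until its own http twin arrives, so one slow pair never holds back unrelated results.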

Test infra

  • mock_blasthttp: handle_batch → async-generator handle_batch_stream; the passthrough path normalizes Rust-side lists into individual items so the mock's contract is "one BatchResult per yield".
  • conftest — patches request_batch_stream instead of request_batch.
  • test_web — uses async for; tracker assertion is now set-based since order isn't deterministic.
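Why the tracker assertion has to be set-based can be illustrated with a small, self-contained mock. The names here are hypothetical and not the real mock_blasthttp API: fake_request sleeps a random interval, and mock_batch_stream uses asyncio.as_completed to yield one result per iteration in completion order, so only the set of trackers, not their order, is stable across runs.

```python
import asyncio
import random


async def fake_request(url, tracker):
    # Random latency so completion order differs from input order.
    await asyncio.sleep(random.uniform(0, 0.01))
    return url, tracker


async def mock_batch_stream(entries):
    """Hypothetical mock: one (url, tracker) result per yield, in completion order."""
    tasks = [asyncio.create_task(fake_request(url, tracker)) for url, tracker in entries]
    for next_done in asyncio.as_completed(tasks):
        yield await next_done


async def main():
    entries = [(f"http://host{i}/", f"t{i}") for i in range(5)]
    seen = {tracker async for _, tracker in mock_batch_stream(entries)}
    # Order is not deterministic, so compare sets rather than lists.
    assert seen == {tracker for _, tracker in entries}
    return seen


print(sorted(asyncio.run(main())))
# ['t0', 't1', 't2', 't3', 't4']
```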

The underlying blasthttp.BlastHTTP.request_batch is still used by test_web_rate_limit.py to exercise the library's min(global, per_call) rate-limit semantics — that's testing blasthttp itself, not BBOT, so it's left alone.

Test plan

  • pytest bbot/test/test_step_1/test_web.py bbot/test/test_step_1/test_web_rate_limit.py — 13 passed
  • All migrated module tests: test_module_{pgp,git,telerik,iis_shortnames,ntlm,http,web_brute,web_brute_shortnames,bucket_*} — 38 passed
  • ruff check and ruff format --check — clean
  • Verified the 7 unrelated test_step_1 failures (test_cli, test_dns::test_wildcards, test_modules_basic::test_module_loading, etc.) are pre-existing on this branch by re-running them with my changes stashed

@TheTechromancer requested a review from liquidsec on May 5, 2026 at 19:27
@TheTechromancer self-assigned this on May 5, 2026
@github-actions (bot, Contributor) commented on May 5, 2026

📊 Performance Benchmark Report

Comparing blasthttp-integration-clean (baseline) vs blasthttp-streaming (current)

📈 Detailed Results (All Benchmarks)

📋 Complete results for all benchmarks - includes both significant and insignificant changes

| 🧪 Test Name | 📏 Base | 📏 Current | 📈 Change |
|---|---|---|---|
| Bloom Filter Dns Mutation Tracking Performance | 4.25ms | 4.29ms | +1.1% |
| Bloom Filter Large Scale Dns Brute Force | 17.45ms | 18.01ms | +3.2% |
| Large Closest Match Lookup | 340.06ms | 342.36ms | +0.7% |
| Realistic Closest Match Workload | 178.87ms | 177.99ms | -0.5% |
| Event Memory Medium Scan | 1784 B/event | 1784 B/event | +0.0% |
| Event Memory Large Scan | 1768 B/event | 1768 B/event | +0.0% |
| Event Validation Full Scan Startup Small Batch | 372.21ms | 372.32ms | +0.0% |
| Event Validation Full Scan Startup Large Batch | 520.46ms | 522.25ms | +0.3% |
| Make Event Autodetection Small | 24.89ms | 25.17ms | +1.1% |
| Make Event Autodetection Large | 255.27ms | 257.03ms | +0.7% |
| Make Event Explicit Types | 10.59ms | 10.71ms | +1.1% |
| Excavate Single Thread Small | 3.387s | 3.450s | +1.9% |
| Excavate Single Thread Large | 8.713s | 8.736s | +0.3% |
| Excavate Parallel Tasks Small | 3.633s | 3.661s | +0.7% |
| Excavate Parallel Tasks Large | 6.049s | 6.074s | +0.4% |
| Is Ip Performance | 3.15ms | 3.21ms | +1.6% |
| Make Ip Type Performance | 11.29ms | 11.25ms | -0.3% |
| Mixed Ip Operations | 4.44ms | 4.50ms | +1.2% |
| Memory Use Web Crawl | 154.0 MB | 156.5 MB | +1.6% |
| Memory Use Subdomain Enum | 19.4 MB | 19.4 MB | +0.0% |
| Scan Throughput 100 | 3.516s | 3.617s | +2.9% |
| Scan Throughput 1000 | 27.010s | 27.464s | +1.7% |
| Typical Queue Shuffle | 60.88µs | 61.33µs | +0.7% |
| Priority Queue Shuffle | 706.28µs | 707.96µs | +0.2% |

🎯 Performance Summary

No significant performance changes detected (all changes <10%)


🐍 Python Version 3.11.15
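As a quick sanity check on the Change column, the percentages appear to be relative changes, (Current − Base) / Base. The snippet below assumes that formula and the 10% significance threshold from the summary; it reproduces the four rows it checks, though the bot's exact computation (which may run on unrounded timings) is not shown in the report.

```python
def pct_change(base, current):
    """Relative change of the current branch vs. the baseline, in percent."""
    return 100.0 * (current - base) / base


# A few rows from the benchmark table (values as reported, units dropped).
rows = {
    "Scan Throughput 100": (3.516, 3.617),
    "Scan Throughput 1000": (27.010, 27.464),
    "Bloom Filter Large Scale Dns Brute Force": (17.45, 18.01),
    "Realistic Closest Match Workload": (178.87, 177.99),
}

for name, (base, current) in rows.items():
    change = pct_change(base, current)
    # Assumed threshold: changes of 10% or more would count as significant.
    flag = "significant" if abs(change) >= 10 else "ok"
    print(f"{name}: {change:+.1f}% ({flag})")
```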

@codecov (bot) commented on May 5, 2026

Codecov Report

❌ Patch coverage is 87.23404% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 91%. Comparing base (554feb4) to head (5396147).
⚠️ Report is 3 commits behind head on blasthttp-integration-clean.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| bbot/core/helpers/web/web.py | 38% | 10 Missing ⚠️ |
| bbot/modules/http.py | 93% | 6 Missing ⚠️ |
| bbot/test/mock_blasthttp.py | 85% | 2 Missing ⚠️ |
Additional details and impacted files
@@                     Coverage Diff                     @@
##           blasthttp-integration-clean   #3080   +/-   ##
===========================================================
- Coverage                           91%     91%   -0%     
===========================================================
  Files                              440     440           
  Lines                            38098   38110   +12     
===========================================================
+ Hits                             34361   34370    +9     
- Misses                            3737    3740    +3     


@liquidsec merged commit d45f5fa into blasthttp-integration-clean on May 5, 2026
18 checks passed