fix: robust 429 retry in get_data.sh prevents 0-byte test data files #260

Merged
mwang87 merged 1 commit into master from copilot/investigate-xml-absence
Apr 8, 2026


Copilot AI commented Apr 8, 2026

massiveproxy.gnps2.org rate-limits to ~10 req/min. The previous wget-based download loop (3 retries, 5s waitretry) exhausted all of its attempts within the same rate-limit window. Critically, wget --output-document always creates the output file, even on total failure, so tests received 0-byte files and died with lxml.etree.XMLSyntaxError: no element found rather than a meaningful download error.
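To illustrate why a 0-byte file is worth catching before the XML parser sees it, here is a minimal sketch of a fail-fast check. The helper name `require_nonempty` is hypothetical and not part of get_data.sh; it only shows how the shell's `-s` test separates an empty leftover file from real data.

```shell
# Hypothetical helper (an assumption for illustration, not from get_data.sh):
# return non-zero with a readable message if a data file is missing or empty.
require_nonempty() {
    # [ -s FILE ] is true only if FILE exists and has a size greater than zero.
    if [ ! -s "$1" ]; then
        echo "download error: $1 is missing or empty" >&2
        return 1
    fi
}
```

Running a check like this on each downloaded file would surface a download problem directly instead of as an XML parse error later.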

Changes

  • Switch wget → curl with HTTP status capture (-w "%{http_code}"), enabling reliable 429 detection at the shell level
  • 429-aware backoff: waits 60s on first rate-limit hit, adding 30s per subsequent retry (up to 8 attempts)
  • No stale empty files: failed attempts rm -f the output before sleeping, eliminating the 0-byte artifact that silently poisoned tests
  • Non-empty validation: success requires both HTTP 200 and -s (non-empty file) check
  • 7s inter-download sleep maintained to stay within the ~10 req/min budget on happy-path runs
download() {
    ...
    http_code=$(curl -L -o "$filename" -w "%{http_code}" --silent --show-error "$url")
    curl_exit=$?  # curl's exit status, captured immediately after the call
    if [ "$curl_exit" -eq 0 ] && [ "$http_code" = "200" ] && [ -s "$filename" ]; then
        sleep 7; return 0
    fi
    rm -f "$filename"
    if [ "$http_code" = "429" ]; then
        wait_time=$((60 + retry * 30))
        sleep "$wait_time"
    fi
    ...
}
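The abbreviated snippet above elides the retry loop. A self-contained sketch of how the pieces might fit together follows. It is not the merged get_data.sh: the `MAX_ATTEMPTS` variable, the 5s pause for non-429 failures, and skipping the sleep after the final attempt are assumptions made for illustration. The curl flags, the 60s + 30s-per-retry backoff, the 7s pacing sleep, and the limit of 8 attempts come from the description.

```shell
# Sketch only: assumed structure, not the actual get_data.sh.
MAX_ATTEMPTS=8   # "up to 8 attempts" per the PR description

download() {
    local url="$1"
    local filename="$2"
    local retry=0
    local http_code curl_exit wait_time

    while [ "$retry" -lt "$MAX_ATTEMPTS" ]; do
        # The body goes to -o; -w prints only the HTTP status on stdout.
        curl_exit=0
        http_code=$(curl -L -o "$filename" -w "%{http_code}" --silent --show-error "$url") || curl_exit=$?

        # Success requires a clean curl exit, HTTP 200, and a non-empty file.
        if [ "$curl_exit" -eq 0 ] && [ "$http_code" = "200" ] && [ -s "$filename" ]; then
            sleep 7   # pace successful downloads under the ~10 req/min budget
            return 0
        fi

        # Never leave a partial or 0-byte file behind for the tests to read.
        rm -f "$filename"

        if [ "$http_code" = "429" ]; then
            wait_time=$((60 + retry * 30))   # 60s, 90s, 120s, ...
        else
            wait_time=5                      # assumed pause for other failures
        fi

        retry=$((retry + 1))
        echo "attempt $retry failed (HTTP $http_code, curl exit $curl_exit)" >&2

        # Assumption: no point sleeping after the last attempt.
        if [ "$retry" -lt "$MAX_ATTEMPTS" ]; then
            sleep "$wait_time"
        fi
    done

    echo "giving up on $url after $MAX_ATTEMPTS attempts" >&2
    return 1
}
```

A caller would invoke this as `download "$url" "$filename" || exit 1`, so a persistent rate limit stops the script with a visible error instead of leaving an empty file on disk.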

mwang87 marked this pull request as ready for review April 8, 2026 16:01
mwang87 merged commit 17e8c74 into master Apr 8, 2026
5 checks passed
