
ripcurl


File transfers are hard. Ripcurl's goal is to make them easy, without needing to remember an alphabet soup of command line flags, or resort to writing contorted loops in a shell script.

CLI Quickstart

```shell
ripcurl https://example.com/really-big-file.tar.gz /path/to/really-big-file.tar.gz
```

That's it!

Using ripcurl as a library

Ripcurl can also be used as a Rust library to make large streaming transfers more resilient.

Download a file

The simplest path is `transfer::execute_transfer`, which handles source/destination resolution, retries, and progress tracking. This is essentially what the ripcurl CLI does.

```rust
use ripcurl::transfer::{TransferConfig, execute_transfer};
use url::Url;

#[tokio::main]
async fn main() -> Result<(), ripcurl::protocol::TransferError> {
    let source = Url::parse("https://example.com/big-file.tar.gz").unwrap();
    let dest = Url::parse("file:///tmp/big-file.tar.gz").unwrap();

    let config = TransferConfig {
        max_retries: 10,
        overwrite: false,
        custom_http_headers: vec![],
    };

    let bytes_written = execute_transfer(source, dest, &config, None).await?;
    println!("Transferred {bytes_written} bytes");
    Ok(())
}
```

Stream bytes without writing to disk

If you want to consume bytes directly (e.g. for parsing or piping), use the `stream` module:

```rust
use futures_util::StreamExt;
use ripcurl::stream::stream_from_url;
use ripcurl::transfer::TransferConfig;
use std::pin::pin;
use url::Url;

#[tokio::main]
async fn main() -> Result<(), ripcurl::protocol::TransferError> {
    let url = Url::parse("https://example.com/data.json").unwrap();
    let config = TransferConfig {
        max_retries: 10,
        overwrite: false,
        custom_http_headers: vec![],
    };

    // NOTE: The streaming API will immediately fail if the source doesn't support random access
    // (e.g. an HTTP server that doesn't support range requests).
    let (stream, info) = stream_from_url(url, &config).await?;
    let mut stream = pin!(stream);
    if let Some(size) = info.total_size {
        println!("Expecting {size} bytes");
    }

    let mut total = 0u64;
    while let Some(chunk) = stream.next().await {
        let bytes = chunk?;
        total += bytes.len() as u64;
        // Process bytes...
    }
    println!("Received {total} bytes");
    Ok(())
}
```
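The note about range requests matters because resuming an interrupted transfer is built on the HTTP `Range` header. As a rough illustration of the mechanism (generic HTTP, not ripcurl's internals), a resumed request asks the server for only the bytes it hasn't received yet:

```rust
// Illustrative sketch: forming an open-ended HTTP byte-range request
// for resuming a transfer. This shows the protocol mechanism, not
// ripcurl's actual code.

/// Build the `Range` header value asking for everything from
/// `bytes_already_received` to the end of the resource.
fn resume_range_header(bytes_already_received: u64) -> String {
    // "bytes=N-" means "from offset N to EOF".
    format!("bytes={bytes_already_received}-")
}

fn main() {
    // Suppose 1 MiB arrived before the connection dropped; the retry
    // request would carry this header, and a server that supports
    // range requests answers with 206 Partial Content.
    let header = resume_range_header(1_048_576);
    println!("Range: {header}");
    assert_eq!(header, "bytes=1048576-");
}
```

A server that ignores `Range` replies 200 with the whole body, which is why a source without random access forces the stream to start over (or fail) instead of resuming.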

Without needing to ask, ripcurl will:

  • Log important events in the transfer lifecycle
  • Automatically retry on transient errors (network failures, some server error codes, stalled connections)
  • Show status updates, or not, depending on your environment (periodic logs in CI; animated progress in a TTY)
  • Cowardly refuse to overwrite an existing file unless you signal that's what you really want with `--overwrite`
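The automatic-retry behavior in the list above is the classic retry-with-backoff pattern. A minimal, self-contained sketch of the idea (the function name, delays, and policy here are illustrative assumptions, not ripcurl's actual implementation):

```rust
use std::thread;
use std::time::Duration;

/// Retry a fallible operation up to `max_retries` times, sleeping with
/// exponential backoff between attempts. Hypothetical sketch; ripcurl's
/// real retry policy (which errors count as transient, delay schedule)
/// may differ.
fn retry_with_backoff<T, E>(
    max_retries: u32,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0u32;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(_) if attempt < max_retries => {
                // Back off 100ms, 200ms, 400ms, ... (capped) before retrying.
                let delay = Duration::from_millis(100u64 * (1u64 << attempt.min(6)));
                thread::sleep(delay);
                attempt += 1;
            }
            Err(err) => return Err(err),
        }
    }
}

fn main() {
    // Simulate an operation that fails twice, then succeeds.
    let mut calls = 0;
    let result: Result<u32, &str> = retry_with_backoff(10, || {
        calls += 1;
        if calls < 3 { Err("transient network error") } else { Ok(42) }
    });
    assert_eq!(result, Ok(42));
    println!("succeeded after {calls} attempts");
}
```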

Why would you want this?

Even today, file transfers over the internet can be flaky. You probably don't notice this often for smaller transfers (up to a few gigabytes), but if you're regularly slinging 100GB+ files around, you've probably experienced a transfer failure.

Some domain-specific tools (like the aws s3 CLI) are more reliable than others. But if you regularly deal with HTTP, you need to remember quite a few command line switches, and you typically need to orchestrate retries on your own.

At Stadia Maps, we deal with a lot of file transfers in our data pipelines, so we built this tool out of our own pain. It's infuriating and wastes everyone's time when a plain old curl or wget fails mid-transfer in a CI pipeline.

We like tools that "just work", so with ripcurl our aim is to create a pit of success rather than a labyrinth of switches and manual orchestration.

Goals

  • Sane defaults for transfers. No need to specify that you want to follow redirects or other things that you probably want, but usually forget to ask. Ripcurl should essentially be zero-config for most users.
  • Resilient. Ripcurl will automatically retry failed transfers when possible.
  • Well-tested. Ripcurl has an extensive test suite covering dozens of cases we've seen in the wild.
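"Zero-config" in Rust terms usually means a `Default` implementation that encodes the sane defaults, so callers override only what they care about. Whether ripcurl's own `TransferConfig` works this way is an assumption on our part; the general pattern looks like this:

```rust
/// Illustrative only: a config type whose Default impl encodes sane
/// transfer defaults. (Whether ripcurl's TransferConfig implements
/// Default like this is an assumption, not documented behavior.)
#[derive(Debug, PartialEq)]
struct Config {
    max_retries: u32,
    follow_redirects: bool,
    overwrite: bool,
}

impl Default for Config {
    fn default() -> Self {
        Config {
            // Retry persistently and follow redirects out of the box;
            // refuse to overwrite unless explicitly asked.
            max_retries: 10,
            follow_redirects: true,
            overwrite: false,
        }
    }
}

fn main() {
    // Override a single field, keep the rest of the defaults.
    let config = Config { overwrite: true, ..Config::default() };
    assert!(config.follow_redirects);
    assert_eq!(config.max_retries, 10);
}
```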

Non-goals

This is a focused, opinionated project. The following are non-goals:

  • Support for dozens of protocols. This project does not have the same scope as, say, curl. (That said, open an issue if you'd like support for a new protocol so we can discuss.)
  • Granular configurability. We're trying to make the most straightforward tool for the vast majority of file transfer needs.
  • Non-file-like use cases. Ripcurl isn't a swiss army knife; it's a focused tool for file transfers. If you want a CLI tool for interacting with everything from a REST API to streams, then other tools are a better fit.

FAQs

What's with the name?

It's an homage to curl in the style of ripgrep or ripunzip. Don't read too much into it; it's not serious, and I mostly thought it sounded funny and "fast." It's also the name of an Australian surfwear company, apparently.

Why not &lt;some other tool&gt;?

  • Most other tools don't automatically follow redirects.
  • Some other tools don't understand which headers are safe to forward cross-origin. This can be a problem if you do things like authenticate to an API which redirects you to a pre-signed S3 URL for the actual download. It's also a security problem.
  • Most other tools won't persistently retry to finish the transfer.
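The cross-origin point is worth unpacking: credential-bearing headers attached to the original request (e.g. `Authorization`) must not be forwarded when a redirect lands on a different origin, or the secret leaks to the redirect target (such as S3). A minimal sketch of such a policy (illustrative, not ripcurl's actual code; the header list and function are our own):

```rust
/// Headers that carry credentials and should never cross an origin boundary.
/// (Hypothetical list for illustration.)
const SENSITIVE_HEADERS: &[&str] = &["authorization", "cookie", "proxy-authorization"];

/// Decide which headers survive a redirect. If the redirect target is a
/// different origin (scheme + host + port), drop credential-bearing headers.
fn headers_after_redirect(
    headers: &[(String, String)],
    same_origin: bool,
) -> Vec<(String, String)> {
    headers
        .iter()
        .filter(|(name, _)| {
            same_origin || !SENSITIVE_HEADERS.contains(&name.to_lowercase().as_str())
        })
        .cloned()
        .collect()
}

fn main() {
    let headers = vec![
        ("Authorization".to_string(), "Bearer secret-token".to_string()),
        ("Accept".to_string(), "*/*".to_string()),
    ];
    // Redirected from api.example.com to a pre-signed S3 URL: different origin,
    // so the bearer token is stripped and only innocuous headers are forwarded.
    let forwarded = headers_after_redirect(&headers, false);
    assert_eq!(forwarded.len(), 1);
    assert_eq!(forwarded[0].0, "Accept");
}
```

A tool that forwards everything blindly either breaks the pre-signed download (S3 rejects requests with an extra `Authorization` header) or hands your token to whatever host the redirect names.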

Why doesn't ripcurl do X?

See the non-goals section, probably. Other tools are better at these things.
