
Draft: xet CLI utility for file upload, download, and inspection #759

hoytak wants to merge 8 commits into main from hoytak/260317-xet-cli

hoytak (Collaborator) commented Mar 27, 2026

This PR adds a new xet command-line binary to xet_pkg for directly uploading, downloading, and inspecting files against a CAS endpoint — useful for development, debugging, and scripting without going through git-xet or huggingface_hub.

The binary exposes four subcommands under xet file:

upload — upload one or more files (or stdin) and emit file metadata (hash, size, sha256).
download — download by xet hash to a file or stdout, with optional source and write byte ranges.
scan — dry-run dedup/compression analysis without uploading data.
dump-reconstruction — fetch and display reconstruction metadata as JSON.

Endpoint resolution currently follows the same conventions as the session API, but uses the arguments passed on the command line. --endpoint overrides HF_ENDPOINT, which defaults to https://huggingface.co. The endpoint can also be a local directory, in which case a LocalClient is used. Token resolution tries --token first, then HF_TOKEN.
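
For reviewers, the precedence described above can be sketched roughly as follows. This is not the code in this PR: the `Endpoint` enum and the `resolve_endpoint` and `resolve_token` functions are hypothetical names. It also assumes that a value naming an existing directory selects the local client, and that this check applies to HF_ENDPOINT as well as --endpoint.

```rust
use std::path::{Path, PathBuf};

/// Where the CLI should send requests. Illustrative type, not from the PR.
#[derive(Debug, PartialEq)]
enum Endpoint {
    /// A remote CAS endpoint reached over HTTP(S).
    Remote(String),
    /// A local directory, which the PR says is served by a LocalClient.
    Local(PathBuf),
}

const DEFAULT_ENDPOINT: &str = "https://huggingface.co";

/// Pick the endpoint: `--endpoint` first, then `HF_ENDPOINT`, then the default.
/// Assumption: a value that names an existing directory means "local".
fn resolve_endpoint(cli: Option<&str>, env: Option<&str>) -> Endpoint {
    let raw = cli.or(env).unwrap_or(DEFAULT_ENDPOINT);
    if Path::new(raw).is_dir() {
        Endpoint::Local(PathBuf::from(raw))
    } else {
        Endpoint::Remote(raw.to_string())
    }
}

/// Pick the token: `--token` first, then `HF_TOKEN`.
/// `None` means no token was supplied by either source.
fn resolve_token(cli: Option<&str>, env: Option<&str>) -> Option<String> {
    cli.or(env).map(String::from)
}
```

In a real binary the `env` arguments would come from something like `std::env::var("HF_ENDPOINT").ok()`. Passing them in explicitly keeps the sketch easy to test.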

All config values can be overridden with -c KEY=VALUE.
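
A minimal sketch of how `-c KEY=VALUE` arguments might be parsed, for illustration only. `parse_override` and `collect_overrides` are hypothetical helpers. The sketch assumes the split happens on the first `=` and that a later `-c` for the same key wins; the PR description confirms neither.

```rust
use std::collections::BTreeMap;

/// Split one `KEY=VALUE` argument into its key and value.
/// Assumption: split on the first `=`, so the value may itself contain `=`.
fn parse_override(arg: &str) -> Result<(String, String), String> {
    match arg.split_once('=') {
        Some((key, value)) if !key.trim().is_empty() => {
            Ok((key.trim().to_string(), value.to_string()))
        }
        _ => Err(format!("expected KEY=VALUE, got `{arg}`")),
    }
}

/// Collect overrides in order; a repeated key keeps the last value given.
fn collect_overrides(args: &[&str]) -> Result<BTreeMap<String, String>, String> {
    let mut map = BTreeMap::new();
    for arg in args {
        let (key, value) = parse_override(arg)?;
        map.insert(key, value);
    }
    Ok(map)
}
```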

# Upload a single file and see its hash/size/sha256
xet file upload mydata.bin

# Upload multiple files at once
xet file upload file1.bin file2.bin file3.bin

# Upload from stdin
cat model.safetensors | xet file upload -

# Upload and write results to JSON
xet file upload --output results.json *.bin

# Download a file by hash
xet file download abc123...def -o restored.bin 

# Download a byte range to stdout
xet file download abc123...def --source-range 0..4096

# Download a range and write it into a specific offset of an existing file
xet file download abc123...def -o target.bin --source-range 1024..2048 --write-range 0..1024

# Dry-run scan to check dedup/compression ratio without uploading
xet file scan --recursive ./dataset/

# Inspect reconstruction metadata for a file hash
xet file dump-reconstruction abc123...def

# Use a local CAS directory instead of a remote endpoint
xet --endpoint /tmp/local-cas file upload data.bin
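
The `--source-range` and `--write-range` values in the examples above use a `START..END` form. Below is a rough sketch of parsing that form, assuming END is exclusive as in Rust's own `a..b` ranges. `parse_byte_range` and `same_length` are hypothetical helpers, and the length check is a suggested sanity check rather than documented CLI behavior.

```rust
use std::ops::Range;

/// Parse `START..END` into a byte range.
/// Assumption: END is exclusive, so `0..4096` covers 4096 bytes.
fn parse_byte_range(s: &str) -> Result<Range<u64>, String> {
    let (start_str, end_str) = s
        .split_once("..")
        .ok_or_else(|| format!("expected START..END, got `{s}`"))?;
    let start = start_str
        .parse::<u64>()
        .map_err(|e| format!("bad start `{start_str}`: {e}"))?;
    let end = end_str
        .parse::<u64>()
        .map_err(|e| format!("bad end `{end_str}`: {e}"))?;
    if start > end {
        return Err(format!("start {start} is past end {end}"));
    }
    Ok(start..end)
}

/// Check that the source and write ranges cover the same number of bytes.
fn same_length(source: &Range<u64>, write: &Range<u64>) -> bool {
    source.end - source.start == write.end - write.start
}
```

Under this reading, `--source-range 1024..2048 --write-range 0..1024` copies 1024 bytes from offset 1024 of the source into the first 1024 bytes of the target.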

hoytak added 5 commits March 31, 2026 17:49
Resolve Cargo.lock, hf_xet/Cargo.lock, and xet_pkg/Cargo.toml (keep clap/serde_json/walkdir + ulid).

Adapt xet CLI to session API: per-operation auth on UploadCommitBuilder and
DownloadStreamGroupBuilder; new_upload_commit().build().await; download via
new_download_stream_group().

Made-with: Cursor