
Summary
Replaces the old `upload_files`/`download_files`/`hash_files` Python functions with a new object-oriented API that exposes `XetSession` and its child objects directly as PyO3 classes. This gives Python callers full control over session lifecycle, connection pooling, and progress reporting.

The previous module-level functions are kept under `hf_xet/src/legacy/` and remain importable as `from hf_xet import upload_files` etc. The deprecated ones now emit `DeprecationWarning`; `hash_files` is retained without deprecation.

New Python API
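A minimal sketch of the context-manager contract this PR describes for `XetUploadCommit`: `__exit__` delegates to `commit()` on success and to `abort()` on exception. It is a pure-Python stand-in, not the PyO3 class. `SketchUploadCommit`, its `state` and `queued` fields, and the `upload_bytes(data, name)` argument list are hypothetical names chosen for illustration; the real signatures may differ.

```python
class SketchUploadCommit:
    """Hypothetical stand-in that mirrors only the __enter__/__exit__ contract."""

    def __init__(self):
        self.queued = []      # (name, data) pairs waiting to be committed
        self.state = "open"   # becomes "committed" or "aborted"

    def upload_bytes(self, data, name):
        # Queue operation: records the work and returns a handle index quickly.
        self.queued.append((name, data))
        return len(self.queued) - 1

    def commit(self):
        # Stand-in for the long-wait call that finalizes every queued upload.
        self.state = "committed"
        return [name for name, _ in self.queued]

    def abort(self):
        # Stand-in for cancelling queued work.
        self.state = "aborted"
        self.queued.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Success path commits; any exception aborts.
        if exc_type is None:
            self.commit()
        else:
            self.abort()
        return False  # never swallow the caller's exception


def run_ok():
    with SketchUploadCommit() as commit:
        commit.upload_bytes(b"a", "a.bin")
        commit.upload_bytes(b"b", "b.bin")
    return commit.state


def run_failing():
    commit = SketchUploadCommit()
    try:
        with commit:
            commit.upload_bytes(b"x", "x.bin")
            raise RuntimeError("simulated failure inside the with-block")
    except RuntimeError:
        pass
    return commit.state
```

In this sketch `run_ok()` leaves the commit in the `"committed"` state and `run_failing()` leaves it `"aborted"`, with the original exception still propagating out of the `with` block.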
Files Changed
New files
- `hf_xet/src/py_xet_session.rs`: `XetSession` PyO3 class
- `hf_xet/src/py_upload_commit.rs`: `XetUploadCommitBuilder`, `XetUploadCommit`, `Sha256Policy`, report types
- `hf_xet/src/py_file_upload_handle.rs`: `XetFileUpload`
- `hf_xet/src/py_stream_upload_handle.rs`: `XetStreamUpload`
- `hf_xet/src/py_file_download_group.rs`: `XetFileDownloadGroupBuilder`, `XetFileDownloadGroup`
- `hf_xet/src/py_file_download_handle.rs`: `XetFileDownload`
- `hf_xet/src/py_download_stream_group.rs`: `XetDownloadStreamGroupBuilder`, `XetDownloadStreamGroup`
- `hf_xet/src/py_download_stream_handle.rs`: `XetDownloadStream`, `XetUnorderedDownloadStream`
- `hf_xet/src/headers.rs`: `build_headers_with_user_agent` helper
- `hf_xet/src/legacy/mod.rs`: re-exports all legacy symbols
- `hf_xet/src/legacy/types.rs`: `PyXetDownloadInfo`, `PyXetUploadInfo`, `PyPointerFile`
- `hf_xet/src/legacy/functions.rs`: deprecated `upload_bytes`, `upload_files`, `download_files`, `force_sigint_shutdown`; `hash_files` retained without deprecation
- `hf_xet/src/legacy/progress_update.rs`: `PyItemProgressUpdate`, `PyTotalProgressUpdate`, `WrappedProgressUpdater`
- `hf_xet/src/legacy/runtime.rs`: async runtime + SIGINT handler (used by legacy functions)
- `hf_xet/src/legacy/token_refresh.rs`: Python callback token refresher (used by legacy functions)
- `hf_xet/tests/conftest.py`: shared fixtures and upload helpers
- `hf_xet/tests/test_upload_commit.py`: upload tests (file, bytes, stream, `Sha256Policy`, progress, abort)
- `hf_xet/tests/test_file_download.py`: file download tests (handles, round-trips, progress, cancel)
- `hf_xet/tests/test_stream_download.py`: ordered and unordered streaming download tests with range variants
- `hf_xet/tests/test_progress.py`: progress callback argument types and field verification
- `hf_xet/tests/test_session.py`: `XetSession` lifecycle and builder creation tests

Modified files
- `hf_xet/src/lib.rs`: module declarations; `blocking_call_with_signal_check` utility; legacy module registered at top level for backward compatibility
- `hf_xet/src/logging.rs`: calls `xet_pkg::init_logging()` instead of `xet_runtime` directly
- `hf_xet/Cargo.toml`: added `xet-runtime`, `xet-client` deps (for legacy module); feature flags route through `xet-pkg`
- `xet_pkg/src/xet_session/file_download_group.rs`: exposes `XetDownloadGroupReport` as a Python class (`pyclass(get_all)`, `__repr__`)
- `xet_data/src/processing/xet_file.rs`: `#[new]` Python constructor for `XetFileInfo`
- `xet_pkg/Cargo.toml`: added `no-default-cache`, `tokio-console`, `elevated_information_level` features
- `xet_pkg/src/lib.rs`: added `init_logging()` wrapper
- `xet_runtime/src/core/runtime.rs`: fork-safe `Drop`: detect child process via stored PID, discard runtime instead of blocking shutdown
- `.github/workflows/ci.yml`: added Python integration test step (maturin + pytest) to Linux, Windows, macOS jobs

Test Plan
- `cargo test --verbose --no-fail-fast` in `hf_xet/`
- `maturin develop && pytest hf_xet/tests/ -v`
- `huggingface_hub` upload/download flows against staging: "Xet" test in "Use the new XetSession API" (huggingface_hub#4116); manually tested progress update and Ctrl-C handling

Design Notes
Token refresh: the old API required Python to pass a token-refresh callable that Rust invoked across the GIL boundary. The new API uses `.with_token_refresh_url(url, headers)`: Rust refreshes autonomously via HTTP, removing GIL re-entry on the hot path. `WrappedTokenRefresher` is kept only in `legacy/`.

Progress callbacks: `with_progress_callback(fn, interval_ms=100)` spawns a background thread that delivers `(GroupProgressReport, dict[UniqueID, ItemProgressReport])` to the Python callable. The same signature covers both upload and download groups, so a single `XetProgressReporter` class handles both.

GIL release and Ctrl-C: queue operations (`upload_file`, `upload_bytes`, `download_file`) use `py.detach()` and return quickly. Long-wait operations (`commit()`, `finish()`) run the blocking call on a background thread while the calling thread releases the GIL for 100 ms windows and polls `py.check_signals()`, so Ctrl-C raises `KeyboardInterrupt` within one interval without starving other Python threads. `XetError::KeyboardInterrupt` maps to `PyKeyboardInterrupt`. The recommended caller pattern is `except KeyboardInterrupt: session.sigint_abort(); raise`, which is idiomatic Python: `sigint_abort()` flags the runtime so the background thread exits cleanly at its next checkpoint, and the cleanup is visible in Python code rather than hidden inside a C extension.

Context managers and concurrency: `XetUploadCommit` and `XetFileDownloadGroup` implement `__enter__`/`__exit__`; `__exit__` delegates to `commit()`/`finish()` on success and `abort()` on exception. Multiple `upload_file`/`download_file` calls within a `with` block run concurrently; the block exit waits for all to complete.

Streaming: `commit.upload_stream()` returns a `XetStreamUpload` handle for incremental writes (`.write(bytes)`, then `.finish()` before the `with` block exits). `download_stream` and `download_unordered_stream` accept optional `start`/`end` byte offsets; either may be omitted independently.

Fork-safe runtime drop: `XetRuntime` records its creating PID; if `drop` fires in a child process after `fork`, the parent's Tokio threads don't exist, so `shutdown_timeout()` would block. The runtime is discarded via `mem::forget` instead, letting the OS reclaim memory on exit.

Backward compatibility: all pre-1.x functions (`upload_bytes`, `upload_files`, `hash_files`, `download_files`, `force_sigint_shutdown`) and types (`PyXetDownloadInfo`, `PyXetUploadInfo`, `PyPointerFile`, `PyItemProgressUpdate`, `PyTotalProgressUpdate`) remain importable from the top-level `hf_xet` module. Deprecated functions emit `DeprecationWarning` at `stacklevel=2`. `hash_files` is not deprecated.

Note
Medium Risk
Medium risk due to a large surface-area change in Python bindings (new PyO3 classes, signal/interrupt handling, progress threads) plus a behavioral change to `XetRuntime` drop semantics for forked processes.

Overview
Introduces a new PyO3 object model for Python consumers built around `XetSession`, exposing builders/handles for uploads (`XetUploadCommit`, `XetFileUpload`, `XetStreamUpload`), grouped file downloads (`XetFileDownloadGroup`, `XetFileDownload`), and ordered/unordered streaming downloads with optional byte ranges.

Moves the previous top-level Python functions/types into a `legacy` module, keeps them importable for backward compatibility, and adds `DeprecationWarning` emission while centralizing HTTP header construction (including automatic `User-Agent` merging).

Extends the Rust crates to better support Python (pyclass report/ID types, `XetFileInfo` constructor), updates logging initialization via `xet_pkg`, adds fork-safe `XetRuntime` drop behavior, bumps `hf_xet` to `1.5.0`, and adds cross-platform CI steps to build the wheel with `maturin` and run new `pytest` integration tests.

Reviewed by Cursor Bugbot for commit 5d3aa3c.
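The "GIL release and Ctrl-C" design note describes running a long-wait call on a background thread while the calling thread wakes at short intervals to check for signals. The following is a pure-Python analogue of that polling loop, offered as a sketch only: the PR's actual implementation is the Rust `blocking_call_with_signal_check` utility, and `blocking_call_with_poll`, `poll_interval_s`, and `on_poll` are hypothetical names that do not appear in `hf_xet`.

```python
import threading
import time


def blocking_call_with_poll(fn, poll_interval_s=0.1, on_poll=None):
    """Run fn() on a background thread and wake the caller every poll_interval_s.

    Python-level signal handlers (including the default SIGINT handler that
    raises KeyboardInterrupt) only run on the main thread, so returning to the
    interpreter at each interval gives a pending Ctrl-C a chance to surface.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # forwarded to the caller below
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()

    polls = 0
    while thread.is_alive():
        # Bounded wait: the caller is never parked longer than one interval.
        thread.join(timeout=poll_interval_s)
        polls += 1
        if on_poll is not None:
            on_poll(polls)

    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def slow_answer():
    time.sleep(0.2)  # stands in for a long commit() or finish()
    return 42
```

A caller could wrap `blocking_call_with_poll(slow_answer, poll_interval_s=0.05)` in `try`/`except KeyboardInterrupt` and perform its cleanup there before re-raising, which matches the shape of the caller pattern recommended in the design note.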