Skip to content

feat: add init and upload in kernel-builder cli#378

Merged
drbh merged 22 commits intomainfrom
kernel-builder-init-command
Mar 26, 2026
Merged

feat: add init and upload in kernel-builder cli#378
drbh merged 22 commits intomainfrom
kernel-builder-init-command

Conversation

@drbh
Copy link
Copy Markdown
Collaborator

@drbh drbh commented Mar 18, 2026

This PR is a WIP to add the init command to the kernel-builder rust cli. This branch mainly ports the init command from the python cli, with some improvements to the interface and updates to the template

This PR adds an the init and upload command to the rust kernel-builder cli similar to implementation in the current python cli. It also adds a build command that is a facade for the nix run so devs can interact with a single tool for the full creation lifecycle of a kernel.

Mainly it aims to have better defaults to make the commands easier to use

  1. the init command only requires the name of the kernels (folder) and the owner is pulled via a whoami request if the user is already logged into hf
  2. the init command creates a new dir if a name is specified, if called within a dir, the cli will use the dir name as the kernel name
  3. the upload command does not require a repo id if the id can be read from the build
  4. the upload command does not require a path to the build folder and will default to looking for the build dir if no explicit path is specified

example usage

init from inside of a existing dir

mkdir some-kernel
cd some-kernel
kernel-builder init
# Downloading template from kernels-community/template...
# Initialized `drbh/some-kernelt` at /home/drbh/Projects/kernels/kernel-builder/some-kernel

init from outside of a dir

kernel-builder init some-kernel
# Downloading template from kernels-community/template...
# Initialized `drbh/some-kernelt` at /home/drbh/Projects/kernels/kernel-builder/some-kernel

build

kernel-builder build
# ...
# some-kernel-torch-ext> no Makefile or custom installCheckPhase, doing nothing
# some-kernel-torch-ext> Checking of ABI compatibility
# some-kernel-torch-ext> 🐍 Checking for compatibility with manylinux_2_28 and Python ABI version 3.9
# some-kernel-torch-ext> ✅ No compatibility issues found
# some-kernel-torch-ext> Checking loading kernel with get_kernel
# some-kernel-torch-ext> Check whether the kernel can be loaded with get-kernel: some_kernel
# some-kernel-torch-ext> Running phase: removeBytecodeHook
# some-kernel-torch-ext> Removing Python bytecode

upload

kernel-builder upload
# Found 7 build variant(s) in /home/drbh/Projects/kernels/kernel-builder/some-kernel/build
# Using branch `v1` (new)
# Uploading 36 operations...
# Kernel upload successful. Find the kernel at: https://hf.co/drbh/some-kernel

@drbh drbh changed the title feat: support init in builder cli feat: add init and upload in kernel-builder cli Mar 19, 2026
@drbh drbh force-pushed the kernel-builder-init-command branch from 534af0b to 368adad Compare March 19, 2026 15:25
@drbh drbh marked this pull request as ready for review March 19, 2026 16:42
@sayakpaul
Copy link
Copy Markdown
Member

sayakpaul commented Mar 20, 2026

Some high-level thoughts first:

the init command only requires the name of the kernels (folder) and the owner is pulled via a whoami request if the user is already logged into hf

Should we somehow inform the users about it and take confirmation before proceeding?

# Kernel upload successful. Find the kernel at: https://hf.co/drbh/some-kernel

Should this be Find the kernel at: https://hf.co/drbh/some-kernel/tree/{version}? This way, users can directly use that link to inspect files, etc.

Copy link
Copy Markdown
Member

@sayakpaul sayakpaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor comments. No RUST expertise yet.

template/CARD.md Outdated
@@ -0,0 +1,38 @@
---
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let bare_name = path.file_name().and_then(OsStr::to_str).ok_or_else(|| {
eyre::eyre!("Cannot determine directory name from `{path_str}`")
})?;
let owner = hf::whoami_username()?;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

Copy link
Copy Markdown
Member

@danieldk danieldk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the upload subcommand now as well. Really excited to get batched uploads!

let mut operations: Vec<CommitOperation> = Vec::new();

collect_benchmark_ops(&kernel_dir, &existing_files, is_new_branch, &mut operations)?;
collect_readme_ops(&kernel_dir, &mut operations);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

README should go to the main branch. I think probably the most elegant way is to make operations a HashMap<String, Vec<CommitOperation>> with the branch name as the key. That way we can have a nested loop to handle everything.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For later: we may want to do the card generation and upload in one go as part of upload, since we cannot do the card generation in the Nix sandbox iff we need to fetch something from the main repo to do it (depends a bit on whether we still need to enumerate all the build variants or whether that information is handled by the Hub itself through the new kernel type, if the remaining information in the card is relatively static, we can handle it during build as well).

Not for this PR though.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drbh I think the most sensible thing for now would be to keep the card contents limited to what we can fetch from the build (after the build is successful). We will render capabilities, frameworks etc. on the Hub side upon each commit, anyway. So, the card could contain an example snippet, available function interfaces, etc. This is all implemented in Python CLI.

Totally okay if we want to do the segregation in a separate PR.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then there is this other thing of always uploading the card to main revision.

Copy link
Copy Markdown
Collaborator Author

@drbh drbh Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drbh I think the most sensible thing for now would be to keep the card contents limited to what we can fetch from the build (after the build is successful).

oh we are actually writing the card in the build step now! for exactly the reasons you mention above. Basically the card just extracts the functions from all and shows snippets/any description the user adds

And then there is this other thing of always uploading the card to main revision.

and we also do now push the card to main

// README goes to main branch, build artifacts go to version branch.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Python CLI does have a template for the snippet and derives it without having to rely on any fragile re. Are we replicating that as well?

if func_names:
vars["usage_example"] = EXAMPLE_CODE.format(
repo_id=repo_id, func_name=func_names[0]
)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep thats is done in this file

pub fn write_card(build: &Build, kernel_dir: &Path) -> Result<()> {
and we also now avoid re for extracting the functions from __all__ by parsing the python ast. so generally I think this approach should be solid

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect. I think that should be it.

}

// Collect benchmark file operations: add matching files, delete stale ones.
fn collect_benchmark_ops(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should call them commit_ops? I was initially a bit confused, because we already use the term ops for ops in the kernel itself. Adding the commit_ prefix may make it a bit clearer that we are referring to different ops here.

Suggested change
fn collect_benchmark_ops(
fn collect_benchmark_commit_ops(

@drbh drbh force-pushed the kernel-builder-init-command branch from 57523a2 to f88776a Compare March 23, 2026 21:19
@drbh drbh marked this pull request as draft March 23, 2026 22:53
@drbh drbh force-pushed the kernel-builder-init-command branch from a50eff0 to f6b9889 Compare March 24, 2026 19:25
@drbh drbh marked this pull request as ready for review March 24, 2026 21:05
Copy link
Copy Markdown
Member

@danieldk danieldk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Added some more smaller comments.

let content = fs::read_to_string(&init_path).ok()?;

let re = Regex::new(r#"__all__\s*=\s*\[\s*([^\]]*)\s*\]"#).ok()?;
let list_content = re.captures(&content)?.get(1)?.as_str();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regex parsing is really fragile. Not for this PR, but we should use a Python AST parser. Maybe tree-sitter-python or whatever ruff/ty uses (if it's available from crates.io). Maybe it's worthwhile creating a ticket?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, I opt'ed to use https://crates.io/crates/rustpython-parser which is what ruff forked, only because ruff's parser has git dependencies and our use case is relatively simple

@drbh drbh force-pushed the kernel-builder-init-command branch from 3ed8a0f to 63dbeb1 Compare March 25, 2026 18:52
Copy link
Copy Markdown
Member

@sayakpaul sayakpaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments regarding upload and the card components. Additionally:

  • Can the users optionally specify revision and repo_id in upload? I guess we always upload to a revision during our uploads (performed after the build) to respect our versioning system.
  • I see we have init_e2e.rs tests. But I think we should also have tests for upload and the card generation bits, preferably mimicking how we do it in the current Python CLI?
  • Do we want to update the docs in a separate PR?
  • I think we should require users to have hf-xet installed to benefit from fast uploads and downloads if we're not doing in here already?

@danieldk
Copy link
Copy Markdown
Member

  • I think we should require users to have hf-xet installed to benefit from fast uploads and downloads if we're not doing in here already?

If it would be an extra requirement, but I think faster transfers are already enabled because the xet feature of the huggingface-hub dependency is enabled.

.join("__init__.py");

let content = fs::read_to_string(&init_path).ok()?;
let stmts = ast::Suite::parse(&content, "<module>").ok()?;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!!!

Copy link
Copy Markdown
Member

@danieldk danieldk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything good from my side! Really nice work 🚀 .

@drbh
Copy link
Copy Markdown
Collaborator Author

drbh commented Mar 26, 2026

Can the users optionally specify revision and repo_id in upload? I guess we always upload to a revision during our uploads (performed after the build) to respect our versioning system.

yep, the upload command accepts --repo_id and --branch as arguments and simply defaults to using the repo id and version from the build toml. snippet here

/// Repository ID on the Hugging Face Hub (e.g. `user/my-kernel`).
/// Defaults to `general.hub.repo-id` from `build.toml`.
#[arg(long)]
pub repo_id: Option<String>,
/// Upload to a specific branch (defaults to `v{version}` from metadata).
#[arg(long)]
pub branch: Option<String>,

I see we have init_e2e.rs tests. But I think we should also have tests for upload and the card generation bits, preferably mimicking how we do it in the current Python CLI?

we do have some tests for cards and uploads, here

and here the main difference between those tests and the ones in the e2e file are that the e2e uses the built binary to confirm the commands work
let bin = std::path::Path::new(env!("CARGO_MANIFEST_DIR")).join("target/debug/kernel-builder");
I agree that it would be nice have more e2e tests but maybe those can be followed up in another PR? also for good upload e2e tests we'd need to determine the best way to mock the network requests..

Do we want to update the docs in a separate PR?

yea I think thats best, this PR is already touching many files and it might be best to update the docs when we deprecate the python cli (maybe those can be separate, not sure yet)

I think we should require users to have hf-xet installed to benefit from fast uploads and downloads if we're not doing in here already?

yep we are using xet-core directly within the huggingface-hub dependency, in our case we have the xet featured enabled and it uses xet under the hood

huggingface-hub = { git = "https://github.com/huggingface/huggingface_hub_rust.git", rev = "8cbc662035e04d4be8e829316272893e980f5926", package = "huggingface-hub", features = ["blocking", "xet"] }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants