feat: add init and upload in kernel-builder cli#378
Conversation
534af0b to
368adad
Compare
|
Some high-level thoughts first:
Should we somehow inform the users about it and take confirmation before proceeding?
Should this be Find the kernel at: https://hf.co/drbh/some-kernel/tree/{version}? This way, users can directly use that link to inspect files, etc. |
sayakpaul
left a comment
There was a problem hiding this comment.
Very minor comments. No RUST expertise yet.
template/CARD.md
Outdated
| @@ -0,0 +1,38 @@ | |||
| --- | |||
There was a problem hiding this comment.
Do we want to maintain two copies? We have one in https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/cli/card_template.md
| let bare_name = path.file_name().and_then(OsStr::to_str).ok_or_else(|| { | ||
| eyre::eyre!("Cannot determine directory name from `{path_str}`") | ||
| })?; | ||
| let owner = hf::whoami_username()?; |
danieldk
left a comment
There was a problem hiding this comment.
Reviewed the upload subcommand now as well. Really excited to get batched uploads!
kernel-builder/src/upload.rs
Outdated
| let mut operations: Vec<CommitOperation> = Vec::new(); | ||
|
|
||
| collect_benchmark_ops(&kernel_dir, &existing_files, is_new_branch, &mut operations)?; | ||
| collect_readme_ops(&kernel_dir, &mut operations); |
There was a problem hiding this comment.
README should go to the main branch. I think probably the most elegant way is to make operations a HashMap<String, Vec<CommitOperation>> with the branch name as the key. That way we can have a nested loop to handle everything.
There was a problem hiding this comment.
For later: we may want to do the card generation and upload in one go as part of upload, since we cannot do the card generation in the Nix sandbox iff we need to fetch something from the main repo to do it (depends a bit on whether we still need to enumerate all the build variants or whether that information is handled by the Hub itself through the new kernel type, if the remaining information in the card is relatively static, we can handle it during build as well).
Not for this PR though.
There was a problem hiding this comment.
Some relevant internal discussions: https://huggingface.slack.com/archives/C090JN2P8NB/p1774236408982349
There was a problem hiding this comment.
@drbh I think the most sensible thing for now would be to keep the card contents limited to what we can fetch from the build (after the build is successful). We will render capabilities, frameworks etc. on the Hub side upon each commit, anyway. So, the card could contain an example snippet, available function interfaces, etc. This is all implemented in Python CLI.
Totally okay if we want to do the segregation in a separate PR.
There was a problem hiding this comment.
And then there is this other thing of always uploading the card to main revision.
There was a problem hiding this comment.
@drbh I think the most sensible thing for now would be to keep the card contents limited to what we can fetch from the build (after the build is successful).
oh we are actually writing the card in the build step now! for exactly the reasons you mention above. Basically the card just extracts the functions from all and shows snippets/any description the user adds
And then there is this other thing of always uploading the card to main revision.
and we also do now push the card to main
kernels/kernel-builder/src/upload.rs
Line 140 in 6e14158
There was a problem hiding this comment.
The Python CLI does have a template for the snippet and derives it without having to rely on any fragile re. Are we replicating that as well?
kernels/kernels/src/kernels/cli/kernel_card_utils.py
Lines 177 to 180 in 6e14158
There was a problem hiding this comment.
yep thats is done in this file
and we also now avoidre for extracting the functions from __all__ by parsing the python ast. so generally I think this approach should be solid
There was a problem hiding this comment.
Perfect. I think that should be it.
kernel-builder/src/upload.rs
Outdated
| } | ||
|
|
||
| // Collect benchmark file operations: add matching files, delete stale ones. | ||
| fn collect_benchmark_ops( |
There was a problem hiding this comment.
Maybe we should call them commit_ops? I was initially a bit confused, because we already use the term ops for ops in the kernel itself. Adding the commit_ prefix may make it a bit clearer that we are referring to different ops here.
| fn collect_benchmark_ops( | |
| fn collect_benchmark_commit_ops( |
57523a2 to
f88776a
Compare
a50eff0 to
f6b9889
Compare
danieldk
left a comment
There was a problem hiding this comment.
Thanks! Added some more smaller comments.
kernel-builder/src/pyproject/card.rs
Outdated
| let content = fs::read_to_string(&init_path).ok()?; | ||
|
|
||
| let re = Regex::new(r#"__all__\s*=\s*\[\s*([^\]]*)\s*\]"#).ok()?; | ||
| let list_content = re.captures(&content)?.get(1)?.as_str(); |
There was a problem hiding this comment.
Regex parsing is really fragile. Not for this PR, but we should use a Python AST parser. Maybe tree-sitter-python or whatever ruff/ty uses (if it's available from crates.io). Maybe it's worthwhile creating a ticket?
There was a problem hiding this comment.
good point, I opt'ed to use https://crates.io/crates/rustpython-parser which is what ruff forked, only because ruff's parser has git dependencies and our use case is relatively simple
3ed8a0f to
63dbeb1
Compare
sayakpaul
left a comment
There was a problem hiding this comment.
I left some comments regarding upload and the card components. Additionally:
- Can the users optionally specify
revisionandrepo_idinupload? I guess we always upload to arevisionduring our uploads (performed after the build) to respect our versioning system. - I see we have
init_e2e.rstests. But I think we should also have tests foruploadand the card generation bits, preferably mimicking how we do it in the current Python CLI? - Do we want to update the docs in a separate PR?
- I think we should require users to have
hf-xetinstalled to benefit from fast uploads and downloads if we're not doing in here already?
If it would be an extra requirement, but I think faster transfers are already enabled because the |
| .join("__init__.py"); | ||
|
|
||
| let content = fs::read_to_string(&init_path).ok()?; | ||
| let stmts = ast::Suite::parse(&content, "<module>").ok()?; |
danieldk
left a comment
There was a problem hiding this comment.
Everything good from my side! Really nice work 🚀 .
yep, the upload command accepts kernels/kernel-builder/src/upload.rs Lines 46 to 53 in 4b4b3fa
we do have some tests for cards and uploads, here and herekernels/kernel-builder/src/upload.rs Line 432 in 4b4b3fa
yea I think thats best, this PR is already touching many files and it might be best to update the docs when we deprecate the python cli (maybe those can be separate, not sure yet)
yep we are using xet-core directly within the kernels/kernel-builder/Cargo.toml Line 18 in 4b4b3fa |
This PR is a WIP to add the init command to the kernel-builder rust cli. This branch mainly ports the init command from the python cli, with some improvements to the interface and updates to the templateThis PR adds an the
initanduploadcommand to the rust kernel-builder cli similar to implementation in the current python cli. It also adds abuildcommand that is a facade for thenix runso devs can interact with a single tool for the full creation lifecycle of a kernel.Mainly it aims to have better defaults to make the commands easier to use
initcommand only requires the name of the kernels (folder) and the owner is pulled via awhoamirequest if the user is already logged into hfinitcommand creates a new dir if a name is specified, if called within a dir, the cli will use the dir name as the kernel nameuploadcommand does not require a repo id if the id can be read from the builduploadcommand does not require a path to the build folder and will default to looking for thebuilddir if no explicit path is specifiedexample usage
init from inside of a existing dir
init from outside of a dir
build
upload