Refactor the slicec Binary to Write Encoded Slice Definitions to 'stdout'#729
Conversation
| "nonoverlapping", | ||
| "nonterminal", | ||
| "peekable", | ||
| "repr", |
There was a problem hiding this comment.
This is the name of a common rust attribute for enums. It's like specifying an underlying type in Slice. You specify what it's in-memory representation is.
| @@ -6,9 +6,14 @@ use crate::encode_into::*; | |||
| use crate::encoder::Encoder; | |||
| use crate::{Error, InvalidDataErrorKind, Result}; | |||
There was a problem hiding this comment.
The encoder already supported the 'view' types: str and [],
but I updated it to also support the 'owned types: String and Vec.
This is not strictly necessary, but makes the API more convenient, and more consistent (which let use a macro). With this change you can write
encoder.encode(&myVec) instead of encoder.encode(myVec.as_slice()).
Same for String and as_str.
|
|
||
| // We only support `BTreeMap` if the `alloc` crate is available through the `alloc` feature flag. | ||
| // We only support 'owned' sequence/dictionary types if the `alloc` crate is available through the `alloc` feature flag. | ||
| // Note that we always support encoding views into these types (which don't require allocating memory). |
There was a problem hiding this comment.
This sentence makes sense to me, but might be confusing.
It should be read "... support encoding views-into-these-types ..."
Thoughts?
There was a problem hiding this comment.
It doesn't make sense to me. You encode into a byte buffer, not into a "type".
There was a problem hiding this comment.
To me it would make sense if it says encoding view of as i you can always encode a view of a sequence, isn't that the point?
There was a problem hiding this comment.
isn't that the point?
Yes, that's what it was trying to say. Just not well, and I couldn't think of a better somehow.
I updated the wording, hopefully is more clear now.
| #[cfg(feature = "alloc")] | ||
| impl EncodeInto<Slice2> for &String { | ||
| fn encode_into(self, encoder: &mut Encoder<impl OutputTarget, Slice2>) -> Result<()> { | ||
| self.as_str().encode_into(encoder) |
There was a problem hiding this comment.
The encoding of &String just delegates to the encoding for &str.
This is purely to save the caller from needing to do the conversion themselves.
| impl<'a, T> EncodeInto<Slice2> for &'a Vec<T> | ||
| where | ||
| &'a T: EncodeInto<Slice2>, | ||
| { |
There was a problem hiding this comment.
Rust's powerful 'blanket-impl' syntax at work here. This translates to:
"Implement the EncodeInto trait for any &Vec<T>, such that &T itself already implements EncodeInto."
And we have to tie the lifetimes of the elements to the vector they're contained within (No one else can mess with the vec while we're encoding it's elements).
slicec/src/slice_file_converter.rs
Outdated
| // Pull in the core compiler types using aliases to disambiguate them from the Slice-compiler definitions. | ||
| // Any type that starts with 'Compiler' is a compiler type, not a mapped Slice-compiler definition type. |
There was a problem hiding this comment.
Not sure if there's a better solution here... Alot of type names are identical between slicec and the Compiler Definitions; unsurprisingly.
slicec/src/slice_file_converter.rs
Outdated
|
|
||
| // Pull in the core compiler types using aliases to disambiguate them from the Slice-compiler definitions. | ||
| // Any type that starts with 'Compiler' is a compiler type, not a mapped Slice-compiler definition type. | ||
| #![cfg_attr(rustfmt, rustfmt_skip)] // skip the `use ... as ...` to keep them one-per-line. |
There was a problem hiding this comment.
rustfmt does not handle aliased imports well apparently. It tries to format like normal imports, which makes them unreadable.
Tbf, aliases are pretty uncommon in practice.
There was a problem hiding this comment.
Can't we just qualify the grammar types, like in:
fn get_attributes_from(attributes: Vec<&slicec::grammar::Attribute>) -> Vec<Attribute> {
?
There was a problem hiding this comment.
Yeah of course, but then it's easy to accidentally forget to qualify a type (happened multiple times), not to mention it bloats all the function signatures.
If that's what everyone prefers I'm fine with it. But that's what I tried first and didn't like it.
There was a problem hiding this comment.
I find this Compiler prefix very confusing. You're giving it to types NOT defined in the Slice Compiler definitions. Given the module name, a Grammar prefix would be more logical.
There was a problem hiding this comment.
Well, I saw it as I'm giving to the types defined in the Slice Compiler.
The real problem is that "Slice Compiler" is an ambiguous term now.
I'll rename to use Grammar instead.
There was a problem hiding this comment.
Using GrammarAttribute seems more logical to me too, Compiler is unrelated to the imported module.
There was a problem hiding this comment.
Renamed the ambiguous Compiler prefix to the clearer Grammar prefix.
| /// In Slice, doc-comments are not allowed on parameters. Instead, you would use a '@param' tag applied to an enclosing | ||
| /// operation. But this is an implementation detail of the language, not something code-generators should deal with. | ||
| fn get_doc_comment_for_parameter(parameter: &CompilerParameter) -> Option<DocComment> { | ||
| let operation_comment = parameter.parent().comment()?; |
There was a problem hiding this comment.
The ? is the short-circuiting operator for Option and Result types.
If you apply it an Option, and your function also returns an Option, it will immediately return for None cases (what we do here).
Same for Result and Err cases.
| // =========================== // | ||
| // Direct conversion functions // | ||
| // =========================== // | ||
|
|
There was a problem hiding this comment.
For types which are 'simple', we can directly convert between slicec and mapped types. The conventional way to do this in Rust is via the From<T> trait.
| } | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
For everything else, we need to keep some state. So we implement their conversion functions inside a struct which can hold that state.
There was a problem hiding this comment.
Pull request overview
This PR refactors the slicec binary to emit an encoded “generateCode” request to stdout, and introduces handwritten Rust types + conversion code to map the compiler AST into the Slice-compiler “on-the-wire” definitions for encoding.
Changes:
- Add handwritten Slice-compiler definition types with
EncodeInto<Slice2>implementations. - Add an AST → definition-types converter that linearizes top-level + anonymous types for encoding.
- Update
slicecCLI to encode parsed files and write the encoded bytes tostdout.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
slicec/src/main.rs |
Encodes a generateCode request and writes bytes to stdout (replaces prior debug/diagnostic output behavior). |
slicec/src/slice_file_converter.rs |
Converts slicec AST structures into the handwritten definition types, including handling anonymous types. |
slicec/src/definition_types.rs |
Defines handwritten “on-the-wire” types and their Slice2 encoding implementations. |
slice-codec/src/slice2/encoding.rs |
Adds EncodeInto<Slice2> impls for &String and &Vec<T> to support encoding owned containers. |
.vscode/cspell.json |
Adds repr to the spellchecker dictionary. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
slicec/src/main.rs
Outdated
| // TODO: replace this by forking a code-gen plugin once they exist. | ||
| // For now, if there are any diagnostics, we emit those and NOT the encoded definitions. | ||
| // Code-generators can tell if it's okay to decode or not by the presence of the `"generateCode"` string. | ||
| if totals.0 + totals.1 > 0 { | ||
| // If there were diagnostics, print them to 'stdout' and don't encode the Slice definitions. | ||
| print!("Diagnostics: "); | ||
| println!("{totals:?}"); | ||
| for diagnostic in updated_diagnostics { | ||
| println!("{diagnostic:?}"); | ||
| } | ||
| } else { | ||
| // Encode the parsed Slice definitions. | ||
| let encoded_bytes = match encode_generate_code_request(&files) { | ||
| Ok(bytes) => bytes, | ||
| Err(error) => { | ||
| eprintln!("{error:?}"); | ||
| std::process::exit(11); | ||
| } | ||
| }; | ||
|
|
||
| // Obtain an exclusive handle to 'stdout', and write the encoded bytes to it. | ||
| let mut stdout = std::io::stdout().lock(); | ||
| match stdout.write_all(&encoded_bytes) { | ||
| Ok(_) => {} | ||
| Err(error) => { | ||
| eprintln!("{error:?}"); | ||
| std::process::exit(12); | ||
| } | ||
| } | ||
| } | ||
| println!("{ast:?}"); | ||
|
|
||
| std::process::exit(i32::from(totals.1 != 0)); | ||
| // Success. | ||
| std::process::exit(0); |
There was a problem hiding this comment.
main unconditionally exits with code 0, even when totals.1 indicates compilation errors. This breaks the CLI contract and makes it hard to detect failures in scripts/pipelines. Consider preserving the previous behavior (non-zero exit when errors are present) while still skipping binary output when diagnostics exist.
slicec/src/main.rs
Outdated
|
|
||
| std::process::exit(i32::from(totals.1 != 0)); | ||
| // Success. | ||
| std::process::exit(0); |
There was a problem hiding this comment.
The success path writes encoded bytes to stdout and then calls std::process::exit(0). process::exit bypasses destructors, so any buffered stdout data may not be flushed, potentially truncating the binary output. Prefer returning from main (no explicit exit), or explicitly flush() stdout before exiting.
| std::process::exit(0); |
|
It would be helpful to "vendor-in" the Compiler definitions you used for encoding, even if all the generated code is hand-written for now. |
Yes, and I plan to do so. But I wanted to do the vendoring in a separate PR, since it's not directly related, and this is already a large PR. But, the hand-mapped definitions I have here should exactly match |
pepone
left a comment
There was a problem hiding this comment.
Looks good, I want to do some testing from C# to ensure the decoding works!
| } | ||
| } | ||
|
|
||
| /// TODO |
There was a problem hiding this comment.
I know the PR didn't change it, but what is the this TODO about, the missing doc comment?
There was a problem hiding this comment.
Yes, just that we should add doc-comments. I could remove these TODOs, but might as well wait until I do the full cleanup & add doc-comments I figure.
| /// | ||
| /// It uses macro-function-syntax, and should be called like: | ||
| /// `implement_encode_into_for_struct!(struct_type_name, field1, field2, ...);` | ||
| macro_rules! implement_encode_into_for_struct { |
There was a problem hiding this comment.
Are we planning to use the same approach for the generated code? If so, the macro should live in the codec library.
There was a problem hiding this comment.
Honestly, probably not. Haven't 100% decided yet, but I think it'd be better for the generated code to just generate the encoding normally, and not use a macro. Thoughts?
This macro is just for my own convenience. You can even see it as a kind of "psuedo code-gen".
| #[derive(Clone, Debug)] | ||
| pub struct Field { | ||
| pub entity_info: EntityInfo, | ||
| pub tag: Option<i32>, // TODO: varint32 isn't a real type? |
There was a problem hiding this comment.
I meant to remove this comment and open a proposal to add a varint32 (etc.) type to the Rust mapping.
I think it's cleaner to have a separate type for this, instead of needing to call special encode_varint functions. It makes the API more consistent (you always call encoder.encode, no special functions), and will let users encode Rust structs directly without needing Slice files (otherwise, there's no way to specify whether an i32 should use the int32 or the varint32 encoding). That's a future proposal though.
slicec/src/slice_file_converter.rs
Outdated
|
|
||
| // Pull in the core compiler types using aliases to disambiguate them from the Slice-compiler definitions. | ||
| // Any type that starts with 'Compiler' is a compiler type, not a mapped Slice-compiler definition type. | ||
| #![cfg_attr(rustfmt, rustfmt_skip)] // skip the `use ... as ...` to keep them one-per-line. |
There was a problem hiding this comment.
Can't we just qualify the grammar types, like in:
fn get_attributes_from(attributes: Vec<&slicec::grammar::Attribute>) -> Vec<Attribute> {
?
| /// This struct exposes a function ([`SliceFileContentsConverter::convert`]) that converts the contents of a Slice file | ||
| /// from their AST representation, to a representation that can be encoded with the Slice encoding. | ||
| #[derive(Debug)] | ||
| pub struct SliceFileContentsConverter { |
There was a problem hiding this comment.
What is the point of this struct, can we just have a plain function that does the conversion. I don't see the advantage here.
There was a problem hiding this comment.
Originally, I tried implementing everything using simple From<T> conversions, but this doesn't work for something like myfield: Sequence<bool>. When converting this field, you have to yield 2 Symbols. One for the field, and one for the Sequence<bool> anonymous type. A From conversion can't capture this.
So, I added this struct with a dedicated field. This way, 1 conversion can yield multiple Symbols.
For example convert_field will return a Field, but under the hood, it's free to convert as many anonymous types as necessary and push them directly into the converted_contents field.
I'm open to other ideas. Not the biggest fan of this struct either, but I couldn't find a cleaner way.
It's basically the same as a Visitor, except it happens to not impl Visitor for reasons.
| pub mod definition_types; | ||
| pub mod slice_file_converter; | ||
|
|
||
| /// Attempts to encode a set of parsed Slice files into a byte-buffer. |
There was a problem hiding this comment.
I find this doc-comment odd. If the parsing succeeded, why would I get an error during encoding?
Then, is there any reason to format the doc-comments on ~50 columns, and not use the full 120 columns?
There was a problem hiding this comment.
Two things:
-
If the parser, generated code, and encoding are all perfectly implemented, then yes, this should always return
Ok. In the case where a bug exists though, it's better to have the actual error (for inspection and debugging) instead of just crashing the program. This is an internal function by the way. -
It's an effect of the
SliceEncoderAPI: 'encoding' something is a generally fallible act. You provide an invalid argument, something is out of range, there's an issue with the IO of your underlying buffer, who knows.
So,encodereturns aResult, and we safely handle them in this function. An alternative it tounwrap(which panics onErr), but unwrapping is generally discouraged.
|
|
||
| // We only support `BTreeMap` if the `alloc` crate is available through the `alloc` feature flag. | ||
| // We only support 'owned' sequence/dictionary types if the `alloc` crate is available through the `alloc` feature flag. | ||
| // Note that we always support encoding views into these types (which don't require allocating memory). |
There was a problem hiding this comment.
It doesn't make sense to me. You encode into a byte buffer, not into a "type".
| } | ||
|
|
||
| #[cfg(feature = "alloc")] | ||
| impl EncodeInto<Slice2> for &String { |
There was a problem hiding this comment.
We should definitely get rid of all these Slice2.
There was a problem hiding this comment.
Yes, but not for this PR. It's already large enough.
slicec/src/slice_file_converter.rs
Outdated
|
|
||
| // Pull in the core compiler types using aliases to disambiguate them from the Slice-compiler definitions. | ||
| // Any type that starts with 'Compiler' is a compiler type, not a mapped Slice-compiler definition type. | ||
| #![cfg_attr(rustfmt, rustfmt_skip)] // skip the `use ... as ...` to keep them one-per-line. |
There was a problem hiding this comment.
I find this Compiler prefix very confusing. You're giving it to types NOT defined in the Slice Compiler definitions. Given the module name, a Grammar prefix would be more logical.
slicec/src/slice_file_converter.rs
Outdated
| /// 1) Recursively convert the anonymous type (and any nested types) to the mapped definition types. | ||
| /// 2) Add these directly to [Self::converted_contents] (so these types appear in the contents before their users) | ||
| /// 3) Return its index in [Self::converted_contents] as a numeric TypeId. | ||
| fn get_type_id_for(&mut self, type_ref: &CompilerTypeRef) -> TypeId { |
There was a problem hiding this comment.
We should find a better name for this method. You don't expect a get_xxx method to modify self.
There was a problem hiding this comment.
How about compute_xxx?
Can't think of anything that I'm completely happy with, so open to suggestions.
pepone
left a comment
There was a problem hiding this comment.
Claude Review
Reviewed the 3 core files (definition_types.rs, slice_file_converter.rs, main.rs) and the encoding.rs additions. Overall the architecture is clean — the macro for struct encoding, the converter's anonymous-type inlining strategy, and the bit-sequence pattern for optionals are all well done.
Found 3 critical, 4 moderate, and 2 minor issues detailed in inline comments below.
| unsafe { | ||
| let discriminant = *<*const _>::from(self).cast::<u8>(); | ||
| encoder.encode_varint(discriminant)?; | ||
| } |
There was a problem hiding this comment.
Claude Review — Critical: Unsafe discriminant reading is unsound
For data-carrying #[repr(u8)] enums, Rust does not guarantee the discriminant is stored at byte offset 0. This unsafe pointer cast can be UB.
Safe alternative:
let discriminant: u8 = match self {
MessageComponent::Text(_) => 0,
MessageComponent::Link(_) => 1,
};
encoder.encode_varint(discriminant)?;Same issue applies to Symbol::encode_into below.
There was a problem hiding this comment.
I'm pretty sure Claude is just wrong here. Reading the first u8 of repr(u8) enum is explicitly guaranteed to be correct, and not UB, and never product 'incorrect encodings'.
This is the de facto way to read discriminants from discriminated unions in Rust.
See the reference on type-layouts for primitive backed enums:
https://doc.rust-lang.org/reference/type-layout.html#primitive-representation-of-enums-with-fields
And this post specifically about the safety of doing this:
https://users.rust-lang.org/t/enum-crimes-getting-the-discriminant/88103/4
| // SAFETY: this cast is guaranteed to be safe because the enum is marked with `repr(u8)`, which means we know | ||
| // the first 'field' of this type's data layout must be a u8. This lets us read without offsetting the pointer. | ||
| unsafe { | ||
| let discriminant = *<*const _>::from(self).cast::<u8>(); |
There was a problem hiding this comment.
Claude Review — Critical: Same unsound unsafe discriminant read as MessageComponent
Same fix — use a safe match:
let discriminant: u8 = match self {
Symbol::Interface(_) => 0,
Symbol::Enum(_) => 1,
Symbol::Struct(_) => 2,
Symbol::CustomType(_) => 3,
Symbol::SequenceType(_) => 4,
Symbol::DictionaryType(_) => 5,
Symbol::ResultType(_) => 6,
Symbol::TypeAlias(_) => 7,
};
encoder.encode_varint(discriminant)?;There was a problem hiding this comment.
This is probably what we'll generate to be fair, but again, Claude is just wrong here.
See my above comment and links.
| // Code-generators can tell if it's okay to decode or not by the presence of the `"generateCode"` string. | ||
| if totals.0 + totals.1 > 0 { | ||
| // If there were diagnostics, print them to 'stdout' and don't encode the Slice definitions. | ||
| print!("Diagnostics: "); |
There was a problem hiding this comment.
Claude Review — Critical: Diagnostics written to stdout will corrupt piped binary output
Since the binary-encoded definitions go to stdout (and the intended use is piping to another process or file), diagnostics must go to stderr. These print!/println! calls will interleave human-readable text with binary data for any downstream consumer.
eprint!("Diagnostics: ");
eprintln!("{totals:?}");
for diagnostic in updated_diagnostics {
eprintln!("{diagnostic:?}");
}There was a problem hiding this comment.
Again, not sure what Claude is seeing. But there's an if-else here. If there are errors, we write only the errors, if not we write the encoded binary. We never ever write both interleaved like it says.
This is all temporary anyways, until I add the forking logic.
slicec/src/main.rs
Outdated
|
|
||
| std::process::exit(i32::from(totals.1 != 0)); | ||
| // Success. | ||
| std::process::exit(0); |
There was a problem hiding this comment.
Claude Review — Critical: process::exit(0) may truncate binary output and masks errors
Two issues here:
std::process::exitbypasses destructors and may not flush the lockedstdouthandle from line 81, potentially truncating the binary output.- This exits
0unconditionally — even when the diagnostic branch was taken (line 63), so scripts/CI cannot detect compilation failures.
Suggested fix: explicitly flush stdout after writing, use a non-zero exit code for the diagnostic path, and prefer returning from main over process::exit.
There was a problem hiding this comment.
Apparently there was a better way to be doing this.
Rust allows main to return a special ExitCode type, so now instead of process::exit, we return an exit code. Much cleaner.
| let entity_info = get_entity_info_for(enumerator); | ||
| let raw_value = enumerator.value(); | ||
| let value = Discriminant { | ||
| absolute_value: raw_value.unsigned_abs() as u64, |
There was a problem hiding this comment.
Claude Review — Moderate: is_positive() returns false for value 0
In Rust, 0i128.is_positive() is false. So enumerator value 0 will encode as { absolute_value: 0, is_positive: false }. If the decoder interprets is_positive: false as "negative", it could misinterpret this as -0.
Worth adding a comment clarifying the convention, or switching to a tri-state / treating 0 as positive.
There was a problem hiding this comment.
I think we should maybe switch to isNegative? Both here in slicec, but also in the Slice definitions.
I will open a small PR to discuss.
| } | ||
| return args; | ||
| } | ||
|
|
There was a problem hiding this comment.
Claude Review — Moderate: Fallback panic! is fragile
If a new attribute type is added to the compiler without updating this function, slicec will crash. Consider unreachable!("Unknown attribute kind: {}", attribute.kind.directive()) for a better panic message, or returning an empty Vec as a fallback.
There was a problem hiding this comment.
Claude is right, but this code is temporary, and very clearly labeled with
// TODO this is a temporary hack because we know all the possible attributes.
and a longer comment explaining why & how we'll fix it in a follow-up PR.
slicec/src/main.rs
Outdated
| // TODO: replace this by forking a code-gen plugin once they exist. | ||
| // For now, if there are any diagnostics, we emit those and NOT the encoded definitions. | ||
| // Code-generators can tell if it's okay to decode or not by the presence of the `"generateCode"` string. | ||
| if totals.0 + totals.1 > 0 { |
There was a problem hiding this comment.
Claude Review — Minor: totals.0 + totals.1 is opaque
Consider destructuring for readability:
let (warnings, errors) = totals;
if warnings + errors > 0 {There was a problem hiding this comment.
All temporary code, but sure why not.
Fixed.
| let operation_comment = parameter.parent().comment()?; | ||
|
|
||
| // We get the parameter's doc-comment in 3 steps: | ||
| // 1) Try to find a matching '@param' tag on the operation's doc-comment. |
There was a problem hiding this comment.
Claude Review — Minor: Typo
"it's Message field" should be "its Message field" (possessive, not contraction).
slicec/src/slice_file_converter.rs
Outdated
| } | ||
| } | ||
|
|
||
| fn convert_enumerator(&mut self, enumerator: &CompilerEnumerator) -> Enumerator { |
There was a problem hiding this comment.
Should we use the into/from (I forget the names) convention?
There was a problem hiding this comment.
You're right, those are the correct names.
I implemented From for the types where it was possible, but due to anonymous types, this isn't possible for many types.
For example, take myField: Sequence<bool>.
To properly encode this, it would take 2 Symbols: a Field and a Sequence.
But for myField: bool, it would only take 1 symbol.
From doesn't really work when the number of returned elements can only be known at runtime.
I actually tried using all From first! Until I realized this problem with anonymous types.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -166,6 +178,16 @@ where | |||
| } | |||
| } | |||
|
|
|||
| #[cfg(feature = "alloc")] | |||
| impl<'a, T> EncodeInto<Slice2> for &'a Vec<T> | |||
| where | |||
| &'a T: EncodeInto<Slice2>, | |||
| { | |||
| fn encode_into(self, encoder: &mut Encoder<impl OutputTarget, Slice2>) -> Result<()> { | |||
| self.as_slice().encode_into(encoder) | |||
| } | |||
| } | |||
There was a problem hiding this comment.
Fair enough, there should be tests. I will add in a future PR though.
This PR is really not about encoding changes.
This PR does what the title says! I haven't implemented the process forking (i.e. where you can pass code-gen plugins on the command-line and
slicecwill fork them). That's next up on the list.But you can still pipe the output of
cargo run -- Test.sliceto another process, or a file.It's obviously hard to test this without a decoder. I've been using
cargo run -- Test.slice > slice-output.dat,and opening the file in a hex editor to manually check the bytes. The bytes seem correct for normal Slice files.
Structure
This PR does 3 big things, in 2 files:
definition_types.rs: this file is my 'hand-mapping' the Slice Compiler definitions.It will eventually be replaced by generated code when we add code-gen in Rust.
This file 1) Defines the types that will be sent on-the-wire and 2) implements the encoding of these types.
The encoding is complete, except for
DocCommentswhich require tags (which require size/position tracking which doesn't exist yet).slice_file_converter.rsis the more interesting one. This file provides conversion functions for mapping between domains. It takes types fromslicecand converts them to the Slice-generated types. Most of this is straightforward, but there are some differences between them which required attention.