Skip to content
Merged
1 change: 1 addition & 0 deletions .vscode/cspell.json
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
"nonoverlapping",
"nonterminal",
"peekable",
"repr",
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the name of a common rust attribute for enums. It's like specifying an underlying type in Slice. You specify what it's in-memory representation is.

"rfind",
"rsplit",
"RUSTDOCFLAGS",
Expand Down
24 changes: 23 additions & 1 deletion slice-codec/src/slice2/encoding.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,14 @@ use crate::encode_into::*;
use crate::encoder::Encoder;
use crate::{Error, InvalidDataErrorKind, Result};
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The encoder already supported the 'view' types: str and [],
but I updated it to also support the 'owned types: String and Vec.

This is not strictly necessary, but makes the API more convenient, and more consistent (which let use a macro). With this change you can write
encoder.encode(&myVec) instead of encoder.encode(myVec.as_slice()).

Same for String and as_str.


// We only support `BTreeMap` if the `alloc` crate is available through the `alloc` feature flag.
// We only support 'owned' sequence/dictionary types if the `alloc` crate is available through the `alloc` feature flag.
// Note that we always support encoding views into these types (which don't require allocating memory).
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence makes sense to me, but might be confusing.
It should be read "... support encoding views-into-these-types ..."
Thoughts?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't make sense to me. You encode into a byte buffer, not into a "type".

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it would make sense if it says encoding view of as i you can always encode a view of a sequence, isn't that the point?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't that the point?

Yes, that's what it was trying to say. Just not well, and I couldn't think of a better somehow.
I updated the wording, hopefully is more clear now.

#[cfg(feature = "alloc")]
use alloc::collections::BTreeMap;
#[cfg(feature = "alloc")]
use alloc::string::String;
#[cfg(feature = "alloc")]
use alloc::vec::Vec;

// We only support `HashMap` if the standard library is available through the `std` feature flag.
#[cfg(feature = "std")]
Expand Down Expand Up @@ -151,6 +156,13 @@ impl EncodeInto<Slice2> for &str {
}
}

#[cfg(feature = "alloc")]
impl EncodeInto<Slice2> for &String {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely get rid of all these Slice2.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but not for this PR. It's already large enough.

fn encode_into(self, encoder: &mut Encoder<impl OutputTarget, Slice2>) -> Result<()> {
self.as_str().encode_into(encoder)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The encoding of &String just delegates to the encoding for &str.

This is purely to save the caller from needing to do the conversion themselves.

}
}

/// TODO
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the PR didn't change it, but what is the this TODO about, the missing doc comment?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just that we should add doc-comments. I could remove these TODOs, but might as well wait until I do the full cleanup & add doc-comments I figure.

impl<'a, T> EncodeInto<Slice2> for &'a [T]
where
Expand All @@ -166,6 +178,16 @@ where
}
}

#[cfg(feature = "alloc")]
impl<'a, T> EncodeInto<Slice2> for &'a Vec<T>
where
&'a T: EncodeInto<Slice2>,
{
Comment on lines +182 to +185
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust's powerful 'blanket-impl' syntax at work here. This translates to:
"Implement the EncodeInto trait for any &Vec<T>, such that &T itself already implements EncodeInto."

And we have to tie the lifetimes of the elements to the vector they're contained within (No one else can mess with the vec while we're encoding it's elements).

fn encode_into(self, encoder: &mut Encoder<impl OutputTarget, Slice2>) -> Result<()> {
self.as_slice().encode_into(encoder)
}
}
Comment on lines 159 to +189
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, there should be tests. I will add in a future PR though.
This PR is really not about encoding changes.


// =============================================================================
// Dictionary type implementations
// =============================================================================
Expand Down
332 changes: 332 additions & 0 deletions slicec/src/definition_types.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,332 @@
// Copyright (c) ZeroC, Inc.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this file, the most important thing to scrutinize is that the encoding logic is correct.


//! This module contains the handwritten encoding code for the Slice-compiler definitions.
//! After rust code-gen has been implemented, this file will be deleted, and we will use generated definitions instead.

use slice_codec::slice2::Slice2;

use slice_codec::buffer::OutputTarget;
use slice_codec::encode_into::*;
use slice_codec::encoder::Encoder;
use slice_codec::Result;

/// TAG_END_MARKER must be encoded at the end of every non-compact type.
const TAG_END_MARKER: i32 = -1;

/// This macro implements `EncodeInto<Slice2>` for a Rust struct (which is mapped from a non-compact Slice struct).
/// It encodes all the struct's fields (in definition order), followed by `TAG_END_MARKER`.
///
/// It uses macro-function-syntax, and should be called like:
/// `implement_encode_into_for_struct!(struct_type_name, field1, field2, ...);`
macro_rules! implement_encode_into_for_struct {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we planning to use the same approach for the generated code? If so, the macro should live in the codec library.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, probably not. Haven't 100% decided yet, but I think it'd be better for the generated code to just generate the encoding normally, and not use a macro. Thoughts?

This macro is just for my own convenience. You can even see it as a kind of "psuedo code-gen".

($type_name:ty$(, $field_name:ident)*$(,)?) => {
Comment on lines +21 to +22
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 lines give the macro a name, and describes the syntax to invoke it with. It should be passed a ty (Type) token, followed by 0 or more ident tokens, each separated by a comma. It supports an optional trailing comma $(,)?.

impl EncodeInto<Slice2> for &$type_name {
fn encode_into(self, encoder: &mut Encoder<impl OutputTarget>) -> Result<()> {
$(encoder.encode(&self.$field_name)?;)*
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The $(...)* on this line means that this line will be copied for every single field_name passed into this macro. And the $field_name part of this line will (of course) be substituted out for that provided field_name.

For structs without varints or optionals. This macro just works.

encoder.encode_varint(TAG_END_MARKER)?;
Ok(())
}
}
}
}

// ================= //
// Hand-mapped types //
// ================= //

pub type EntityId = String;
pub type TypeId = String;
pub type Message = Vec<MessageComponent>;
Comment on lines +37 to +39
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mapping for the typealiases we use in the Slice files.

Followed by each of the mapped types. These should exactly correspond to their Slice-file counterparts, so I only added comments where something was worth mentioning.


#[derive(Clone, Debug)]
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good practice in Rust to pro-actively implement these two traits where-ever possible.

Note that Clone does not mean Rust will make copies of the thing automatically, that's the Copy trait instead.

pub struct Attribute {
pub directive: String,
pub args: Vec<String>,
}
implement_encode_into_for_struct!(Attribute, directive, args);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simple structs, we just use the macro to implement the encoding function.


#[derive(Clone, Debug)]
pub struct TypeRef {
pub type_id: TypeId,
pub is_optional: bool,
pub type_attributes: Vec<Attribute>,
}
implement_encode_into_for_struct!(TypeRef, type_id, is_optional, type_attributes);

#[derive(Clone, Debug)]
pub struct EntityInfo {
pub identifier: String,
pub attributes: Vec<Attribute>,
pub comment: Option<DocComment>,
}
impl EncodeInto<Slice2> for &EntityInfo {
fn encode_into(self, encoder: &mut Encoder<impl OutputTarget>) -> Result<()> {
encoder.encode(&self.identifier)?;
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a conscious choice to omit types, since this is common in Rust.
If we think this is too unclear, our encode function supports being called with an explicit type:
encoder.encode::<&String>(&self.identifier)
More of a thought for whether the codegen should include types or not.

encoder.encode(&self.attributes)?;
// encoder.encode_tagged(0, &self.comment)?; TODO add doc-comments after adding tag encoding support.
encoder.encode_varint(TAG_END_MARKER)?;
Ok(())
}
}

#[derive(Clone, Debug)]
pub struct Module {
pub identifier: String,
pub attributes: Vec<Attribute>,
}
implement_encode_into_for_struct!(Module, identifier, attributes);

#[derive(Clone, Debug)]
pub struct Struct {
pub entity_info: EntityInfo,
pub is_compact: bool,
pub fields: Vec<Field>,
}
implement_encode_into_for_struct!(Struct, entity_info, is_compact, fields);

#[derive(Clone, Debug)]
pub struct Field {
pub entity_info: EntityInfo,
pub tag: Option<i32>, // TODO: varint32 isn't a real type?
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is not real?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to remove this comment and open a proposal to add a varint32 (etc.) type to the Rust mapping.

I think it's cleaner to have a separate type for this, instead of needing to call special encode_varint functions. It makes the API more consistent (you always call encoder.encode, no special functions), and will let users encode Rust structs directly without needing Slice files (otherwise, there's no way to specify whether an i32 should use the int32 or the varint32 encoding). That's a future proposal though.

pub data_type: TypeRef,
}
impl EncodeInto<Slice2> for &Field {
fn encode_into(self, encoder: &mut Encoder<impl OutputTarget>) -> Result<()> {
// Encode the bit-sequence. With only one optional, this is just a bool.
encoder.encode(self.tag.is_some())?;

// Encode the actual fields.
encoder.encode(&self.entity_info)?;
if let Some(tag_value) = self.tag {
encoder.encode_varint(tag_value)?;
}
encoder.encode(&self.data_type)?;
encoder.encode_varint(TAG_END_MARKER)?;
Ok(())
Comment thread
InsertCreativityHere marked this conversation as resolved.
}
}

#[derive(Clone, Debug)]
pub struct Interface {
pub entity_info: EntityInfo,
pub bases: Vec<EntityId>,
pub operations: Vec<Operation>,
}
implement_encode_into_for_struct!(Interface, entity_info, bases, operations);

#[derive(Clone, Debug)]
pub struct Operation {
pub entity_info: EntityInfo,
pub is_idempotent: bool,
pub parameters: Vec<Field>,
pub has_streamed_parameter: bool,
pub return_type: Vec<Field>,
pub has_streamed_return: bool,
}
implement_encode_into_for_struct!(
Operation,
entity_info,
is_idempotent,
parameters,
has_streamed_parameter,
return_type,
has_streamed_return,
);

#[derive(Clone, Debug)]
pub struct Enum {
pub entity_info: EntityInfo,
pub is_compact: bool,
pub is_unchecked: bool,
pub underlying: Option<TypeId>,
pub enumerators: Vec<Enumerator>,
}
impl EncodeInto<Slice2> for &Enum {
fn encode_into(self, encoder: &mut Encoder<impl OutputTarget>) -> Result<()> {
// Encode the bit-sequence. With only one optional, this is just a bool.
encoder.encode(self.underlying.is_some())?;

// Encode the actual fields.
encoder.encode(&self.entity_info)?;
encoder.encode(self.is_compact)?;
encoder.encode(self.is_unchecked)?;
if let Some(underlying_value) = &self.underlying {
encoder.encode(underlying_value)?;
}
encoder.encode(&self.enumerators)?;
encoder.encode_varint(TAG_END_MARKER)?;
Ok(())
}
}

#[derive(Clone, Debug)]
pub struct Enumerator {
pub entity_info: EntityInfo,
pub value: Discriminant,
pub fields: Vec<Field>,
}
implement_encode_into_for_struct!(Enumerator, entity_info, value, fields);

#[derive(Clone, Debug)]
pub struct Discriminant {
pub absolute_value: u64,
pub is_positive: bool,
}
implement_encode_into_for_struct!(Discriminant, absolute_value, is_positive);

#[derive(Clone, Debug)]
pub struct CustomType {
pub entity_info: EntityInfo,
}
implement_encode_into_for_struct!(CustomType, entity_info);

#[derive(Clone, Debug)]
pub struct TypeAlias {
pub entity_info: EntityInfo,
pub underlying_type: TypeRef, // Can never be optional.
}
implement_encode_into_for_struct!(TypeAlias, entity_info, underlying_type);

#[derive(Clone, Debug)]
pub struct SequenceType {
pub element_type: TypeRef,
}
implement_encode_into_for_struct!(SequenceType, element_type);

#[derive(Clone, Debug)]
pub struct DictionaryType {
pub key_type: TypeRef, // Can never be optional.
pub value_type: TypeRef,
}
implement_encode_into_for_struct!(DictionaryType, key_type, value_type);

#[derive(Clone, Debug)]
pub struct ResultType {
pub success_type: TypeRef,
pub failure_type: TypeRef,
}
implement_encode_into_for_struct!(ResultType, success_type, failure_type);

#[derive(Clone, Debug)]
pub struct DocComment {
pub overview: Message,
pub see_tags: Vec<EntityId>,
}
implement_encode_into_for_struct!(DocComment, overview, see_tags);

#[repr(u8)]
#[derive(Clone, Debug)]
pub enum MessageComponent {
Text(String) = 0,
Link(EntityId) = 1,
}
impl EncodeInto<Slice2> for &MessageComponent {
fn encode_into(self, encoder: &mut Encoder<impl OutputTarget>) -> Result<()> {
// Write the discriminant value.
// SAFETY: this cast is guaranteed to be safe because the enum is marked with `repr(u8)`, so it's safe to cast
// it directly to a `u8`.
unsafe {
let discriminant = *<*const _>::from(self).cast::<u8>();
encoder.encode_varint(discriminant)?;
}
Comment thread
InsertCreativityHere marked this conversation as resolved.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Review — Critical: Unsafe discriminant reading is unsound

For data-carrying #[repr(u8)] enums, Rust does not guarantee the discriminant is stored at byte offset 0. This unsafe pointer cast can be UB.

Safe alternative:

let discriminant: u8 = match self {
    MessageComponent::Text(_) => 0,
    MessageComponent::Link(_) => 1,
};
encoder.encode_varint(discriminant)?;

Same issue applies to Symbol::encode_into below.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure Claude is just wrong here. Reading the first u8 of repr(u8) enum is explicitly guaranteed to be correct, and not UB, and never product 'incorrect encodings'.

This is the de facto way to read discriminants from discriminated unions in Rust.
See the reference on type-layouts for primitive backed enums:
https://doc.rust-lang.org/reference/type-layout.html#primitive-representation-of-enums-with-fields
And this post specifically about the safety of doing this:
https://users.rust-lang.org/t/enum-crimes-getting-the-discriminant/88103/4


// Encode the actual value.
match self {
MessageComponent::Text(v) => encoder.encode(v)?,
MessageComponent::Link(v) => encoder.encode(v)?,
Comment thread
InsertCreativityHere marked this conversation as resolved.
}

encoder.encode_varint(TAG_END_MARKER)?;
Ok(())
}
}

#[derive(Clone, Debug)]
pub struct SliceFile {
pub path: String,
pub module_declaration: Module,
pub attributes: Vec<Attribute>,
pub contents: Vec<Symbol>,
}
implement_encode_into_for_struct!(SliceFile, path, module_declaration, attributes, contents);

#[derive(Clone, Debug)]
pub struct GeneratedFile {
pub path: String,
pub contents: String,
}
implement_encode_into_for_struct!(GeneratedFile, path, contents);

#[repr(u8)]
#[derive(Clone, Debug)]
pub enum Symbol {
Interface(Interface) = 0,
Enum(Enum) = 1,
Struct(Struct) = 2,
CustomType(CustomType) = 3,
SequenceType(SequenceType) = 4,
DictionaryType(DictionaryType) = 5,
ResultType(ResultType) = 6, // TODO make result come before dictionary!
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to fix the ordering in the Slice files as well.
Long ago, we agreed the canonical order should be 'result', 'sequence', 'dictionary'. But I forgot this while writing the compiler definitions...

TypeAlias(TypeAlias) = 7,
}
impl EncodeInto<Slice2> for &Symbol {
fn encode_into(self, encoder: &mut Encoder<impl OutputTarget>) -> Result<()> {
// Write the discriminant value.
// SAFETY: this cast is guaranteed to be safe because the enum is marked with `repr(u8)`, which means we know
// the first 'field' of this type's data layout must be a u8. This lets us read without offsetting the pointer.
unsafe {
let discriminant = *<*const _>::from(self).cast::<u8>();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Review — Critical: Same unsound unsafe discriminant read as MessageComponent

Same fix — use a safe match:

let discriminant: u8 = match self {
    Symbol::Interface(_) => 0,
    Symbol::Enum(_) => 1,
    Symbol::Struct(_) => 2,
    Symbol::CustomType(_) => 3,
    Symbol::SequenceType(_) => 4,
    Symbol::DictionaryType(_) => 5,
    Symbol::ResultType(_) => 6,
    Symbol::TypeAlias(_) => 7,
};
encoder.encode_varint(discriminant)?;

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably what we'll generate to be fair, but again, Claude is just wrong here.
See my above comment and links.

encoder.encode_varint(discriminant)?;
}
Comment thread
InsertCreativityHere marked this conversation as resolved.
Comment thread
InsertCreativityHere marked this conversation as resolved.

// Encode the actual value.
match self {
Symbol::Interface(v) => encoder.encode(v)?,
Symbol::Enum(v) => encoder.encode(v)?,
Symbol::Struct(v) => encoder.encode(v)?,
Symbol::CustomType(v) => encoder.encode(v)?,
Symbol::SequenceType(v) => encoder.encode(v)?,
Symbol::DictionaryType(v) => encoder.encode(v)?,
Symbol::ResultType(v) => encoder.encode(v)?,
Symbol::TypeAlias(v) => encoder.encode(v)?,
}

encoder.encode_varint(TAG_END_MARKER)?;
Ok(())
}
}

#[derive(Clone, Debug)]
pub struct Diagnostic {
pub level: DiagnosticLevel,
pub message: String,
pub source: Option<String>,
}
impl EncodeInto<Slice2> for &Diagnostic {
fn encode_into(self, encoder: &mut Encoder<impl OutputTarget>) -> Result<()> {
// Encode the bit-sequence. With only one optional, this is just a bool.
encoder.encode(self.source.is_some())?;

// Encode the actual fields.
encoder.encode(&self.level)?;
encoder.encode(&self.message)?;
if let Some(source_value) = &self.source {
encoder.encode(source_value)?;
}
encoder.encode_varint(TAG_END_MARKER)?;
Ok(())
}
}

#[repr(u8)]
#[derive(Clone, Copy, Debug)]
pub enum DiagnosticLevel {
Info,
Warning,
Error,
}
impl EncodeInto<Slice2> for &DiagnosticLevel {
fn encode_into(self, encoder: &mut Encoder<impl OutputTarget>) -> Result<()> {
encoder.encode(*self as u8)
}
}
Loading
Loading