Summary
PR #200 added a schema module to the Rust SDK that builds a protobuf DescriptorProto at runtime from a Unity Catalog table schema, eliminating the need for offline .proto file generation.
We should port this capability to all other SDK languages.
Motivation
Today, SDK users must run generate_proto as a build step before they can stream data. With dynamic descriptor generation, a caller can query the UC REST API for table metadata and immediately start streaming — no code-gen step required.
Behavior to Implement
Each language SDK should expose:
- A UcColumn type (mirrors UC REST API: name, type_name, type_text, type_json, nullable, position)
- A UcTableSchema type (schema_name + table_name + columns)
- descriptor_from_uc_columns(columns, message_name) → proto descriptor
- descriptor_from_uc_schema(schema) → proto descriptor
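As a rough illustration of the surface listed above, here is a minimal Python sketch. It is not the Rust implementation: the returned dict is only a stand-in for a real protobuf descriptor, the scalar table is hypothetical apart from the DATE and TIMESTAMP entries, and both the field numbering (position + 1, assuming 0-based positions) and the use of table_name as the message name are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UcColumn:
    # Field names mirror the UC REST API column attributes listed in this issue.
    name: str
    type_name: str
    type_text: str = ""
    type_json: Optional[str] = None
    nullable: bool = True
    position: int = 0


@dataclass
class UcTableSchema:
    schema_name: str
    table_name: str
    columns: List[UcColumn] = field(default_factory=list)


# Hypothetical scalar mapping. Only DATE and TIMESTAMP come from this issue;
# the other entries are placeholders, not the Rust SDK's actual table.
_SCALAR_TYPES = {
    "DATE": "int32",
    "TIMESTAMP": "int64",
    "STRING": "string",
    "LONG": "int64",
}


def descriptor_from_uc_columns(columns, message_name):
    """Return a dict standing in for a proto descriptor (sketch only)."""
    fields = []
    for col in sorted(columns, key=lambda c: c.position):
        proto_type = _SCALAR_TYPES.get(col.type_name.upper())
        if proto_type is None:
            raise ValueError(f"type not covered by this sketch: {col.type_name}")
        fields.append({
            "name": col.name,
            "number": col.position + 1,  # assumption: positions are 0-based
            "type": proto_type,
            "optional": col.nullable,
        })
    return {"name": message_name, "fields": fields}


def descriptor_from_uc_schema(schema):
    # Assumption: the message is named after the table.
    return descriptor_from_uc_columns(schema.columns, schema.table_name)
```

A real port would build the language's native descriptor type and handle STRUCT, ARRAY, and MAP via type_json rather than rejecting them.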
The type mapping and encoding contracts (e.g. DATE = int32 days-since-epoch, TIMESTAMP = int64 microseconds UTC) are documented in the Rust implementation: rust/sdk/src/schema.rs.
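To make the two encoding contracts mentioned above concrete, the following Python sketch converts values into those wire representations. It assumes "epoch" means the Unix epoch (1970-01-01); the helper names are hypothetical, and rejecting naive datetimes is a choice made for this sketch rather than documented SDK behavior.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: "epoch" is the Unix epoch.
_EPOCH_DATE = date(1970, 1, 1)
_EPOCH_TS = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_date(d: date) -> int:
    """DATE: whole days since the epoch, to be stored in an int32 field."""
    return (d - _EPOCH_DATE).days


def encode_timestamp(ts: datetime) -> int:
    """TIMESTAMP: microseconds since the epoch in UTC, for an int64 field."""
    if ts.tzinfo is None:
        # A naive datetime has no defined UTC instant, so refuse to guess.
        raise ValueError("timestamp must be timezone-aware")
    # Integer division of timedeltas avoids floating-point rounding.
    return (ts - _EPOCH_TS) // timedelta(microseconds=1)
```

For example, `encode_date(date(1970, 1, 2))` yields 1, and a timestamp one second after the epoch in UTC encodes as 1000000.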
Reference Implementation
Rust SDK: rust/sdk/src/schema.rs (added in PR #200 / commit 1ebda6e)
Notes
- STRUCT, ARRAY, and MAP columns require type_json to be populated (the JSON from the UC REST API /api/2.1/unity-catalog/tables/{name} response)
- Simple scalar columns only need type_name
- Max nesting depth should be capped (Rust uses 100) to prevent stack overflow on pathological inputs
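The depth cap from the notes above could be enforced along these lines. This is a sketch that assumes the parsed type_json uses Spark-style keys ("type", "fields", "elementType", "keyType", "valueType"); the function name is hypothetical, and whether the limit is inclusive or exclusive in the Rust implementation is not stated here.

```python
MAX_DEPTH = 100  # value taken from the note about the Rust SDK


def check_depth(node, depth=0):
    """Walk a parsed type tree and fail once nesting exceeds MAX_DEPTH."""
    if depth > MAX_DEPTH:
        raise ValueError(f"type nesting exceeds {MAX_DEPTH} levels")
    if not isinstance(node, dict):
        return  # scalar type name such as "long"
    kind = node.get("type")
    if kind == "struct":
        for struct_field in node.get("fields", []):
            check_depth(struct_field.get("type"), depth + 1)
    elif kind == "array":
        check_depth(node.get("elementType"), depth + 1)
    elif kind == "map":
        check_depth(node.get("keyType"), depth + 1)
        check_depth(node.get("valueType"), depth + 1)
```

Because the check runs before descending, the walk stops shortly after the limit instead of following a pathological input all the way down.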