[All SDKs] Add dynamic proto descriptor generation from UC schema #240

@elenagaljak-db

Summary

PR #200 recently added a schema module to the Rust SDK that builds a protobuf DescriptorProto at runtime from a Unity Catalog (UC) table schema, eliminating the need for offline .proto file generation.

We should port this capability to all other SDK languages:

  • Python
  • TypeScript
  • Java
  • Go

Motivation

Today, SDK users must run generate_proto as a build step before they can stream data. With dynamic descriptor generation, a caller can query the UC REST API for table metadata and start streaming immediately, with no code-gen step.

Behavior to Implement

Each language SDK should expose:

  • A UcColumn type (mirrors UC REST API: name, type_name, type_text, type_json, nullable, position)
  • A UcTableSchema type (schema_name + table_name + columns)
  • descriptor_from_uc_columns(columns, message_name) → proto descriptor
  • descriptor_from_uc_schema(schema) → proto descriptor
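
To make the proposed surface concrete, here is a minimal Python sketch of the two types and two functions listed above. It is not a port of rust/sdk/src/schema.rs. Several things are assumptions for illustration: the descriptor is modelled as a plain dict mirroring DescriptorProto field names (a real port would build an actual protobuf descriptor), the scalar mapping beyond DATE and TIMESTAMP is a guess, field numbers are assigned by column position, and the message name is taken from table_name.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UcColumn:
    """Mirrors the column fields named in this issue."""
    name: str
    type_name: str
    type_text: str = ""
    type_json: str = ""
    nullable: bool = True
    position: int = 0


@dataclass
class UcTableSchema:
    schema_name: str
    table_name: str
    columns: List[UcColumn] = field(default_factory=list)


# ASSUMPTION: only DATE and TIMESTAMP come from the issue text; the other
# rows are illustrative and must be checked against the Rust implementation.
_SCALAR_TYPES: Dict[str, str] = {
    "DATE": "TYPE_INT32",       # days since epoch
    "TIMESTAMP": "TYPE_INT64",  # microseconds, UTC
    "INT": "TYPE_INT32",
    "LONG": "TYPE_INT64",
    "DOUBLE": "TYPE_DOUBLE",
    "BOOLEAN": "TYPE_BOOL",
    "STRING": "TYPE_STRING",
}


def descriptor_from_uc_columns(columns: List[UcColumn], message_name: str) -> dict:
    """Build a dict shaped like a DescriptorProto (scalar columns only)."""
    fields = []
    ordered = sorted(columns, key=lambda c: c.position)
    # ASSUMPTION: field numbers follow column position, starting at 1.
    for number, col in enumerate(ordered, start=1):
        proto_type = _SCALAR_TYPES.get(col.type_name.upper())
        if proto_type is None:
            raise ValueError(
                f"unsupported type {col.type_name!r} for column {col.name!r}"
            )
        fields.append({"name": col.name, "number": number, "type": proto_type})
    return {"name": message_name, "field": fields}


def descriptor_from_uc_schema(schema: UcTableSchema) -> dict:
    # ASSUMPTION: the message is named after the table.
    return descriptor_from_uc_columns(schema.columns, schema.table_name)
```

STRUCT, ARRAY, and MAP columns are deliberately rejected here; handling them requires parsing type_json, which this sketch leaves out.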

The type mapping and encoding contracts (e.g. DATE = int32 days-since-epoch, TIMESTAMP = int64 microseconds UTC) are documented in the Rust implementation: rust/sdk/src/schema.rs.
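
As an illustration of the two encoding contracts named above, the following Python sketch converts values to the wire integers. The helper names encode_date and encode_timestamp are hypothetical, and the epoch is assumed to be the Unix epoch (1970-01-01); the Rust source remains the authority on both points.

```python
from datetime import date, datetime, timezone

# ASSUMPTION: "epoch" means the Unix epoch.
_EPOCH_DATE = date(1970, 1, 1)
_EPOCH_TS = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_date(value: date) -> int:
    """DATE -> int32 days since epoch."""
    return (value - _EPOCH_DATE).days


def encode_timestamp(value: datetime) -> int:
    """TIMESTAMP -> int64 microseconds since epoch, in UTC."""
    if value.tzinfo is None:
        # A naive datetime has no defined UTC instant; refuse to guess.
        raise ValueError("naive datetime: attach a timezone before encoding")
    delta = value - _EPOCH_TS
    # Integer arithmetic avoids float rounding in total_seconds().
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


print(encode_date(date(2024, 1, 1)))                             # 19723
print(encode_timestamp(datetime(2024, 1, 1, tzinfo=timezone.utc)))  # 1704067200000000
```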

Reference Implementation

Rust SDK: rust/sdk/src/schema.rs (added in PR #200 / commit 1ebda6e)

Notes

  • STRUCT, ARRAY, and MAP columns require type_json to be populated (the JSON from the UC REST API /api/2.1/unity-catalog/tables/{name} response)
  • Simple scalar columns only need type_name
  • Max nesting depth should be capped (Rust uses 100) to prevent stack overflow on pathological inputs
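
The depth cap in the last note could be enforced along these lines. This is a sketch only: it assumes the parsed type_json uses the Spark-style layout (struct with fields, array with elementType, map with keyType and valueType), and the function name check_nesting_depth is hypothetical. Exactly how the Rust SDK counts levels is not stated in this issue.

```python
import json

MAX_NESTING_DEPTH = 100  # the cap the Rust SDK is said to use


def check_nesting_depth(type_node, depth=0):
    """Return the deepest nesting level reached, or raise past the cap."""
    if depth > MAX_NESTING_DEPTH:
        raise ValueError(f"type nesting exceeds {MAX_NESTING_DEPTH} levels")
    if isinstance(type_node, str):
        return depth  # scalar types are plain strings such as "integer"
    # ASSUMPTION: Spark-style JSON keys for complex types.
    kind = type_node.get("type")
    if kind == "struct":
        children = [f["type"] for f in type_node.get("fields", [])]
    elif kind == "array":
        children = [type_node["elementType"]]
    elif kind == "map":
        children = [type_node["keyType"], type_node["valueType"]]
    else:
        children = []
    return max((check_nesting_depth(c, depth + 1) for c in children), default=depth)


# Usage: an array of structs nests two levels deep.
parsed = json.loads(
    '{"type": "array", "containsNull": true, "elementType": '
    '{"type": "struct", "fields": [{"name": "x", "type": "integer", '
    '"nullable": true, "metadata": {}}]}}'
)
print(check_nesting_depth(parsed))  # 2
```

Checking depth before building the descriptor keeps the failure an ordinary error rather than a crash, which matters for languages whose recursion limits are not configurable.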
