apache · rok · May 15, 2024 · May 15, 2024 · Jun 4, 2024 · Jun 5, 2024
diff --git a/LogicalTypes.md b/LogicalTypes.md
@@ -256,6 +256,54 @@ The primitive type is a 2-byte `FIXED_LEN_BYTE_ARRAY`.
 
 The sort order for `FLOAT16` is signed (with special handling of NANs and signed zeros); it uses the same [logic](https://github.com/apache/parquet-format#sort-order) as `FLOAT` and `DOUBLE`.
 
+### FIXED_SIZE_LIST
+
+The `FIXED_SIZE_LIST` annotation represents a fixed-size sequence of elements
+that all share one fixed-width primitive physical type. It must annotate a
+`FIXED_LEN_BYTE_ARRAY` primitive type and uses the `FixedSizeListType`
+parameters:
+* `type`: the primitive physical type of each element
+* `num_values`: the number of elements in each list value
+
+`num_values` must be a positive integer. `type` must be a fixed-width primitive
+physical type and must not be `BOOLEAN`, `INT96`, or `BYTE_ARRAY`.
+Writers must not emit `FIXED_SIZE_LIST` metadata that violates these
+constraints. Readers must treat violating metadata as invalid.
+
+The annotated field's `type_length` must equal the encoded size, in bytes, of
+`num_values` elements using the PLAIN representation of `type`:
+* `INT32` or `FLOAT`: `type_length` = `num_values` * 4
+* `INT64` or `DOUBLE`: `type_length` = `num_values` * 8
+* `FIXED_LEN_BYTE_ARRAY`: `type_length` must be divisible by `num_values`;
+  each element is `type_length` / `num_values` bytes
+
+Writers must not break these rules; readers must treat violations as invalid.
+
+For example, a column of 128-element float vectors:
+
+    optional fixed_len_byte_array(512) embeddings (FIXED_SIZE_LIST(type=FLOAT, num_values=128));
+
+Each `FIXED_LEN_BYTE_ARRAY` value stores one fixed-size list value.
+`FIXED_SIZE_LIST` is intentionally represented as a primitive leaf and does not
+use the 3-level `LIST` structure.
+
+This annotation defines only the intra-value element layout. The surrounding
+column encoding is unchanged: any encoding that supports
+`FIXED_LEN_BYTE_ARRAY` may be used. Note that `BYTE_STREAM_SPLIT` operates at
+the full `type_length` width, creating `type_length` byte-streams; this
+naturally groups corresponding element bytes across rows.
+
+If the annotated field is `optional`, list values may be null. Individual
+elements are always non-null and are not represented with their own definition
+or repetition levels. Nested element types are not supported by
+`FIXED_SIZE_LIST`; use `LIST` for nested or element-nullable data.
+
+Writers must validate that `type` is not `BOOLEAN`, `INT96`, or `BYTE_ARRAY`
+and that `type_length` matches the expected size for the given `type` and
+`num_values`. Readers must treat violating metadata as invalid.
+
+The sort order used for `FIXED_SIZE_LIST` is undefined.
+
 ## Temporal Types
 
 ### DATE

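A minimal sketch of the value layout described in the `FIXED_SIZE_LIST` section above, assuming elements are concatenated in order using their PLAIN encoding (little-endian IEEE 754 for `FLOAT`). The helper names `pack_float_list`, `unpack_float_list`, and `byte_stream_split` are hypothetical and not part of any Parquet library. The last helper illustrates how `BYTE_STREAM_SPLIT` at the full `type_length` width would place byte k of every row into stream k.

```python
import struct


def pack_float_list(values):
    """Pack floats into one FIXED_LEN_BYTE_ARRAY value.

    Assumption: elements are stored back to back as little-endian
    IEEE 754 singles, i.e. the PLAIN layout of FLOAT.
    """
    return struct.pack(f"<{len(values)}f", *values)


def unpack_float_list(buf, num_values):
    """Decode one value; type_length must equal num_values * 4 for FLOAT."""
    if len(buf) != num_values * 4:
        raise ValueError("type_length does not match num_values * 4")
    return list(struct.unpack(f"<{num_values}f", buf))


def byte_stream_split(rows, type_length):
    """Hypothetical BYTE_STREAM_SPLIT over full-width values.

    Stream k collects byte k of every row; the type_length streams
    are then concatenated in order.
    """
    for row in rows:
        if len(row) != type_length:
            raise ValueError("every row must be exactly type_length bytes")
    return b"".join(
        bytes(row[k] for row in rows) for k in range(type_length)
    )


# Two rows of 2-element float vectors, so type_length = 2 * 4 = 8.
rows = [pack_float_list([1.0, 2.0]), pack_float_list([3.0, 4.0])]
encoded = byte_stream_split(rows, type_length=8)
```

With this layout, the bytes at the same offset of the same element position end up adjacent across rows, which is the grouping the section refers to.
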
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
@@ -319,6 +319,11 @@ struct ListType {}    // see LogicalTypes.md
 struct EnumType {}    // allowed for BYTE_ARRAY, must be encoded with UTF-8
 struct DateType {}    // allowed for INT32
 struct Float16Type {} // allowed for FIXED[2], must be encoded as raw FLOAT16 bytes (see LogicalTypes.md)
+struct FixedSizeListType {        // allowed for FIXED_LEN_BYTE_ARRAY; see LogicalTypes.md
+    1: required Type type;        // element type (fixed-width primitive; must not be BOOLEAN, INT96, or BYTE_ARRAY)
+    2: required i32 num_values;   // number of elements in each value; must be > 0
+    // Writers must not emit violating values. Readers must treat violating metadata as invalid.
+}
 
 /**
  * Logical type to annotate a column that is always null.
@@ -485,15 +490,16 @@ union LogicalType {
   8:  TimestampType TIMESTAMP
 
   // 9: reserved for INTERVAL
-  10: IntType INTEGER         // use ConvertedType INT_* or UINT_*
-  11: NullType UNKNOWN        // no compatible ConvertedType
-  12: JsonType JSON           // use ConvertedType JSON
-  13: BsonType BSON           // use ConvertedType BSON
-  14: UUIDType UUID           // no compatible ConvertedType
-  15: Float16Type FLOAT16     // no compatible ConvertedType
-  16: VariantType VARIANT     // no compatible ConvertedType
-  17: GeometryType GEOMETRY   // no compatible ConvertedType
-  18: GeographyType GEOGRAPHY // no compatible ConvertedType
+  10: IntType INTEGER                          // use ConvertedType INT_* or UINT_*
+  11: NullType UNKNOWN                         // no compatible ConvertedType
+  12: JsonType JSON                            // use ConvertedType JSON
+  13: BsonType BSON                            // use ConvertedType BSON
+  14: UUIDType UUID                            // no compatible ConvertedType
+  15: Float16Type FLOAT16                      // no compatible ConvertedType
+  16: VariantType VARIANT                      // no compatible ConvertedType
+  17: GeometryType GEOMETRY                    // no compatible ConvertedType
+  18: GeographyType GEOGRAPHY                  // no compatible ConvertedType
+  19: FixedSizeListType FIXED_SIZE_LIST        // no compatible ConvertedType
 }
 
 /**
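
The constraints stated in the `FixedSizeListType` comments and in LogicalTypes.md could be checked by a reader along these lines. This is a sketch under assumptions: `element_width` is a hypothetical helper, and physical types are passed as plain strings rather than as the Thrift `Type` enum.

```python
# Byte widths of the fixed-width element types listed in LogicalTypes.md.
FIXED_WIDTHS = {"INT32": 4, "FLOAT": 4, "INT64": 8, "DOUBLE": 8}
# Element types the annotation explicitly forbids.
DISALLOWED = {"BOOLEAN", "INT96", "BYTE_ARRAY"}


def element_width(elem_type, num_values, type_length):
    """Return the per-element width in bytes, or raise ValueError
    if the metadata violates the FIXED_SIZE_LIST constraints."""
    if num_values <= 0:
        raise ValueError("num_values must be a positive integer")
    if elem_type in DISALLOWED:
        raise ValueError(f"{elem_type} is not a valid element type")
    if elem_type in FIXED_WIDTHS:
        width = FIXED_WIDTHS[elem_type]
        if type_length != num_values * width:
            raise ValueError("type_length must equal num_values * width")
        return width
    if elem_type == "FIXED_LEN_BYTE_ARRAY":
        # The element width is derived, so type_length must divide evenly.
        if type_length <= 0 or type_length % num_values != 0:
            raise ValueError("type_length must be divisible by num_values")
        return type_length // num_values
    raise ValueError(f"unknown element type: {elem_type}")
```

For the `embeddings` example in LogicalTypes.md, `element_width("FLOAT", 128, 512)` returns 4, while a `BOOLEAN` element type or a non-positive `num_values` raises `ValueError`.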