Merged
60 changes: 30 additions & 30 deletions src/main/thrift/parquet.thrift
@@ -280,7 +280,7 @@ struct Statistics {
*/
1: optional binary max;
2: optional binary min;
/**

Using git blame -w does indeed correctly ignore whitespace differences:

git blame -w src/main/thrift/parquet.thrift
...
2c4ada8e src/thrift/parquet.thrift      (julien                2013-09-17 18:21:15 -0700  267) struct Statistics {
041708da src/main/thrift/parquet.thrift (Ryan Blue             2017-04-17 11:23:41 -0700  268)    /**
041708da src/main/thrift/parquet.thrift (Ryan Blue             2017-04-17 11:23:41 -0700  269)     * DEPRECATED: min and max value of the column. Use min_value and max_value.
041708da src/main/thrift/parquet.thrift (Ryan Blue             2017-04-17 11:23:41 -0700  270)     *
041708da src/main/thrift/parquet.thrift (Ryan Blue             2017-04-17 11:23:41 -0700  271)     * Values are encoded using PLAIN encoding, except that variable-length byte
041708da src/main/thrift/parquet.thrift (Ryan Blue             2017-04-17 11:23:41 -0700  272)     * arrays do not include a length prefix.
041708da src/main/thrift/parquet.thrift (Ryan Blue             2017-04-17 11:23:41 -0700  273)     *
bef54389 src/main/thrift/parquet.thrift (Zoltan Ivanfi         2017-10-06 16:38:53 -0700  274)     * These fields encode min and max values determined by signed comparison
041708da src/main/thrift/parquet.thrift (Ryan Blue             2017-04-17 11:23:41 -0700  275)     * only. New files should use the correct order for a column's logical type
041708da src/main/thrift/parquet.thrift (Ryan Blue             2017-04-17 11:23:41 -0700  276)     * and store the values in the min_value and max_value fields.
041708da src/main/thrift/parquet.thrift (Ryan Blue             2017-04-17 11:23:41 -0700  277)     *
041708da src/main/thrift/parquet.thrift (Ryan Blue             2017-04-17 11:23:41 -0700  278)     * To support older readers, these may be set when the column order is
bef54389 src/main/thrift/parquet.thrift (Zoltan Ivanfi         2017-10-06 16:38:53 -0700  279)     * signed.
041708da src/main/thrift/parquet.thrift (Ryan Blue             2017-04-17 11:23:41 -0700  280)     */
2c4ada8e src/thrift/parquet.thrift      (julien                2013-09-17 18:21:15 -0700  281)    1: optional binary max;
2c4ada8e src/thrift/parquet.thrift      (julien                2013-09-17 18:21:15 -0700  282)    2: optional binary min;
db687874 src/main/thrift/parquet.thrift (mwish                 2024-08-23 15:30:20 +0800  283)    /**
db687874 src/main/thrift/parquet.thrift (mwish                 2024-08-23 15:30:20 +0800  284)     * Count of null values in the column.

* Count of null values in the column.
*
* Writers SHOULD always write this field even if it is zero (i.e. no null value)
@@ -717,7 +717,7 @@ struct DictionaryPageHeader {
* The remaining section containing the data is compressed if is_compressed is true
*
* Implementation note - this header is not necessarily a strict improvement over
* `DataPageHeader` (in particular the original header might provide better compression
* in some scenarios). Page indexes require pages to start and end at row boundaries,
* regardless of which page header is used.
**/
@@ -897,7 +897,7 @@ struct ColumnMetaData {
/** total byte size of all uncompressed pages in this column chunk (including the headers) **/
6: required i64 total_uncompressed_size

/** total byte size of all compressed, and potentially encrypted, pages
* in this column chunk (including the headers) **/
7: required i64 total_compressed_size

@@ -964,19 +964,19 @@ struct ColumnChunk {
/** File where column data is stored. If not set, assumed to be same file as
* metadata. This path is relative to the current file.
*
* As of December 2025, the only known use-case for this field is writing summary
* parquet files (i.e. "_metadata" files). These files consolidate footers from
* multiple parquet files to allow for efficient reading of footers to avoid file
* listing costs and prune out files that do not need to be read based on statistics.
*
* These files do not appear to have ever been formally specified in the specification
* and are potentially problematic from a correctness perspective [1].
*
* [1] https://lists.apache.org/thread/ootf2kmyg3p01b1bvplpvp4ftd1bt72d
*
* There is no other known usage of this field. Specifically, there are no known
* reference implementations that will read externally stored column data if this field is populated
* within a standard parquet file. Making use of the field for this purpose is
* not considered part of the Parquet specification.
**/
1: optional string file_path
@@ -1039,10 +1039,10 @@ struct RowGroup {
* in this row group **/
5: optional i64 file_offset

/** Total byte size of all compressed (and potentially encrypted) column data
* in this row group **/
6: optional i64 total_compressed_size

/** Row group ordinal in the file **/
7: optional i16 ordinal
}
@@ -1119,7 +1119,7 @@ union ColumnOrder {
* - If the min is +0, the row group may contain -0 values as well.
* - If the max is -0, the row group may contain +0 values as well.
* - When looking for NaN values, min and max should be ignored.
*
* When writing statistics the following rules should be followed:
* - NaNs should not be written to min or max statistics fields.
* - If the computed max value is zero (whether negative or positive),
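The reader-side rules quoted in this hunk (signed zeros and NaN) can be sketched as a pruning check. This is a minimal illustration, not code from any Parquet implementation: the function name `may_contain` and its parameters are hypothetical, and treating NaN statistics as "unknown" is a defensive assumption that goes beyond the lines shown here.

```python
import math
from typing import Optional


def may_contain(value: float,
                stat_min: Optional[float],
                stat_max: Optional[float]) -> bool:
    """Return False only when min/max statistics rule the value out.

    Hypothetical helper: names and signature are assumptions.
    """
    # When looking for NaN values, min and max should be ignored.
    if math.isnan(value):
        return True
    # Missing statistics prove nothing about the row group.
    if stat_min is None or stat_max is None:
        return True
    # Defensive assumption (not from the hunk above): NaN statistics
    # are treated as unknown rather than compared against.
    if math.isnan(stat_min) or math.isnan(stat_max):
        return True
    # IEEE 754 comparison treats -0.0 and +0.0 as equal, so a min of +0
    # does not exclude -0 and a max of -0 does not exclude +0.
    return stat_min <= value <= stat_max
```

The point of the sketch is that a bitwise or total-order comparison would distinguish -0.0 from +0.0 and could prune a row group incorrectly, whereas ordinary floating-point comparison does not.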
@@ -1212,13 +1212,13 @@ struct ColumnIndex {
4: required BoundaryOrder boundary_order

/**
* A list containing the number of null values for each page
*
* Writers SHOULD always write this field even if no null values
* are present or the column is not nullable.
* Readers MUST distinguish between null_counts not being present
* and null_count being 0.
* If null_counts are not present, readers MUST NOT assume all
* null counts are 0.
*/
5: optional list<i64> null_counts
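The distinction the `null_counts` comment requires (field absent versus a count of 0) might look like this on the reader side. This is a sketch under assumptions: the helper name `page_may_contain_nulls` is hypothetical, and an absent Thrift field is modeled as `None`.

```python
from typing import Optional, Sequence


def page_may_contain_nulls(null_counts: Optional[Sequence[int]],
                           page_index: int) -> bool:
    """Return False only when the column index proves the page has no nulls.

    Hypothetical helper: `None` stands in for an unset optional field.
    """
    if null_counts is None:
        # Field not present: readers MUST NOT assume the count is 0,
        # so the page cannot be ruled out.
        return True
    # Field present: a recorded 0 is a real guarantee of no nulls.
    return null_counts[page_index] > 0
```

Collapsing the absent case into an empty list or a list of zeros would lose exactly the information the comment asks readers to preserve.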
@@ -1275,12 +1275,12 @@ union EncryptionAlgorithm {
* Description for file metadata
*/
struct FileMetaData {
/** Version of this file
*
* As of December 2025, there is no agreed upon consensus of what constitutes
* version 2 of the file. For maximum compatibility with readers, writers should
* always populate "1" for version. For maximum compatibility with writers,
* readers should accept "1" and "2" interchangeably. All other versions are
* reserved for potential future use-cases.
*/
1: required i32 version
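The compatibility guidance in the `version` comment can be sketched as follows. The constant and function names are hypothetical. Reporting reserved versions as unsupported is one possible reading: the comment says only that they are reserved, not how a reader should react to them.

```python
# Value the comment recommends writers always populate.
WRITER_VERSION = 1

# Versions the comment says readers should accept interchangeably.
ACCEPTED_READER_VERSIONS = frozenset({1, 2})


def is_supported_version(version: int) -> bool:
    """Return True for versions 1 and 2, which are treated identically.

    Assumption: any other value is reported as unsupported because the
    comment reserves it for potential future use.
    """
    return version in ACCEPTED_READER_VERSIONS
```

A reader following this sketch would not branch on whether the value is 1 or 2, since the comment states there is no consensus on what version 2 means.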
@@ -1326,30 +1326,30 @@ struct FileMetaData {
*/
7: optional list<ColumnOrder> column_orders;

/**
* Encryption algorithm. This field is set only in encrypted files
* with plaintext footer. Files with encrypted footer store algorithm id
* in FileCryptoMetaData structure.
*/
8: optional EncryptionAlgorithm encryption_algorithm

/**
* Retrieval metadata of key used for signing the footer.
* Used only in encrypted files with plaintext footer.
*/
9: optional binary footer_signing_key_metadata
}

/** Crypto metadata for files with encrypted footer **/
struct FileCryptoMetaData {
/**
* Encryption algorithm. This field is only used for files
* with encrypted footer. Files with plaintext footer store algorithm id
* inside footer (FileMetaData structure).
*/
1: required EncryptionAlgorithm encryption_algorithm

/** Retrieval metadata of key used for encryption of footer,
* and (possibly) columns **/
2: optional binary key_metadata
}