Enable per column compression #3459
Describe the enhancement requested
Summary
- Add support for configuring compression codec and compression level on a per-column basis when writing Parquet files, rather than applying a single codec and level uniformly across all columns.
Proposed API
Programmatic (ParquetWriter / ParquetProperties):
ParquetWriter.builder(...)
.withCompressionCodec(CompressionCodecName.SNAPPY) // global default
.withCompressionCodec("col_a", CompressionCodecName.ZSTD) // per-column override
.withCompressionLevel("col_a", 9)
.build();
MapReduce (ParquetOutputFormat / Hadoop Configuration):
parquet.compression=SNAPPY
parquet.compression#col_a=ZSTD
parquet.compression.level#col_a=9
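The `#`-suffixed keys above could be picked out of a flat configuration with a simple prefix scan. This is a hypothetical sketch of such a parser, not existing Parquet code; only the key names (`parquet.compression#<column>`, `parquet.compression.level#<column>`) come from the proposal.

```java
import java.util.HashMap;
import java.util.Map;

public class CompressionOverrides {
    static final String CODEC_PREFIX = "parquet.compression#";
    static final String LEVEL_PREFIX = "parquet.compression.level#";

    // Collect per-column codec overrides: "parquet.compression#col_a" -> {col_a=ZSTD}
    public static Map<String, String> columnCodecs(Map<String, String> conf) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(CODEC_PREFIX)) {
                out.put(e.getKey().substring(CODEC_PREFIX.length()), e.getValue());
            }
        }
        return out;
    }

    // Collect per-column level overrides: "parquet.compression.level#col_a" -> {col_a=9}
    public static Map<String, Integer> columnLevels(Map<String, String> conf) {
        Map<String, Integer> out = new HashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(LEVEL_PREFIX)) {
                out.put(e.getKey().substring(LEVEL_PREFIX.length()),
                        Integer.parseInt(e.getValue()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("parquet.compression", "SNAPPY");
        conf.put("parquet.compression#col_a", "ZSTD");
        conf.put("parquet.compression.level#col_a", "9");

        System.out.println(columnCodecs(conf).get("col_a")); // ZSTD
        System.out.println(columnLevels(conf).get("col_a")); // 9
    }
}
```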
Behavior
- Columns without an override inherit the global codec and level.
- A compression level set for a column whose codec doesn't support levels (e.g. SNAPPY) is ignored, and a warning is logged.
- A compression level set without a per-column codec override is applied to the default codec, and a warning is logged.
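The inheritance and fallback rules above can be sketched as a small resolver. Class and method names here are illustrative only, not part of the proposed API, and the set of level-aware codecs is an assumption for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class CodecResolver {
    // Assumed set of codecs that accept a compression level.
    static final Set<String> LEVEL_AWARE = Set.of("ZSTD", "GZIP", "BROTLI");

    final String defaultCodec;
    final Map<String, String> codecOverrides = new HashMap<>();
    final Map<String, Integer> levelOverrides = new HashMap<>();

    CodecResolver(String defaultCodec) { this.defaultCodec = defaultCodec; }

    // Columns without an override inherit the global codec.
    String codecFor(String column) {
        return codecOverrides.getOrDefault(column, defaultCodec);
    }

    // A level set for a codec that doesn't support levels is ignored with a warning.
    Integer levelFor(String column) {
        Integer level = levelOverrides.get(column);
        if (level == null) return null;
        String codec = codecFor(column);
        if (!LEVEL_AWARE.contains(codec)) {
            System.err.println("WARN: compression level " + level
                    + " ignored for codec " + codec + " on column " + column);
            return null;
        }
        return level;
    }

    public static void main(String[] args) {
        CodecResolver r = new CodecResolver("SNAPPY");
        r.codecOverrides.put("col_a", "ZSTD");
        r.levelOverrides.put("col_a", 9);
        r.levelOverrides.put("col_b", 5); // col_b inherits SNAPPY: level is dropped

        System.out.println(r.codecFor("col_a") + ":" + r.levelFor("col_a"));   // ZSTD:9
        System.out.println(r.codecFor("col_b") + ":" + r.levelFor("col_b"));   // SNAPPY:null
    }
}
```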
Component(s)
No response