Describe
Iceberg scans currently fall back when `FileScanTask.residual()` is not `alwaysTrue`, even in cases where the scan could still run natively and the remaining predicates could be handled safely after the scan.
This is overly conservative and reduces native coverage for Iceberg reads with residual/data filters.
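For context, Iceberg attaches a residual to each `FileScanTask`. The residual is the part of the scan filter that the file's partition values do not already guarantee. Here is a minimal Python sketch of that idea. It assumes identity partitioning and AND-only filters. `residual` and `falls_back_today` are hypothetical helpers, not Iceberg's `ResidualEvaluator` or Comet code. The sketch shows that any filter on a non-partition column leaves a residual that is not `alwaysTrue`, and under the current rule that residual forces a fallback.

```python
import operator
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Pred:
    op: str  # hypothetical lowercase names, e.g. "eq", "lt", "gt"
    column: str
    value: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

OPS = {"eq": operator.eq, "lt": operator.lt, "gt": operator.gt}

def residual(expr, partition: Dict[str, object]):
    """Resolve conjuncts on identity partition columns using the file's partition
    values. True/False stand in for Iceberg's alwaysTrue/alwaysFalse; conjuncts on
    other columns are kept for row-level evaluation."""
    if isinstance(expr, bool):
        return expr
    if isinstance(expr, And):
        left = residual(expr.left, partition)
        right = residual(expr.right, partition)
        if left is False or right is False:
            return False
        if left is True:
            return right
        if right is True:
            return left
        return And(left, right)
    if expr.column in partition:
        return OPS[expr.op](partition[expr.column], expr.value)
    return expr

def falls_back_today(task_residual) -> bool:
    """The current rule: any residual other than alwaysTrue causes a fallback."""
    return task_residual is not True

scan_filter = And(Pred("eq", "day", "2024-01-01"), Pred("gt", "x", 5))
print(residual(scan_filter, {"day": "2024-01-01"}))  # Pred(op='gt', column='x', value=5)
print(falls_back_today(residual(scan_filter, {"day": "2024-01-01"})))  # True
```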
Describe the solution you'd like
Support native Iceberg scans with residual filters by splitting predicate handling into two parts:
- push a supported subset of Iceberg filter expressions into native scan pruning predicates
- keep unsupported predicates on the existing post-scan `NativeFilter` path
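The split can be sketched in Python as follows. The sketch flattens the residual into its AND-ed conjuncts. It then sends each conjunct either to the native scan or to the post-scan filter. `split_filter`, `matches`, and the operator names are hypothetical. `starts_with` stands in for any predicate the native scan cannot handle. Splitting at the conjunct level is what makes this safe: a row passes `a AND b` exactly when it passes `a` and also passes `b`, so evaluating the two parts at different stages does not change the result.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Pred:
    op: str
    column: str
    value: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

# Row-level semantics for the operators used in this sketch.
OPS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "starts_with": lambda a, b: a.startswith(b),
}

def conjuncts(expr) -> List[Pred]:
    """Flatten a tree of ANDs into its individual conjuncts."""
    if isinstance(expr, And):
        return conjuncts(expr.left) + conjuncts(expr.right)
    return [expr]

def split_filter(expr, is_supported: Callable[[Pred], bool]) -> Tuple[List[Pred], List[Pred]]:
    """Return (conjuncts pushed into the native scan, conjuncts evaluated post-scan)."""
    pushed, post_scan = [], []
    for c in conjuncts(expr):
        (pushed if is_supported(c) else post_scan).append(c)
    return pushed, post_scan

def matches(row: dict, preds: List[Pred]) -> bool:
    """Row-level check; a null (None) value never satisfies a predicate."""
    return all(row[p.column] is not None and OPS[p.op](row[p.column], p.value) for p in preds)

residual_filter = And(Pred("gt", "x", 5), Pred("starts_with", "s", "a"))
pushed, post_scan = split_filter(residual_filter, lambda p: p.op != "starts_with")

rows = [
    {"x": 10, "s": "apple"},
    {"x": 1, "s": "avocado"},
    {"x": 7, "s": "banana"},
    {"x": None, "s": "apricot"},
]
scanned = [r for r in rows if matches(r, pushed)]       # native scan
result = [r for r in scanned if matches(r, post_scan)]  # post-scan filter
print(result)  # [{'x': 10, 's': 'apple'}]
```

This assumes the native scan applies the pushed conjuncts exactly. If the scan uses them only for coarse pruning, such as skipping row groups by statistics, the post-scan filter must re-check them as well.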
Concretely, this would include:
- removing the unconditional fallback for non-`alwaysTrue` residual filters
- extending `IcebergScanPlan` to carry `pruningPredicates`
- converting supported Iceberg filter expressions into Spark expressions, then into native scan pruning predicates
- passing those predicates down through `NativeIcebergTableScanExec`
- preserving correctness by evaluating unsupported predicates above the scan
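Here is one way to express the conversion step, sketched in Python. The sketch collapses the Iceberg-to-Spark-to-native chain into a single hypothetical `to_native` function. That function returns a predicate string, or `None` when the expression cannot be converted. The lowercase operation names mirror Iceberg's `Expression.Operation` values, and the output syntax is made up for illustration. The partial-conversion rules are similar to the ones Spark applies when it pushes filters into Parquet. One side of an AND may be dropped, because that only weakens the pruning predicate. That is not allowed underneath a NOT: weakening the child would strengthen the result, which could prune rows that actually match.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Pred:
    op: str  # lowercase names mirroring Iceberg's Expression.Operation
    column: str
    value: object = None

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

@dataclass(frozen=True)
class Not:
    child: object

# Hypothetical native predicate syntax, used only to make the output readable.
BINARY = {"eq": "=", "not_eq": "!=", "lt": "<", "lt_eq": "<=", "gt": ">", "gt_eq": ">="}
UNARY = {"is_null": "IS NULL", "not_null": "IS NOT NULL", "is_nan": "IS NAN", "not_nan": "NOT NAN"}

def to_native(expr, allow_partial: bool = True) -> Optional[str]:
    """Convert expr to a pruning predicate, or return None if it cannot be converted."""
    if isinstance(expr, And):
        left = to_native(expr.left, allow_partial)
        right = to_native(expr.right, allow_partial)
        if left is not None and right is not None:
            return f"({left} AND {right})"
        if not allow_partial:
            return None
        # Keeping only one side weakens the predicate, which is safe for pruning.
        return left if left is not None else right
    if isinstance(expr, Or):
        # Both branches must convert; dropping one could prune matching rows.
        left = to_native(expr.left, allow_partial)
        right = to_native(expr.right, allow_partial)
        if left is None or right is None:
            return None
        return f"({left} OR {right})"
    if isinstance(expr, Not):
        # Under NOT, a weaker child means a stronger result, so require a full conversion.
        child = to_native(expr.child, allow_partial=False)
        return None if child is None else f"(NOT {child})"
    if expr.op in BINARY:
        return f"{expr.column} {BINARY[expr.op]} {expr.value!r}"
    if expr.op in UNARY:
        return f"{expr.column} {UNARY[expr.op]}"
    if expr.op in ("in", "not_in"):
        keyword = "IN" if expr.op == "in" else "NOT IN"
        return f"{expr.column} {keyword} ({', '.join(repr(v) for v in expr.value)})"
    return None  # e.g. starts_with stays on the post-scan filter

print(to_native(And(Pred("gt", "x", 5), Pred("starts_with", "s", "a"))))  # x > 5
print(to_native(Not(And(Pred("gt", "x", 5), Pred("starts_with", "s", "a")))))  # None
```

Any conjunct that is not converted in full still needs to be evaluated by the post-scan filter.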
Additional context
This should be an incremental improvement to Iceberg native execution, not full Iceberg feature parity.
A reasonable initial supported subset for scan pruning includes:
- `AND`
- `OR`
- `NOT`
- `IS NULL`
- `IS NOT NULL`
- `IS NAN`
- `NOT NAN`
- comparison predicates such as `=`, `!=`, `<`, `<=`, `>`, `>=`
- `IN`
- `NOT IN`
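To show why these operators are a reasonable starting point, here is a Python sketch of how each one could rule out a Parquet row group using min/max/null-count statistics. It assumes that pruning predicates are evaluated against such statistics. `Stats`, `push_not`, and `might_match` are hypothetical. NaN handling is left conservative, because the sketch has no NaN counts. NOT is eliminated first by pushing it down to the leaves. Simply negating a "might match" answer would be wrong, because a row group that might contain rows with `x < 30` can also contain rows with `x >= 30`.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Pred:
    op: str
    column: str
    value: object = None

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

@dataclass(frozen=True)
class Not:
    child: object

@dataclass(frozen=True)
class Stats:
    """Row-group column statistics; min/max are None when every value is null.
    Assumes NaN values are excluded from min/max."""
    min: object
    max: object
    null_count: int
    row_count: int

NEGATED = {
    "eq": "not_eq", "not_eq": "eq", "lt": "gt_eq", "gt_eq": "lt",
    "gt": "lt_eq", "lt_eq": "gt", "is_null": "not_null", "not_null": "is_null",
    "is_nan": "not_nan", "not_nan": "is_nan", "in": "not_in", "not_in": "in",
}

def push_not(expr, negate: bool = False):
    """Remove NOT nodes using De Morgan's laws and operator negation."""
    if isinstance(expr, Not):
        return push_not(expr.child, not negate)
    if isinstance(expr, (And, Or)):
        left, right = push_not(expr.left, negate), push_not(expr.right, negate)
        is_and = isinstance(expr, And) != negate  # negation flips AND <-> OR
        return And(left, right) if is_and else Or(left, right)
    return Pred(NEGATED[expr.op], expr.column, expr.value) if negate else expr

def might_match(expr, stats: Dict[str, Stats]) -> bool:
    """False only when no row in the row group can satisfy expr."""
    if isinstance(expr, Not):
        return might_match(push_not(expr), stats)
    if isinstance(expr, And):
        return might_match(expr.left, stats) and might_match(expr.right, stats)
    if isinstance(expr, Or):
        return might_match(expr.left, stats) or might_match(expr.right, stats)
    s = stats[expr.column]
    if expr.op == "is_null":
        return s.null_count > 0
    if expr.op == "not_null":
        return s.null_count < s.row_count
    if expr.op in ("is_nan", "not_nan"):
        return True  # no NaN counts here, so never prune
    if s.min is None:
        return False  # all values null: comparisons and IN never match
    lo, hi, v = s.min, s.max, expr.value
    checks = {
        "eq": lambda: lo <= v <= hi,
        "not_eq": lambda: not (lo == hi == v),
        "lt": lambda: lo < v,
        "lt_eq": lambda: lo <= v,
        "gt": lambda: hi > v,
        "gt_eq": lambda: hi >= v,
        "in": lambda: any(lo <= x <= hi for x in v),
        "not_in": lambda: not (lo == hi and lo in v),
    }
    return checks[expr.op]() if expr.op in checks else True  # unknown op: keep

stats = {"x": Stats(min=10, max=20, null_count=0, row_count=100)}
print(might_match(Pred("gt", "x", 25), stats))  # False: max is 20
print(might_match(Not(Pred("lt", "x", 30)), stats))  # False: rewritten to x >= 30
```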
Some types and scenarios can remain out of scope for scan pruning initially, such as:
- `StringType`
- `BinaryType`
- `DecimalType`
- metadata columns
- delete files
- changelog scans
- mixed file formats
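A pushdown gate for these exclusions might look like the Python sketch below. `Column`, `ScanInfo`, and `pruning_columns` are hypothetical. For delete files, changelog scans, and mixed formats, the sketch pushes nothing at all and keeps every predicate on the post-scan filter. That is one conservative option, assumed here for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Types kept out of scan pruning initially.
EXCLUDED_TYPES = frozenset({"string", "binary", "decimal"})

@dataclass(frozen=True)
class Column:
    name: str
    type_name: str  # e.g. "int", "double", "string", "decimal(10,2)"
    is_metadata: bool = False

@dataclass(frozen=True)
class ScanInfo:
    has_delete_files: bool = False
    is_changelog: bool = False
    file_formats: FrozenSet[str] = frozenset({"parquet"})

def column_prunable(col: Column) -> bool:
    base_type = col.type_name.split("(", 1)[0].lower()  # "decimal(10,2)" -> "decimal"
    return not col.is_metadata and base_type not in EXCLUDED_TYPES

def scan_allows_pruning(scan: ScanInfo) -> bool:
    mixed_formats = len(scan.file_formats) > 1
    return not (scan.has_delete_files or scan.is_changelog or mixed_formats)

def pruning_columns(scan: ScanInfo, referenced: List[Column]) -> List[str]:
    """Columns whose predicates may be pushed down; [] keeps everything post-scan."""
    if not scan_allows_pruning(scan):
        return []
    return [c.name for c in referenced if column_prunable(c)]

cols = [
    Column("id", "long"),
    Column("name", "string"),
    Column("price", "decimal(10,2)"),
    Column("meta", "long", is_metadata=True),
]
print(pruning_columns(ScanInfo(), cols))  # ['id']
print(pruning_columns(ScanInfo(has_delete_files=True), cols))  # []
```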