Improve native Iceberg scan coverage by pushing supported residual filters into scan pruning #2163

@weimingdiit

Describe
Iceberg scans currently fall back when FileScanTask.residual() is not alwaysTrue, even in cases where the scan could still run natively and the remaining predicates could be handled safely after the scan.

This is overly conservative and reduces native coverage for Iceberg reads with residual/data filters.

Describe the solution you'd like
Support native Iceberg scans with residual filters by splitting predicate handling into two parts:

  • push a supported subset of Iceberg filter expressions into native scan pruning predicates
  • keep unsupported predicates on the existing post-scan NativeFilter path

Concretely, this would include:

  • removing the unconditional fallback for non-alwaysTrue residual filters
  • extending IcebergScanPlan to carry pruningPredicates
  • converting supported Iceberg filter expressions into Spark expressions, then into native scan pruning predicates
  • passing those predicates down through NativeIcebergTableScanExec
  • preserving correctness by evaluating unsupported predicates above the scan
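To make the proposed split concrete, here is a minimal sketch in Python rather than the project's own code. Expressions are modelled as nested tuples, and `ScanPlanSketch`, `conjuncts`, and `plan_residual` are hypothetical names standing in for `IcebergScanPlan` and the real conversion logic. Only top-level AND conjuncts are separated, since splitting inside an OR or NOT would change the meaning of the filter.

```python
from dataclasses import dataclass
from typing import Callable, List

# Stand-in expression model: ("and", left, right), ("gt", "id", 5), ("true",) ...
Expr = tuple
ALWAYS_TRUE: Expr = ("true",)


def conjuncts(expr: Expr) -> List[Expr]:
    """Flatten nested top-level ANDs into a flat list of conjuncts."""
    if expr[0] == "and":
        return conjuncts(expr[1]) + conjuncts(expr[2])
    return [expr]


@dataclass
class ScanPlanSketch:
    """Hypothetical stand-in for IcebergScanPlan carrying pruningPredicates."""
    pruning_predicates: List[Expr]    # pushed into the native scan
    post_scan_predicates: List[Expr]  # evaluated above the scan


def plan_residual(residual: Expr,
                  is_supported: Callable[[Expr], bool]) -> ScanPlanSketch:
    """Split a residual filter instead of falling back when it is not alwaysTrue."""
    if residual == ALWAYS_TRUE:
        return ScanPlanSketch([], [])
    pushed: List[Expr] = []
    kept: List[Expr] = []
    for conjunct in conjuncts(residual):
        (pushed if is_supported(conjunct) else kept).append(conjunct)
    return ScanPlanSketch(pushed, kept)


# Example: the string predicate stays above the scan, the rest is pushed down.
residual = ("and", ("gt", "id", 5),
            ("and", ("starts_with", "name", "a"), ("is_null", "ts")))
plan = plan_residual(residual, lambda e: e[0] != "starts_with")
```

The sketch assumes that pushed predicates are fully applied by the scan. If native pruning only skips whole files or row groups, the pushed conjuncts would also have to remain in the post-scan filter to preserve correctness.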

Additional context
This should be an incremental improvement to Iceberg native execution, not full Iceberg feature parity.

A reasonable initial supported subset for scan pruning includes:

  • AND
  • OR
  • NOT
  • IS NULL
  • IS NOT NULL
  • IS NAN
  • NOT NAN
  • comparison predicates such as =, !=, <, <=, >, >=
  • IN
  • NOT IN

Some types and scenarios can remain out of scope for scan pruning initially, such as:

  • StringType
  • BinaryType
  • DecimalType
  • metadata columns
  • delete files
  • changelog scans
  • mixed file formats
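The supported operators and the excluded types listed above could be combined into a single eligibility check along the following lines. This is an illustrative Python sketch, not the project's implementation: the operator names, the plain-string type names, and the `is_supported` helper are all assumptions. A column missing from the data schema (for example a metadata column) is treated as unsupported, and a logical node is pushable only if every child is.

```python
from typing import Dict

# Operator names are illustrative labels for the predicates listed in the issue.
LOGICAL_OPS = {"and", "or", "not"}
LEAF_OPS = {
    "is_null", "not_null", "is_nan", "not_nan",
    "eq", "not_eq", "lt", "lt_eq", "gt", "gt_eq",
    "in", "not_in",
}
# Types kept out of scan pruning initially (type names simplified to strings).
UNSUPPORTED_TYPES = {"string", "binary", "decimal"}


def is_supported(expr: tuple, schema: Dict[str, str]) -> bool:
    """Return True if the whole expression tree can become a pruning predicate."""
    op = expr[0]
    if op in LOGICAL_OPS:
        # AND / OR / NOT are pushable only when all of their children are.
        return all(is_supported(child, schema) for child in expr[1:])
    if op not in LEAF_OPS:
        return False
    column_type = schema.get(expr[1])  # None for metadata or unknown columns
    return column_type is not None and column_type not in UNSUPPORTED_TYPES


schema = {"id": "long", "name": "string", "score": "double"}
ok = is_supported(("or", ("gt", "id", 5), ("is_nan", "score")), schema)
mixed = is_supported(("or", ("gt", "id", 5), ("eq", "name", "x")), schema)
```

Here `ok` is true because both branches of the OR use supported operators on supported types, while `mixed` is false because one branch compares a string column, so the whole OR would stay on the post-scan filter path. Scan-level exclusions such as delete files, changelog scans, and mixed file formats are not expression properties and would need a separate check.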
