Skip to content

chore: Remove redundant parquet.enable.dictionary ConfigMatrix from SQL tests#3866

Merged
andygrove merged 2 commits intoapache:mainfrom
andygrove:remove-dictionary-config-matrix
Apr 14, 2026
Merged

chore: Remove redundant parquet.enable.dictionary ConfigMatrix from SQL tests#3866
andygrove merged 2 commits intoapache:mainfrom
andygrove:remove-dictionary-config-matrix

Conversation

@andygrove
Copy link
Copy Markdown
Member

Which issue does this PR close?

N/A - minor cleanup

Rationale for this change

The -- ConfigMatrix: parquet.enable.dictionary=false,true directive was added as boilerplate to all 136 SQL expression test files. This causes each test to run twice (once with dictionary encoding enabled, once disabled), but is redundant because the tests generally do not create tables with duplicate string data that would actually be dictionary-encoded by Parquet. Removing it halves the number of SQL file test cases without reducing coverage.

What changes are included in this PR?

  • Remove the -- ConfigMatrix: parquet.enable.dictionary=false,true directive from all 136 SQL test files
  • Update the contributor guide (sql-file-tests.md) to stop recommending this directive for new tests and use a more meaningful ConfigMatrix example
  • Update the expression guide (adding_a_new_expression.md) to remove the directive from the example SQL test file
  • Update the PR review skill to match

How are these changes tested?

The SQL file tests still run, just without the redundant dictionary encoding matrix. Existing test coverage is preserved since the dictionary encoding setting has no meaningful effect on these tests.

…QL tests

The ConfigMatrix directive for parquet.enable.dictionary was added as
boilerplate to all SQL test files but is redundant since the tests
generally do not create tables with duplicate string data that would
be dictionary-encoded. Also update contributor guides to stop
recommending this directive for new tests.
@andygrove andygrove changed the title Remove redundant parquet.enable.dictionary ConfigMatrix from SQL tests chore: Remove redundant parquet.enable.dictionary ConfigMatrix from SQL tests Apr 1, 2026
@kazuyukitanimura
Copy link
Copy Markdown
Contributor

kazuyukitanimura commented Apr 1, 2026

Not against for this change, but do we plan to add dictionary encoded parquet tests for the sql tests in the future?

@andygrove
Copy link
Copy Markdown
Member Author

Not against for this change, but do we plan to add dictionary encoded parquet tests for the sql tests in the future?

I don't think it will make sense to use the sql test file approach to test for dictionary encoded parquet because we would have to insert large amounts of data to trigger dictionary encoding. I think it is better to rely on the Scala tests for dictionary encoding tests.

Copy link
Copy Markdown
Contributor

@parthchandra parthchandra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

-- KIND, either express or implied. See the License for the
-- specific language governing permissions and limitations
-- under the License.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unnecessary blank line

@andygrove andygrove merged commit b05e267 into apache:main Apr 14, 2026
132 checks passed
@andygrove andygrove deleted the remove-dictionary-config-matrix branch April 14, 2026 14:22
@andygrove
Copy link
Copy Markdown
Member Author

Merged. Thanks @parthchandra @kazuyukitanimura. This will save a little time in our CI runs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants