[SPARK-55579][PYTHON] Rename PySpark error classes to be eval-type-agnostic by Yicong-Huang · Pull Request #54996 · apache/spark

Yicong-Huang · 2026-03-25T01:22:14Z

What changes were proposed in this pull request?

Rename six PySpark error conditions to be generic and not tied to specific UDF eval types:

| Old Name | New Name |
|---|---|
| `PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS` | `OUTPUT_EXCEEDS_INPUT_ROWS` |
| `RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF` | `RESULT_ROWS_MISMATCH` |
| `STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF` | `INPUT_NOT_FULLY_CONSUMED` |
| `RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_SCHEMA_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` (merged) |
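
Code and tests that still refer to the old names could be migrated with a plain lookup table. This is a minimal Python sketch; `RENAMED_CONDITIONS`, `to_new_name`, and `old_names_for` are hypothetical names, not part of PySpark. Because two old names were merged into one new name, the mapping is many-to-one and the reverse lookup returns a list.

```python
# Hypothetical lookup table: old PySpark error condition name -> new name,
# copied from the rename table above.
RENAMED_CONDITIONS = {
    "PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS": "OUTPUT_EXCEEDS_INPUT_ROWS",
    "RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF": "RESULT_ROWS_MISMATCH",
    "STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF": "INPUT_NOT_FULLY_CONSUMED",
    "RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF": "RESULT_COLUMN_SCHEMA_MISMATCH",
    "RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF": "RESULT_COLUMN_NAMES_MISMATCH",
    "RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF": "RESULT_COLUMN_NAMES_MISMATCH",
}


def to_new_name(name: str) -> str:
    """Translate an old condition name; names not in the table pass through."""
    return RENAMED_CONDITIONS.get(name, name)


def old_names_for(new_name: str) -> list:
    """All old names that were renamed (or merged) into new_name, sorted."""
    return sorted(old for old, new in RENAMED_CONDITIONS.items() if new == new_name)
```

For example, `old_names_for("RESULT_COLUMN_NAMES_MISMATCH")` returns `["RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF", "RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF"]`, the two conditions that were merged.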

Also updated the error messages so they no longer reference specific eval types or data structures (e.g., "pandas.DataFrame" and "pyarrow.Table" are now referred to generically as "data").

Why are the changes needed?

These error conditions were originally created for Pandas UDFs, but are now shared by Arrow UDFs as well. The names and messages should be generic so they can be reused across different eval types without confusion.

Part of SPARK-55388.

Does this PR introduce any user-facing change?

Yes. Error condition names and messages are updated. Users who catch specific error conditions by name will need to update their references.
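
For callers that match on the condition name, one option during a transition is to accept both the old and the new spelling. This is a minimal sketch, assuming the name appears in square brackets in the exception message text, as in the test pattern `PySparkRuntimeError: \[...\] ` quoted in the review below; `FORMER_NAMES`, `extract_condition`, and `matches_condition` are hypothetical helpers, not PySpark APIs. Where a `PySparkException` is caught directly, its `getErrorClass()` method returns the name without any text parsing.

```python
import re
from typing import Optional

# Hypothetical helper data: each new condition name mapped to the names it
# replaced, taken from the rename table. Only two entries are shown here.
FORMER_NAMES = {
    "RESULT_COLUMN_NAMES_MISMATCH": {
        "RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF",
        "RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF",
    },
    "RESULT_ROWS_MISMATCH": {"RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF"},
}

# Assumption: the condition appears in the message as "[CONDITION_NAME]".
_CONDITION_RE = re.compile(r"\[([A-Z0-9_]+)\]")


def extract_condition(message: str) -> Optional[str]:
    """Return the first bracketed condition name in the message, if any."""
    match = _CONDITION_RE.search(message)
    return match.group(1) if match else None


def matches_condition(message: str, new_name: str) -> bool:
    """True if any bracketed name in the message is new_name or one it replaced."""
    accepted = {new_name} | FORMER_NAMES.get(new_name, set())
    return any(name in accepted for name in _CONDITION_RE.findall(message))
```

Scanning every bracketed name rather than only the first keeps the check working if the message (for example, a worker traceback) contains other bracketed text before the condition.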

How was this patch tested?

Existing tests updated to match new error condition names and messages.

Was this patch authored or co-authored using generative AI tooling?

No

zhengruifeng · 2026-03-25T02:21:29Z

            PythonException,
-            "PySparkRuntimeError: \\[RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF\\] "
-            "Column names of the returned pandas.DataFrame do not match "
+            "PySparkRuntimeError: \\[RESULT_COLUMNS_MISMATCH_NAMES\\] "


what about `RESULT_COLUMN_NAMES_MISMATCH`?

thanks. changed to this one
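
The test strings in this thread are regular expressions, which is why the brackets appear as `\\[` and `\\]`: left unescaped, `[RESULT_COLUMN_NAMES_MISMATCH]` would be a character class matching a single character. This is a minimal sketch, assuming the renamed test keeps the same prefix shape with the final name; `prefix_pattern` is a hypothetical helper, and `sample` uses placeholder text rather than the real error message.

```python
import re

# Prefix patterns in the style of the test: brackets escaped as regex literals.
NEW_PATTERN = "PySparkRuntimeError: \\[RESULT_COLUMN_NAMES_MISMATCH\\] "
OLD_PATTERN = "PySparkRuntimeError: \\[RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF\\] "


def prefix_pattern(condition: str) -> str:
    """Build an equivalent prefix pattern for any condition name.

    re.escape also escapes the spaces, which does not change what matches.
    """
    return re.escape(f"PySparkRuntimeError: [{condition}] ")


# Placeholder message: only the prefix reflects the renamed condition.
sample = "PySparkRuntimeError: [RESULT_COLUMN_NAMES_MISMATCH] <message text>"
```

With these definitions, `re.search(NEW_PATTERN, sample)` finds a match while `re.search(OLD_PATTERN, sample)` returns `None`, which is why tests asserting on the old name had to be updated.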

zhengruifeng · 2026-03-25T02:22:03Z

    ]
  },
-  "RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF": {
+  "RESULT_COLUMNS_MISMATCH_SCHEMA": {


what about `RESULT_COLUMN_SCHEMA_MISMATCH`?

thanks. changed to this one

zhengruifeng

minor comments, otherwise LGTM

zhengruifeng · 2026-03-26T05:51:46Z

merged to master

…nostic

Rename six PySpark error conditions to be generic and not tied to specific UDF eval types:

| Old Name | New Name |
|---|---|
| `PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS` | `OUTPUT_EXCEEDS_INPUT_ROWS` |
| `RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF` | `RESULT_ROWS_MISMATCH` |
| `STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF` | `INPUT_NOT_FULLY_CONSUMED` |
| `RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_SCHEMA_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` (merged) |

Also updated error messages to not reference specific eval types or data structures (e.g., "pandas.DataFrame", "pyarrow.Table" -> "data").

These error conditions were originally created for Pandas UDFs, but are now shared by Arrow UDFs as well. The names and messages should be generic so they can be reused across different eval types without confusion.

Part of [SPARK-55388](https://issues.apache.org/jira/browse/SPARK-55388).

Yes. Error condition names and messages are updated. Users who catch specific error conditions by name will need to update their references.

Existing tests updated to match new error condition names and messages.

No

Closes apache#54996 from Yicong-Huang/SPARK-55579/rename-error-classes.

Authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>

…pe-agnostic

### What changes were proposed in this pull request?

Backport of #54996 to branch-4.1.

Rename six PySpark error conditions to be generic and not tied to specific UDF eval types:

| Old Name | New Name |
|---|---|
| `PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS` | `OUTPUT_EXCEEDS_INPUT_ROWS` |
| `RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF` | `RESULT_ROWS_MISMATCH` |
| `STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF` | `INPUT_NOT_FULLY_CONSUMED` |
| `RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_SCHEMA_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` (merged) |

Also updated error messages to not reference specific eval types or data structures (e.g., "pandas.DataFrame", "pyarrow.Table" -> "data").

### Why are the changes needed?

These error conditions were originally created for Pandas UDFs, but are now shared by Arrow UDFs as well. The names and messages should be generic so they can be reused across different eval types without confusion.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests updated to match new error class names and messages.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #55147 from Yicong-Huang/SPARK-55579-backport-4.1.

Lead-authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
Co-authored-by: Yicong-Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>

…pe-agnostic

### What changes were proposed in this pull request?

Backport of #54996 to branch-4.0.

Rename six PySpark error conditions to be generic and not tied to specific UDF eval types:

| Old Name | New Name |
|---|---|
| `PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS` | `OUTPUT_EXCEEDS_INPUT_ROWS` |
| `RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF` | `RESULT_ROWS_MISMATCH` |
| `STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF` | `INPUT_NOT_FULLY_CONSUMED` |
| `RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_SCHEMA_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` (merged) |

Also updated error messages to not reference specific eval types or data structures (e.g., "pandas.DataFrame", "pyarrow.Table" -> "data").

### Why are the changes needed?

These error conditions were originally created for Pandas UDFs, but are now shared by Arrow UDFs as well. The names and messages should be generic so they can be reused across different eval types without confusion.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests updated to match new error class names and messages.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #55169 from Yicong-Huang/SPARK-55579-backport-4.0.

Authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>

Yicong-Huang force-pushed the SPARK-55579/rename-error-classes branch from 46a6596 to a4919d0 March 25, 2026 01:28

zhengruifeng reviewed Mar 25, 2026

Yicong-Huang force-pushed the SPARK-55579/rename-error-classes branch from a4919d0 to 1fb783b March 25, 2026 06:50

Yicong-Huang requested a review from zhengruifeng March 25, 2026 06:51

Yicong-Huang force-pushed the SPARK-55579/rename-error-classes branch from 1fb783b to 88119d2 March 25, 2026 18:06

refactor: rename PySpark error classes to be eval-type-agnostic

4ef41e7

Yicong-Huang force-pushed the SPARK-55579/rename-error-classes branch from 88119d2 to 4ef41e7 March 25, 2026 21:29

zhengruifeng approved these changes Mar 26, 2026

zhengruifeng closed this in 6fb96bc Mar 26, 2026

Yicong-Huang mentioned this pull request Apr 1, 2026

[SPARK-55579][PYTHON][4.1] Rename PySpark error classes to be eval-type-agnostic #55147

Closed

Yicong-Huang mentioned this pull request Apr 2, 2026

[SPARK-55579][PYTHON][4.0] Rename PySpark error classes to be eval-type-agnostic #55169

Closed

Yicong-Huang wants to merge 1 commit into apache:master from Yicong-Huang:SPARK-55579/rename-error-classes