[SPARK-55579][PYTHON] Rename PySpark error classes to be eval-type-agnostic #54996

Closed
Yicong-Huang wants to merge 1 commit into apache:master from Yicong-Huang:SPARK-55579/rename-error-classes

@Yicong-Huang Yicong-Huang commented Mar 25, 2026

What changes were proposed in this pull request?

Rename six PySpark error conditions to be generic and not tied to specific UDF eval types:

| Old Name | New Name |
|---|---|
| `PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS` | `OUTPUT_EXCEEDS_INPUT_ROWS` |
| `RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF` | `RESULT_ROWS_MISMATCH` |
| `STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF` | `INPUT_NOT_FULLY_CONSUMED` |
| `RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_SCHEMA_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` (merged) |

Also updated error messages to not reference specific eval types or data structures (e.g., "pandas.DataFrame", "pyarrow.Table" -> "data").

Why are the changes needed?

These error conditions were originally created for Pandas UDFs, but are now shared by Arrow UDFs as well. The names and messages should be generic so they can be reused across different eval types without confusion.

Part of SPARK-55388.

Does this PR introduce any user-facing change?

Yes. Error condition names and messages are updated. Users who catch specific error conditions by name will need to update their references.
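
For code that compares against condition names, the renames listed in this description can be captured as a plain lookup. This is a minimal sketch and not part of the PR: `RENAMED_CONDITIONS` and `migrate_condition_name` are hypothetical names introduced here for illustration, and only the old and new condition names come from the description.

```python
# Hypothetical migration helper (not part of PySpark or this PR).
# Keys are the old condition names, values are the new names from the PR description.
RENAMED_CONDITIONS = {
    "PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS": "OUTPUT_EXCEEDS_INPUT_ROWS",
    "RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF": "RESULT_ROWS_MISMATCH",
    "STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF": "INPUT_NOT_FULLY_CONSUMED",
    "RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF": "RESULT_COLUMN_SCHEMA_MISMATCH",
    "RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF": "RESULT_COLUMN_NAMES_MISMATCH",
    # The Arrow variant is merged into the same new name as the Pandas variant.
    "RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF": "RESULT_COLUMN_NAMES_MISMATCH",
}


def migrate_condition_name(name: str) -> str:
    """Return the new condition name, or the input unchanged if it was not renamed."""
    return RENAMED_CONDITIONS.get(name, name)
```

Because two old names map to one new name, the mapping is not reversible; code that previously distinguished the Pandas and Arrow column-name mismatches would need another way to tell them apart.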

How was this patch tested?

Existing tests updated to match new error condition names and messages.
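
The updated tests match on a `PySparkRuntimeError: [CONDITION_NAME]` prefix in the error message. A rough sketch of pulling the condition name out of such a message is shown here; `extract_condition` and the sample message are assumptions for illustration, and the text after the bracketed name is a placeholder rather than the actual PySpark wording.

```python
import re
from typing import Optional

# Matches the "PySparkRuntimeError: [NAME]" prefix and captures NAME.
_CONDITION_RE = re.compile(r"PySparkRuntimeError: \[([A-Z_]+)\]")


def extract_condition(message: str) -> Optional[str]:
    """Return the bracketed condition name from an error message, or None if absent."""
    match = _CONDITION_RE.search(message)
    return match.group(1) if match else None


# Placeholder message: only the prefix format is taken from the tests in this PR.
sample = "PySparkRuntimeError: [RESULT_COLUMN_NAMES_MISMATCH] (message text omitted)"
condition = extract_condition(sample)
```

Comparing the extracted name, instead of the full message, keeps assertions stable if the message wording changes again.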

Was this patch authored or co-authored using generative AI tooling?

No

@Yicong-Huang Yicong-Huang force-pushed the SPARK-55579/rename-error-classes branch from 46a6596 to a4919d0 Compare March 25, 2026 01:28
PythonException,
"PySparkRuntimeError: \\[RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF\\] "
"Column names of the returned pandas.DataFrame do not match "
"PySparkRuntimeError: \\[RESULT_COLUMNS_MISMATCH_NAMES\\] "
Contributor

what about RESULT_COLUMN_NAMES_MISMATCH

Contributor Author

thanks. changed to this one

]
},
"RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF": {
"RESULT_COLUMNS_MISMATCH_SCHEMA": {
Contributor

what about RESULT_COLUMN_SCHEMA_MISMATCH

Contributor Author

thanks. changed to this one

Contributor

@zhengruifeng zhengruifeng left a comment

minor comments, otherwise LGTM

@Yicong-Huang Yicong-Huang force-pushed the SPARK-55579/rename-error-classes branch from a4919d0 to 1fb783b Compare March 25, 2026 06:50
@Yicong-Huang Yicong-Huang force-pushed the SPARK-55579/rename-error-classes branch from 1fb783b to 88119d2 Compare March 25, 2026 18:06
@Yicong-Huang Yicong-Huang force-pushed the SPARK-55579/rename-error-classes branch from 88119d2 to 4ef41e7 Compare March 25, 2026 21:29
@zhengruifeng
Contributor

merged to master

Yicong-Huang added a commit to Yicong-Huang/spark that referenced this pull request Apr 1, 2026
…nostic

Rename six PySpark error conditions to be generic and not tied to specific UDF eval types:

| Old Name | New Name |
|---|---|
| `PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS` | `OUTPUT_EXCEEDS_INPUT_ROWS` |
| `RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF` | `RESULT_ROWS_MISMATCH` |
| `STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF` | `INPUT_NOT_FULLY_CONSUMED` |
| `RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_SCHEMA_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` (merged) |

Also updated error messages to not reference specific eval types or data structures (e.g., "pandas.DataFrame", "pyarrow.Table" -> "data").

These error conditions were originally created for Pandas UDFs, but are now shared by Arrow UDFs as well. The names and messages should be generic so they can be reused across different eval types without confusion.

Part of [SPARK-55388](https://issues.apache.org/jira/browse/SPARK-55388).

Yes. Error condition names and messages are updated. Users who catch specific error conditions by name will need to update their references.

Existing tests updated to match new error condition names and messages.

No

Closes apache#54996 from Yicong-Huang/SPARK-55579/rename-error-classes.

Authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
zhengruifeng pushed a commit that referenced this pull request Apr 2, 2026
…pe-agnostic

### What changes were proposed in this pull request?

Backport of #54996 to branch-4.1.

Rename six PySpark error conditions to be generic and not tied to specific UDF eval types:

| Old Name | New Name |
|---|---|
| `PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS` | `OUTPUT_EXCEEDS_INPUT_ROWS` |
| `RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF` | `RESULT_ROWS_MISMATCH` |
| `STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF` | `INPUT_NOT_FULLY_CONSUMED` |
| `RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_SCHEMA_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` (merged) |

Also updated error messages to not reference specific eval types or data structures (e.g., "pandas.DataFrame", "pyarrow.Table" -> "data").

### Why are the changes needed?

These error conditions were originally created for Pandas UDFs, but are now shared by Arrow UDFs as well. The names and messages should be generic so they can be reused across different eval types without confusion.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests updated to match new error class names and messages.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #55147 from Yicong-Huang/SPARK-55579-backport-4.1.

Lead-authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
Co-authored-by: Yicong-Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
Yicong-Huang added a commit to Yicong-Huang/spark that referenced this pull request Apr 2, 2026
Yicong-Huang added a commit to Yicong-Huang/spark that referenced this pull request Apr 2, 2026
zhengruifeng pushed a commit that referenced this pull request Apr 3, 2026
…pe-agnostic

### What changes were proposed in this pull request?

Backport of #54996 to branch-4.0.

Rename six PySpark error conditions to be generic and not tied to specific UDF eval types:

| Old Name | New Name |
|---|---|
| `PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS` | `OUTPUT_EXCEEDS_INPUT_ROWS` |
| `RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF` | `RESULT_ROWS_MISMATCH` |
| `STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF` | `INPUT_NOT_FULLY_CONSUMED` |
| `RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_SCHEMA_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` |
| `RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF` | `RESULT_COLUMN_NAMES_MISMATCH` (merged) |

Also updated error messages to not reference specific eval types or data structures (e.g., "pandas.DataFrame", "pyarrow.Table" -> "data").

### Why are the changes needed?

These error conditions were originally created for Pandas UDFs, but are now shared by Arrow UDFs as well. The names and messages should be generic so they can be reused across different eval types without confusion.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests updated to match new error class names and messages.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #55169 from Yicong-Huang/SPARK-55579-backport-4.0.

Authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>