Handle exception for invalid column names containing . by sidharthbolar · Pull Request #102 · awslabs/python-deequ

sidharthbolar · 2022-08-29T16:19:41Z

Issue #, if available:
#99
Description of changes:
Columns with dot in their name are throwing an Analysis Exception
Based on analysis done , the exception is being thrown by the core scala deeque library
Have proposed to handle this exception to notify the caller that a column name with "." should not be passed
the exception handled is added to all check methods and implemented via a static method

Local Tests were passing and have added an additional UT test_invalidColumnException to verify the behaviour

*version = "1.0.1"
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

herminio-iovio · 2022-09-07T17:36:45Z

+        """
+        :param column: column to which constraint is to be applied
+        """
+        if "." in column:


I think this is quite strict, perhaps simply enclosing the field with ticks will do, like f"`column.name`"

@herminio-iovio passing through ticks also does not work and returns the same error

chenliu0831

Use dot with column name in Spark/SQL is quite legitimate, basically to fetch values from fields that are nested. Deequ/PyDeequ as a library does not seem to be the right place to handle this since it's hard to guess customer's intent. For example, this might break some existing use-cases where people actually want to profile/check nested column types.

I think it's up to the caller or other higher level library / service to make the call (e.g. escape with "`" or rename)

sidharthbolar · 2022-09-16T08:23:46Z

hi @chenliu0831 thanks for the review
In that case this PR handles this exception and notifies the caller to not pass a column with a "." so that he can handle it outside this library

chenliu0831 · 2022-09-16T13:22:43Z

hi @chenliu0831 thanks for the review In that case this PR handles this exception and notifies the caller to not pass a column with a "." so that he can handle it outside this library

@sidharthbolar right I think I understand the purpose, but column with dot is a valid usage for structured column (JSON type). Do you mind add a test to verify if that works?

Looks like you actually want a better Pythonic error (than the vanilla Spark error) when such column with dot does not exist? If so, how about add a helper to check column existence instead?

github-actions

Generated by AI (model: us.anthropic.claude-opus-4-6-v1, prompt: db2249a9) — may not be fully accurate. Reply if this doesn't help.

github-actions · 2026-05-18T17:17:07Z

+    @staticmethod
+    def _handle_invalid_column(column):
+        """
+        :param column: column to which constraint is to be applied


BUG: This validation rejects legitimate column names. In Spark, dots in column names are valid (e.g., nested struct fields like address.city, or columns named with backtick-escaping). The PR blocks all columns containing dots, which is overly restrictive and breaks valid use cases. As noted in existing feedback, this should instead check column existence or provide a better error message rather than blanket-rejecting dots.

Line 105: if "." in column: raises ValueError for any column containing a dot. Spark supports nested column access via dots (e.g., struct fields) and backtick-escaped column names with dots. The existing feedback from @chenliu0831 confirms: 'column with dot is a valid usage for structured column (JSON type)'.

github-actions · 2026-05-18T17:17:07Z

@@ -160,6 +168,7 @@ def hasCompleteness(self, column, assertion, hint=None):
        """


BUG: A debug print(self) statement was left in the hasCompleteness method. This will produce unwanted output in production usage and is inconsistent with all other check methods.

Line 169 in the full source: print(self) appears after self._Check = self._Check.hasCompleteness(column, assertion_func, hint). No other method in the class has a print statement.

github-actions · 2026-05-18T17:17:07Z

@@ -144,6 +151,7 @@ def isComplete(self, column, hint=None):
        :return: isComplete self:A Check.scala object that asserts on a column completion.
        """


BUG: The isComplete method signature was changed to remove the third argument self._jvm.scala.Option.apply(None) that was previously passed to the Scala method. This will cause a Py4J error at runtime if the underlying Scala isComplete method requires 3 arguments (column, hint, analyzerOption).

In the knowledge base, the original code at line 153 was: self._Check = self._Check.isComplete(column, hint, self._jvm.scala.Option.apply(None)). The diff shows it changed to just self._Check = self._Check.isComplete(column, hint) (the third argument is removed).

github-actions · 2026-05-18T17:17:07Z

@@ -160,6 +168,7 @@ def hasCompleteness(self, column, assertion, hint=None):
        """
        assertion_func = ScalaFunction1(self._spark_session.sparkContext._gateway, assertion)


BUG: The hasCompleteness method signature was changed to remove the fourth argument self._jvm.scala.Option.apply(None). This will cause a Py4J error at runtime if the underlying Scala hasCompleteness method requires 4 arguments.

In the knowledge base, the original code was: self._Check = self._Check.hasCompleteness(column, assertion_func, hint, self._jvm.scala.Option.apply(None)). The diff shows it changed to self._Check = self._Check.hasCompleteness(column, assertion_func, hint) (fourth argument removed).

github-actions · 2026-05-18T17:17:07Z

@@ -225,6 +238,7 @@ def isUnique(self, column, hint=None):
        :return: isUnique self: A Check.scala object that asserts uniqueness in the column.
        """


BUG: The isUnique method signature was changed to remove the third argument self._jvm.scala.Option.apply(None). This will cause a Py4J error at runtime if the underlying Scala isUnique method requires 3 arguments.

In the knowledge base, the original code was: self._Check = self._Check.isUnique(column, hint, self._jvm.scala.Option.apply(None)). The diff shows it changed to self._Check = self._Check.isUnique(column, hint) (third argument removed).

github-actions · 2026-05-18T17:17:07Z

@@ -785,6 +834,7 @@ def isContainedIn(self, column, allowed_values):
        arr = self._spark_session.sparkContext._gateway.new_array(self._jvm.java.lang.String, len(allowed_values))


BUG: The isContainedIn method signature was changed to remove the assertion and hint parameters entirely. The method no longer accepts or passes assertion/hint to the Scala method, breaking the existing public API that tests rely on (e.g., self.isContainedIn("a", ["foo", "bar", "baz"], lambda x: x == 1, hint="it should be 1")).

The full source shows def isContainedIn(self, column, allowed_values): at line 826 with no assertion or hint parameters. However, the test file at lines 882-889 calls self.isContainedIn("a", ["foo", "bar", "baz"], lambda x: x == 1) and self.isContainedIn("a", ["foo", "bar", "baz"], lambda x: x == 1, hint="it should be 1") which would fail with TypeError.

github-actions · 2026-05-18T17:17:07Z

@@ -287,6 +303,7 @@ def hasUniqueValueRatio(self, columns, assertion, hint=None):
        """
        assertion_func = ScalaFunction1(self._spark_session.sparkContext._gateway, assertion)
        hint = self._jvm.scala.Option.apply(hint)


BUG: The hasUniqueValueRatio method was changed to remove the fourth argument self._jvm.scala.Option.apply(None). This will cause a Py4J error at runtime if the underlying Scala method requires it.

In the knowledge base, the original code was: self._Check = self._Check.hasUniqueValueRatio(columns_seq, assertion_func, hint, self._jvm.scala.Option.apply(None)). The diff shows it changed to self._Check = self._Check.hasUniqueValueRatio(columns_seq, assertion_func, hint).

github-actions · 2026-05-18T17:17:07Z

@@ -409,6 +433,7 @@ def hasMinLength(self, column, assertion, hint=None):
        """
        assertion_func = ScalaFunction1(self._spark_session.sparkContext._gateway, assertion)
        hint = self._jvm.scala.Option.apply(hint)


BUG: The hasMinLength method was changed to remove the fourth argument self._jvm.scala.Option.apply(None). This will cause a Py4J error at runtime if the underlying Scala method requires it.

In the knowledge base, the original code was: self._Check = self._Check.hasMinLength(column, assertion_func, hint, self._jvm.scala.Option.apply(None)). The diff shows it changed to self._Check = self._Check.hasMinLength(column, assertion_func, hint).

github-actions · 2026-05-18T17:17:07Z

@@ -141,6 +148,7 @@ def hasDataType(self, column, datatype, assertion=None, hint=None):
        )



BUG: A debug df.show(truncate=False) statement was left in the test helper method hasDataType. This will produce unwanted console output during test runs.

Line 149: df.show(truncate=False) in the hasDataType helper method.

github-actions · 2026-05-18T17:17:07Z

-            self.hasDataType("boolean", ConstrainableDataTypes.Boolean), [Row(constraint_status="Success")]
-        )
+        self.assertEqual(self.hasDataType("boolean", ConstrainableDataTypes.Boolean), [Row(constraint_status="Success")])
+


MISSING_TEST: The test test_invalidColumnException only tests hasDataType with a dotted column. There are no tests verifying the new _handle_invalid_column validation for any of the other ~30 methods where it was added.

Line 597-599: Only one test exists: self.hasDataType("column.with.dot", ConstrainableDataTypes.String). No tests for isComplete, hasCompleteness, areComplete, isUnique, etc. with dotted columns.

Handle exception for invalid column names containing .

bd07c28

herminio-iovio reviewed Sep 7, 2022

View reviewed changes

chenliu0831 reviewed Sep 16, 2022

View reviewed changes

github-actions Bot reviewed May 18, 2026

View reviewed changes

		@@ -160,6 +168,7 @@ def hasCompleteness(self, column, assertion, hint=None):
		"""

		@@ -144,6 +151,7 @@ def isComplete(self, column, hint=None):
		:return: isComplete self:A Check.scala object that asserts on a column completion.
		"""

		@@ -160,6 +168,7 @@ def hasCompleteness(self, column, assertion, hint=None):
		"""
		assertion_func = ScalaFunction1(self._spark_session.sparkContext._gateway, assertion)

		@@ -225,6 +238,7 @@ def isUnique(self, column, hint=None):
		:return: isUnique self: A Check.scala object that asserts uniqueness in the column.
		"""

		@@ -785,6 +834,7 @@ def isContainedIn(self, column, allowed_values):
		arr = self._spark_session.sparkContext._gateway.new_array(self._jvm.java.lang.String, len(allowed_values))

		@@ -141,6 +148,7 @@ def hasDataType(self, column, datatype, assertion=None, hint=None):
		)

Conversation

sidharthbolar commented Aug 29, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

herminio-iovio Sep 7, 2022

Choose a reason for hiding this comment

Uh oh!

sidharthbolar Sep 16, 2022

Choose a reason for hiding this comment

Uh oh!

chenliu0831 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sidharthbolar commented Sep 16, 2022

Uh oh!

chenliu0831 commented Sep 16, 2022

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

sidharthbolar commented Aug 29, 2022 •

edited

Loading

chenliu0831 left a comment •

edited

Loading