Resolve AttributeError: 'ScalaFunction1' object has no attribute 'hashCode'. #184
poolis wants to merge 3 commits into
check.isLessThanOrEqualTo("b", "d")
vrb.addCheck(check)
check.hasDataType("d", ConstrainableDataTypes.String, lambda x: x >= 1)
vrb.addCheck(check)
Why do we need to add and verify one by one?
Because that is the use case for triggering the exception. If I change the test like so, it does not use hashCode.
vrb = VerificationSuite(self.spark) \
    .onData(self.df)
check = Check(self.spark, CheckLevel.Error, "Enough checks to trigger a hashCode not an attribute of ScalaFunction1")
check.addConstraints([
    check.isComplete('b'),
    check.containsEmail('email'),
    check.isGreaterThanOrEqualTo("d", "b"),
    check.isLessThanOrEqualTo("b", "d"),
    check.hasDataType("d", ConstrainableDataTypes.String, lambda x: x >= 1)])
result = vrb.addCheck(check).run()
Does it only fail at the magic 5th one? It's a bit strange if so. By the way, CI is failing on this test.
I can't replicate the error in CI but submitted an attempt to fix it.
@@ -37,6 +37,9 @@ def apply(self, arg):
        """Implements the apply function"""
        return self.lambda_function(arg)
EDGE_CASE: hash() on a lambda returns a value derived from the function object's identity (its memory address), not from its logical content. Two identical lambda expressions (e.g., lambda x: x > 10 defined twice) therefore produce different hash values, so hashCode() is not stable across equivalent functions. More critically, Python's hash() can return values outside Java's int range, while Integer.hashCode(int) on the JVM expects a Java int. If hash(self.lambda_function) is larger than Integer.MAX_VALUE or smaller than Integer.MIN_VALUE, passing it through py4j could overflow or behave unexpectedly.
Line 39: return self.gateway.jvm.java.lang.Integer.hashCode(hash(self.lambda_function)). Python's hash() returns a 64-bit integer on 64-bit platforms, but java.lang.Integer.hashCode(int) expects a 32-bit int. The test at test_scala_utils.py line 28 (self.assertNotEqual(greaterThan10.hashCode(), notNoneTest.hashCode())) passes only because the two lambdas happen to have different object identities, not because of logical equivalence checking.
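One way to address the range concern is to fold the Python hash into the signed 32-bit range before it crosses into the JVM. The following is a minimal sketch, not the code in this PR: the helper name to_java_int is hypothetical, and it assumes that a wrapped (rather than exact) value is acceptable for a hash code.

```python
JAVA_INT_MASK = 0xFFFFFFFF      # low 32 bits
JAVA_INT_SIGN_BIT = 0x80000000  # values at or above this are negative in Java


def to_java_int(value: int) -> int:
    """Fold an arbitrary Python int into Java's signed 32-bit int range.

    Hypothetical helper for illustration; it mirrors the two's-complement
    wrap-around a Java (int) cast performs on a long.
    """
    folded = value & JAVA_INT_MASK
    if folded >= JAVA_INT_SIGN_BIT:
        folded -= JAVA_INT_MASK + 1
    return folded


# Two textually identical lambdas are still distinct objects.
greater_than_10_a = lambda x: x > 10
greater_than_10_b = lambda x: x > 10

# Either hash is now safe to hand to a Java method expecting an int.
safe_hash_a = to_java_int(hash(greater_than_10_a))
safe_hash_b = to_java_int(hash(greater_than_10_b))
```

Folding only fixes the range problem. The hash still follows object identity, so two equivalent lambdas will generally continue to hash differently.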
check.hasDataType("d", ConstrainableDataTypes.String, lambda x: x >= 1)
vrb.addCheck(check)

result = vrb.run()
MISSING_TEST: The test never asserts anything about the result of vrb.run(). It only relies on the absence of an exception as the pass condition, but doesn't verify that the verification actually completed successfully (e.g., checking result.status or the check results). If run() silently fails or returns an error status, this test would still pass.
Line 1390: result = vrb.run(). The variable result is assigned but never used in any assertion. The docstring says 'Lack of Exception is passing' but a proper test should also assert the result is valid.
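A sketch of the kind of assertion this comment asks for, under stated assumptions: the helper failed_constraints is hypothetical, and it assumes the collected rows expose a constraint_status field with the value "Success" for passing constraints, as the existing assertion on df.select("constraint_status").collect() in this test file suggests.

```python
def failed_constraints(rows, status_field="constraint_status"):
    """Return the rows whose status is anything other than "Success".

    Hypothetical helper: works with any rows that support row[field]
    lookup, such as plain dicts or collected pyspark Row objects.
    """
    return [row for row in rows if row[status_field] != "Success"]


# Stand-in data shaped like collected check results (illustrative only).
sample_rows = [
    {"constraint": "isComplete(b)", "constraint_status": "Success"},
    {"constraint": "containsEmail(email)", "constraint_status": "Failure"},
]
problems = failed_constraints(sample_rows)
```

In the test itself, asserting that this list is empty for the collected check results would make the test fail on a silent error status instead of passing merely because no exception was raised.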
self.assertEqual(df.select("constraint_status").collect(), [Row(constraint_status="Success"), Row(constraint_status="Success")])

def test_hash_code(self):
DESIGN: The test repeatedly calls vrb.addCheck(check) with the same check object after mutating it in-place. This adds the same Check instance multiple times to the VerificationRunBuilder, which means the JVM receives the same underlying _Check Java object multiple times. This doesn't properly test the scenario described in issue #91 where multiple different checks with different ScalaFunction1 instances trigger the hashCode call. The test is fragile and may not reliably reproduce the original bug.
Lines 1378-1389: check.isComplete('b') mutates check._Check in place, then vrb.addCheck(check) adds it. Then check.containsEmail('email') mutates the same object again, and vrb.addCheck(check) adds the same Python object (now pointing to a different _Check Java object). The existing feedback confirms CI is failing on this test.
Issue #, if available:
#91
Description of changes:
Added hashCode() method to ScalaFunction1 and ScalaFunction2.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.