Added support for anomaly check config#45
Conversation
|
@rahulgoyal2987 Thanks for contributing this and apologies for just getting to this -- can you add some unit tests for this functionality and update the PR? |
@gucciwang I have added test cases for anomaly check config |
|
@gucciwang Can please merge or reject this pull request? |
There was a problem hiding this comment.
Generated by AI (model: us.anthropic.claude-opus-4-6-v1, prompt: db2249a9) — may not be fully accurate. Reply if this doesn't help.
Additional feedback:
tests/test_anomaly_detection.py:6 — BUG: The test file imports AnomalyCheckConfig but it is not listed in the import statements shown in the diff. The test at line 488 uses AnomalyCheckConfig(description='test error case', level=CheckLevel.Error) but the import for AnomalyCheckConfig from pydeequ.verification is missing from the test file's imports.
Looking at the existing imports in the test file (lines 7-18 in full_source_files), we see
from pydeequ.verification import VerificationResult, VerificationSuitebutAnomalyCheckConfigis not imported. The diff does not add this import. AlsoCheckLevelis used at line 488 but is not in the imports either (onlyfrom pydeequ.analyzers import MinLength, Sizeand anomaly detection imports are shown).
| @@ -486,6 +487,20 @@ def get_anomalyDetector(self, anomaly): | |||
| def test_anomalyDetector(self): | |||
There was a problem hiding this comment.
MISSING_TEST: The new tests only cover SimpleThresholdStrategy with AnomalyCheckConfig. There is no test for passing anomalyCheckConfig to addAnomalyCheck with other anomaly strategies (e.g., AbsoluteChangeStrategy, RelativeRateOfChangeStrategy, etc.), and no test for the AnomalyCheckConfig class itself (e.g., verifying _get_java_object returns a valid JVM object, or testing with different parameters).
The diff adds test_SimpleThresholdStrategy_Error and test_SimpleThresholdStrategy_Warning but only the SimpleThresholdStrategy helper method was updated to accept anomalyCheckConfig (line 184). The other helper methods (AbsoluteChangeStrategy, RelativeRateOfChangeStrategy, OnlineNormalStrategy, BatchNormalStrategy, HoltWinters) still hardcode anomalyCheckConfig=None.
| return df.select("check_status").collect() | ||
|
|
||
| def SimpleThresholdStrategy(self, df_prev, df_curr, analyzer_func, lowerBound, upperBound): | ||
| def SimpleThresholdStrategy(self, df_prev, df_curr, analyzer_func, lowerBound, upperBound, |
There was a problem hiding this comment.
DESIGN: Only the SimpleThresholdStrategy helper method was updated to accept the anomalyCheckConfig parameter. The first call to addAnomalyCheck (line 189-190 for the previous run) still passes no config, meaning the test only exercises anomalyCheckConfig on the second/current run but not the first. This is inconsistent and doesn't fully test the feature.
Line 189-190:
.addAnomalyCheck(SimpleThresholdStrategy(lowerBound, upperBound), analyzer_func).run()— no anomalyCheckConfig passed for the previous run. Line 197:.addAnomalyCheck(SimpleThresholdStrategy(lowerBound, upperBound), analyzer_func, anomalyCheckConfig)— config only passed for the current run.
| # TODO integrate Analyzer context | ||
|
|
||
|
|
||
| class AnomalyCheckConfig: |
There was a problem hiding this comment.
MISSING_TEST: The AnomalyCheckConfig class has no dedicated unit test for its _get_java_object method. If the Deequ JVM class com.amazon.deequ.AnomalyCheckConfig doesn't exist or has a different constructor signature (e.g., different number of default arguments), this will fail at runtime with no clear error message.
Lines 14-28 in verification.py define
AnomalyCheckConfig._get_java_objectwhich callsgetattr(self._check_java_class, 'apply$default$3')(),apply$default$4, andapply$default$5. There is no isolated test that verifies these JVM calls succeed independently of the full anomaly detection flow.
Issue #, if available:
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.