Skip to content

Added support for anomaly check config#45

Open
rahulgoyal2987 wants to merge 7 commits into
awslabs:masterfrom
rahulgoyal2987:feature/anomaly-check-config-support
Open

Added support for anomaly check config#45
rahulgoyal2987 wants to merge 7 commits into
awslabs:masterfrom
rahulgoyal2987:feature/anomaly-check-config-support

Conversation

@rahulgoyal2987
Copy link
Copy Markdown

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@gucciwang
Copy link
Copy Markdown
Contributor

@rahulgoyal2987 Thanks for contributing this and apologies for just getting to this -- can you add some unit tests for this functionality and update the PR?

@rahulgoyal2987
Copy link
Copy Markdown
Author

@rahulgoyal2987 Thanks for contributing this and apologies for just getting to this -- can you add some unit tests for this functionality and update the PR?

@gucciwang I have added test cases for anomaly check config

@rahulgoyal2987
Copy link
Copy Markdown
Author

@gucciwang Can please merge or reject this pull request?

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


Generated by AI (model: us.anthropic.claude-opus-4-6-v1, prompt: db2249a9) — may not be fully accurate. Reply if this doesn't help.

Additional feedback:

tests/test_anomaly_detection.py:6BUG: The test file imports AnomalyCheckConfig but it is not listed in the import statements shown in the diff. The test at line 488 uses AnomalyCheckConfig(description='test error case', level=CheckLevel.Error) but the import for AnomalyCheckConfig from pydeequ.verification is missing from the test file's imports.

Looking at the existing imports in the test file (lines 7-18 in full_source_files), we see from pydeequ.verification import VerificationResult, VerificationSuite but AnomalyCheckConfig is not imported. The diff does not add this import. Also CheckLevel is used at line 488 but is not in the imports either (only from pydeequ.analyzers import MinLength, Size and anomaly detection imports are shown).

@@ -486,6 +487,20 @@ def get_anomalyDetector(self, anomaly):
def test_anomalyDetector(self):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MISSING_TEST: The new tests only cover SimpleThresholdStrategy with AnomalyCheckConfig. There is no test for passing anomalyCheckConfig to addAnomalyCheck with other anomaly strategies (e.g., AbsoluteChangeStrategy, RelativeRateOfChangeStrategy, etc.), and no test for the AnomalyCheckConfig class itself (e.g., verifying _get_java_object returns a valid JVM object, or testing with different parameters).

The diff adds test_SimpleThresholdStrategy_Error and test_SimpleThresholdStrategy_Warning but only the SimpleThresholdStrategy helper method was updated to accept anomalyCheckConfig (line 184). The other helper methods (AbsoluteChangeStrategy, RelativeRateOfChangeStrategy, OnlineNormalStrategy, BatchNormalStrategy, HoltWinters) still hardcode anomalyCheckConfig=None.

return df.select("check_status").collect()

def SimpleThresholdStrategy(self, df_prev, df_curr, analyzer_func, lowerBound, upperBound):
def SimpleThresholdStrategy(self, df_prev, df_curr, analyzer_func, lowerBound, upperBound,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DESIGN: Only the SimpleThresholdStrategy helper method was updated to accept the anomalyCheckConfig parameter. The first call to addAnomalyCheck (line 189-190 for the previous run) still passes no config, meaning the test only exercises anomalyCheckConfig on the second/current run but not the first. This is inconsistent and doesn't fully test the feature.

Line 189-190: .addAnomalyCheck(SimpleThresholdStrategy(lowerBound, upperBound), analyzer_func).run() — no anomalyCheckConfig passed for the previous run. Line 197: .addAnomalyCheck(SimpleThresholdStrategy(lowerBound, upperBound), analyzer_func, anomalyCheckConfig) — config only passed for the current run.

Comment thread pydeequ/verification.py
# TODO integrate Analyzer context


class AnomalyCheckConfig:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MISSING_TEST: The AnomalyCheckConfig class has no dedicated unit test for its _get_java_object method. If the Deequ JVM class com.amazon.deequ.AnomalyCheckConfig doesn't exist or has a different constructor signature (e.g., different number of default arguments), this will fail at runtime with no clear error message.

Lines 14-28 in verification.py define AnomalyCheckConfig._get_java_object which calls getattr(self._check_java_class, 'apply$default$3')(), apply$default$4, and apply$default$5. There is no isolated test that verifies these JVM calls succeed independently of the full anomaly detection flow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants