Issue #235 - fix Invalid escape sequence #236
  SPARK_VERSION = _get_spark_version()
  DEEQU_MAVEN_COORD = _get_deequ_maven_config()
- IS_DEEQU_V1 = re.search("com\.amazon\.deequ\:deequ\:1.*", DEEQU_MAVEN_COORD) is not None
+ IS_DEEQU_V1 = "com.amazon.deequ:deequ:1" in DEEQU_MAVEN_COORD
NIT: The new substring check "com.amazon.deequ:deequ:1" in DEEQU_MAVEN_COORD is less precise than the original regex intent. It would match a hypothetical version like com.amazon.deequ:deequ:12.0.0-spark-3.5 (a future major version starting with '1' but not actually v1.x). A more robust fix for the deprecation warning would be to either use a raw string with the original regex (re.search(r"com\.amazon\.deequ:deequ:1\.", DEEQU_MAVEN_COORD)) or check for ":deequ:1." (with trailing dot) to ensure it's actually version 1.x.
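To make the comparison concrete, here is a minimal sketch that runs the substring check, the original pattern, and the suggested stricter pattern against three coordinates. The coordinate strings and helper function names are hypothetical and chosen only for illustration; they are not taken from the repository.

```python
import re

# Hypothetical Maven coordinates, used only to exercise the checks.
V1_COORD = "com.amazon.deequ:deequ:1.2.2-spark-3.0"
V2_COORD = "com.amazon.deequ:deequ:2.0.8-spark-3.5"
V12_COORD = "com.amazon.deequ:deequ:12.0.0-spark-3.5"


def is_v1_substring(coord: str) -> bool:
    # The check introduced by this PR: plain substring containment.
    return "com.amazon.deequ:deequ:1" in coord


def is_v1_original_regex(coord: str) -> bool:
    # The pre-PR pattern, written as a raw string to avoid the escape warning.
    return re.search(r"com\.amazon\.deequ\:deequ\:1.*", coord) is not None


def is_v1_strict(coord: str) -> bool:
    # The reviewer's suggestion: require a literal dot right after the "1".
    return re.search(r"com\.amazon\.deequ:deequ:1\.", coord) is not None


for coord in (V1_COORD, V2_COORD, V12_COORD):
    print(
        coord,
        is_v1_substring(coord),
        is_v1_original_regex(coord),
        is_v1_strict(coord),
    )
```

In this sketch only the strict pattern rejects the 12.0.0 coordinate. The substring check and the original pattern both accept it, because `1.*` matches a "1" followed by anything.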
Line 44:
The check IS_DEEQU_V1 = "com.amazon.deequ:deequ:1" in DEEQU_MAVEN_COORD matches any string containing the substring, including potential future versions like deequ:10.x or deequ:12.x. The original regex com\.amazon\.deequ\:deequ\:1.* had the same limitation: re.search is unanchored and 1.* matches any version string starting with '1', so the new check is not a behavioral regression. Looking at SPARK_TO_DEEQU_COORD_MAPPING (lines 7-11), current values are all deequ:2.0.8-spark-*, so this is not a current bug, but requiring the trailing dot would future-proof the check.
Issue #235
Changed re.search into a built-in "in" search.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
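For context on the warning named in the title, the sketch below compiles two one-line snippets and records the warnings raised. It assumes a CPython version in which an unrecognized escape such as "\." in a normal string literal produces a warning at compile time (DeprecationWarning in older 3.x releases, SyntaxWarning from 3.12); later versions may treat it as an error. The helper name escape_warnings is hypothetical.

```python
import warnings


def escape_warnings(source: str) -> list:
    """Compile `source` and return the categories of any warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [w.category for w in caught]


# The source text below contains "\." inside a normal string literal.
plain_source = 'pattern = "com\\.amazon"'
# The same pattern inside a raw string literal.
raw_source = 'pattern = r"com\\.amazon"'

print(escape_warnings(plain_source))  # one or more warning categories
print(escape_warnings(raw_source))    # no warnings
```

Either a raw string or dropping the regex entirely, as this PR does, avoids the warning.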