gh-146333: Fix quadratic regex backtracking in configparser option parsing #146399

Open
joshuaswanson wants to merge 2 commits into python:main from joshuaswanson:fix/configparser-regex-backtracking

@joshuaswanson joshuaswanson commented Mar 25, 2026

The _OPT_TMPL and _OPT_NV_TMPL regexes have quadratic backtracking when a line contains many spaces between non-delimiter characters. The lazy .*? in the option group and the \s* before the delimiter overlap on whitespace, so the engine tries every possible split point.

The fix removes \s* before the delimiter. This is safe because the option name is already stripped via .rstrip() in _handle_option (line 1160), and the value is stripped via .strip() (line 1169).

Before: x + 40000 spaces + y takes ~86 seconds
After: ~0.004 seconds
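
The slowdown can be reproduced with a simplified stand-in for the option pattern. This is only a sketch: `OPT_PATTERN` mirrors the `.*?` / `\s*` / delimiter shape described above but is not the exact `_OPT_TMPL` text, and `time_no_delimiter_line` is a hypothetical helper introduced for illustration.

```python
import re
import time

# Assumption: simplified stand-in with the same overall shape as the
# option regex described above, not the exact configparser template.
OPT_PATTERN = re.compile(r"(?P<option>.*?)\s*(?P<vi>=|:)\s*(?P<value>.*)$")


def time_no_delimiter_line(n_spaces):
    """Return (match, seconds) for the line 'x' + n_spaces spaces + 'y'."""
    line = "x" + " " * n_spaces + "y"
    start = time.perf_counter()
    match = OPT_PATTERN.match(line)
    return match, time.perf_counter() - start


if __name__ == "__main__":
    # Small sizes only, so the demonstration finishes quickly.
    for n in (250, 500, 1000):
        match, seconds = time_no_delimiter_line(n)
        print(f"{n:>5} spaces: match={match!r} time={seconds:.4f}s")
```

If the cost is quadratic, doubling the number of spaces should roughly quadruple the elapsed time. Exact figures depend on the machine and Python version.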

@joshuaswanson joshuaswanson requested a review from jaraco as a code owner March 25, 2026 00:09

python-cla-bot bot commented Mar 25, 2026

All commit authors signed the Contributor License Agreement.

# Compiled regular expression for matching sections
SECTCRE = re.compile(_SECT_TMPL, re.VERBOSE)
# Compiled regular expression for matching options with typical separators
OPTCRE = re.compile(_OPT_TMPL.format(delim="=|:"), re.VERBOSE)
Member

> This is safe because the option name is already stripped via .rstrip() in _handle_option (line 1160), and the value is stripped via .strip() (line 1169).

The regexes are publicly exposed, so this breaks them for people who use them directly.
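
To make the compatibility concern concrete, here is a minimal sketch of what a caller matching directly against the pattern would see. Both patterns are simplified stand-ins (an assumption, not the exact `_OPT_TMPL` text). `WITHOUT_WS` drops the `\s*` before the delimiter, as the first revision of this PR did.

```python
import re

# Assumption: simplified stand-ins for the option regex, before and after
# removing the \s* that precedes the delimiter.
WITH_WS = re.compile(r"(?P<option>.*?)\s*(?P<vi>=|:)\s*(?P<value>.*)$")
WITHOUT_WS = re.compile(r"(?P<option>.*?)(?P<vi>=|:)\s*(?P<value>.*)$")

line = "key = value"
before = WITH_WS.match(line).group("option")
after = WITHOUT_WS.match(line).group("option")

print(repr(before))  # 'key'
print(repr(after))   # 'key '
```

Code that reads the `option` group without calling `.rstrip()` itself would start seeing the trailing space.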

@joshuaswanson joshuaswanson force-pushed the fix/configparser-regex-backtracking branch from fe7efda to 85407ee Compare March 25, 2026 12:06
@joshuaswanson
Author

Good point, thanks. Updated to keep the regexes unchanged. Instead, _handle_option now checks for delimiter presence before matching. If no delimiter is found, it skips the regex entirely and either treats the line as a valueless option (when allow_no_value=True) or reports a parsing error. All 341 existing tests pass.

Member

encukou commented Mar 25, 2026

Overriding OPTCRE is mentioned (though discouraged) in the docs. IMO, that needs to remain usable for specifying different delimiters. We shouldn't skip it.

Would it work to add negative lookahead, (?!{delim}), to <option>?


@joshuaswanson, please don't force-push to CPython PR branches -- it makes the changes a little harder to follow for reviewers, and every PR gets squashed anyway.

@joshuaswanson
Author

Won't force-push again, sorry about that.

The simple (?!{delim}) on .*? didn't eliminate the backtracking on its own because whitespace characters aren't delimiters, so .*? and \s* still overlapped on spaces. Took a bit more work to get it right.

The fix restructures the option group to (?:(?!{delim})\S)*(?:\s+(?:(?!{delim})\S)+)* which matches words separated by whitespace, where each word is a run of non-delimiter, non-whitespace characters. The option group can never end in whitespace, so there's no overlap with the \s* that follows it. Captured groups are identical and all 341 existing tests pass.
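
A rough way to check the "captured groups are identical" claim is to compare both shapes on a few lines. This is a sketch under assumptions: `OLD` and `NEW` are simplified one-line stand-ins rather than the real verbose templates, `DELIM` assumes the default `=` and `:` delimiters, and `groups` and `SAMPLES` are hypothetical names introduced here.

```python
import re

DELIM = "=|:"  # assumption: the default "=" and ":" delimiters

# Assumption: simplified one-line stand-ins for the option template;
# only the <option> group differs between OLD and NEW.
OLD = re.compile(
    r"(?P<option>.*?)\s*(?P<vi>{delim})\s*(?P<value>.*)$".format(delim=DELIM)
)
OPTION = r"(?:(?!{delim})\S)*(?:\s+(?:(?!{delim})\S)+)*"
NEW = re.compile(
    (r"(?P<option>" + OPTION + r")\s*(?P<vi>{delim})\s*(?P<value>.*)$").format(
        delim=DELIM
    )
)


def groups(pattern, line):
    """Return the named groups for a line, or None when it does not match."""
    match = pattern.match(line)
    return match.groupdict() if match else None


SAMPLES = [
    "key = value",
    "my key : v",
    "a=b=c",
    "= value",
    "url: http://x",
    "a b= c",
    "no delimiter here",
]

if __name__ == "__main__":
    for line in SAMPLES:
        print(repr(line), groups(OLD, line) == groups(NEW, line))
```

A handful of samples is not a proof of equivalence, so the existing test suite remains the real check.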
