Commit 98ff6ee

Implement an advanced fuzzy diffing feature for interdiff
This implements a --fuzzy option to make interdiff perform a fuzzy comparison between two diffs. This is very helpful, for example, for comparing a backport patch to its upstream source patch to assist a human reviewer in verifying the correctness of the backport.

The fuzzy diffing process is complex and works by:

- Generating a new patch file with hunks split up into smaller hunks to separate out multiple deltas (+/- lines) in a single hunk that are spaced apart by context lines, increasing the number of deltas that can be applied successfully with fuzz
- Applying the rewritten p1 patch to p2's original file, and the rewritten p2 patch to p1's original file; the original files aren't ever merged
- Relocating patched hunks in only p1's original file to align with their respective locations in the other file, based on the reported line offset printed out by `patch` for each hunk it successfully applied
- Squashing unline gaps fewer than max_context*2 lines between hunks in the patched files, to hide unknown contextual information that is irrelevant for comparing the two diffs while also improving hunk alignment between the two patched files
- Diffing the two patched files as usual
- Rewriting the hunks in the diff output to exclude unlines from the unified diff, even splitting up hunks to remove unlines present in the middle of a hunk, while also adjusting the @@ line to compensate for the change in line offsets
- Emitting the rewritten diff output while interleaving rejected hunks from both p1 and p2 in the output in order by line number, with a comment on the @@ line indicating when an emitted hunk is a rejected hunk

This also involves working around some bugs in `patch` itself encountered along the way, such as occasionally inaccurate line offsets printed out and spurious fuzzing in certain cases that involve hunks with an unequal number of pre-context and post-context lines.
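
To illustrate the first step, hunk splitting, here is a minimal Python sketch. It is not the code from this commit (interdiff is written in C); the function name `split_hunk`, its signature, and the rule for dividing context between neighbouring sub-hunks are assumptions made for illustration. It takes one unified-diff hunk body and returns one smaller hunk per contiguous run of +/- lines, each keeping at most `max_context` context lines on either side.

```python
def split_hunk(old_start, new_start, body, max_context=3):
    """Split one unified-diff hunk into one sub-hunk per run of +/- lines.

    Hypothetical helper, not interdiff's implementation. `body` is a list of
    hunk lines, each starting with ' ', '+', or '-'. Returns a list of
    (old_start, old_len, new_start, new_len, lines) tuples.
    """
    # Record the old-file and new-file line number of every body line.
    old_pos, new_pos = [], []
    o, n = old_start, new_start
    for line in body:
        old_pos.append(o)
        new_pos.append(n)
        if line[0] in " -":
            o += 1
        if line[0] in " +":
            n += 1

    # Find maximal runs of consecutive '+'/'-' lines (the deltas).
    runs = []
    i = 0
    while i < len(body):
        if body[i][0] in "+-":
            j = i
            while j < len(body) and body[j][0] in "+-":
                j += 1
            runs.append((i, j))
            i = j
        else:
            i += 1

    hunks = []
    for k, (s, e) in enumerate(runs):
        # Context may not reach into a neighbouring delta (assumed rule).
        prev_end = runs[k - 1][1] if k > 0 else 0
        next_start = runs[k + 1][0] if k + 1 < len(runs) else len(body)
        lo = max(prev_end, s - max_context)
        hi = min(next_start, e + max_context)
        part = body[lo:hi]
        old_len = sum(1 for l in part if l[0] in " -")
        new_len = sum(1 for l in part if l[0] in " +")
        hunks.append((old_pos[lo], old_len, new_pos[lo], new_len, part))
    return hunks


if __name__ == "__main__":
    body = [" a", "-b", "+B", " c", " d", " e", "+f", " g"]
    for o, ol, n, nl, lines in split_hunk(10, 10, body, max_context=1):
        print(f"@@ -{o},{ol} +{n},{nl} @@")
        print("\n".join(lines))
```

The sketch assumes every body line carries its one-character prefix and that each sub-hunk retains at least one line from the old file; it does not handle the special start-line convention for hunks whose old length is zero.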
The end result of all of this is a minimal set of real differences in the context lines of each hunk between the user's provided diffs. Even when fuzzing results in a faulty patch, the context differences are shown, so there is never a risk of any real deltas getting hidden due to fuzzing. By default, the fuzz factor is whatever `patch` itself uses by default. The user can adjust it by appending =N to `--fuzzy`, where N is the maximum number of context lines for `patch` to fuzz.
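
As a rough illustration of the `--fuzzy[=N]` syntax described above, the following Python sketch parses the option and maps N onto the `--fuzz=N` option of GNU `patch`. The helper names `parse_fuzzy` and `patch_fuzz_args` are hypothetical, and how interdiff actually invokes `patch` is not stated in this message, so treat the mapping as an assumption.

```python
def parse_fuzzy(arg):
    """Return (enabled, fuzz) for one command-line argument.

    Hypothetical helper. `fuzz` is None when no =N was given, meaning
    that patch's own default fuzz factor should be used.
    """
    if arg == "--fuzzy":
        return True, None
    prefix = "--fuzzy="
    if arg.startswith(prefix):
        value = arg[len(prefix):]
        if not value.isdigit():
            raise ValueError(f"invalid fuzz factor: {value!r}")
        return True, int(value)
    return False, None


def patch_fuzz_args(fuzz):
    """Build the extra arguments to pass to patch (assumed mapping)."""
    # With no explicit N, pass nothing and let patch apply its default.
    return [] if fuzz is None else [f"--fuzz={fuzz}"]


if __name__ == "__main__":
    for arg in ("--fuzzy", "--fuzzy=3", "--other"):
        enabled, fuzz = parse_fuzzy(arg)
        print(arg, enabled, patch_fuzz_args(fuzz))
```

Passing no fuzz argument when N is omitted keeps the behaviour tied to whatever default the installed `patch` uses, rather than hard-coding a number.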
1 parent 430bbfc commit 98ff6ee

1 file changed

Lines changed: 859 additions & 29 deletions
