Commit 98ff6ee
Implement an advanced fuzzy diffing feature for interdiff
This implements a --fuzzy option to make interdiff perform a fuzzy
comparison between two diffs. This is very helpful, for example, for
comparing a backport patch to its upstream source patch to assist a human
reviewer in verifying the correctness of the backport.
The fuzzy diffing process is complex and works by:
- Generating a new patch file with hunks split up into smaller hunks to
separate out multiple deltas (+/- lines) in a single hunk that are spaced
apart by context lines, increasing the number of deltas that can be
applied successfully with fuzz
- Applying the rewritten p1 patch to p2's original file, and the rewritten
p2 patch to p1's original file; the original files are never merged
- Relocating patched hunks only in p1's original file to align with their
respective locations in the other file, based on the reported line
offset printed out by `patch` for each hunk it successfully applied
- Squashing unline gaps of fewer than max_context*2 lines between hunks in
patched files, to hide unknown contextual information that is irrelevant
for comparing the two diffs while also improving hunk alignment between
the two patched files
- Diffing the two patched files as usual
- Rewriting the hunks in the diff output to exclude unlines from the
unified diff, even splitting up hunks to remove unlines present in the
middle of a hunk, while also adjusting the @@ line to compensate for the
change in line offsets
- Emitting the rewritten diff output while interleaving rejected hunks from
both p1 and p2 in the output in order by line number, with a comment on
the @@ line indicating when an emitted hunk is a rejected hunk
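As a rough illustration of the first step above (splitting a hunk so that
each delta gets its own smaller hunk), here is a minimal Python sketch. It
is not the code from this commit: the names `Hunk` and `split_hunk` are
hypothetical, and dividing the context between two neighbouring deltas
roughly in half, with no context line shared between sub-hunks, is an
assumption made for the example.

```python
from typing import List, NamedTuple


class Hunk(NamedTuple):
    """Hypothetical container for one unified-diff hunk."""
    old_start: int      # first line number in the original file
    new_start: int      # first line number in the patched file
    lines: List[str]    # body lines, each prefixed with ' ', '+' or '-'

    def header(self) -> str:
        old_len = sum(1 for l in self.lines if l[0] in " -")
        new_len = sum(1 for l in self.lines if l[0] in " +")
        return f"@@ -{self.old_start},{old_len} +{self.new_start},{new_len} @@"


def split_hunk(hunk: Hunk, max_context: int = 3) -> List[Hunk]:
    """Split a hunk into one sub-hunk per run of +/- lines (assumed policy)."""
    lines = hunk.lines

    # Locate maximal runs of changed lines as (start, end) pairs, end exclusive.
    deltas = []
    i = 0
    while i < len(lines):
        if lines[i][0] in "+-":
            j = i
            while j < len(lines) and lines[j][0] in "+-":
                j += 1
            deltas.append((i, j))
            i = j
        else:
            i += 1
    if len(deltas) < 2:
        return [hunk]  # nothing to separate

    # Prefix sums: how many old-file / new-file lines precede index k.
    old_before, new_before = [0], [0]
    for l in lines:
        old_before.append(old_before[-1] + int(l[0] in " -"))
        new_before.append(new_before[-1] + int(l[0] in " +"))

    result = []
    last = len(deltas) - 1
    for n, (start, end) in enumerate(deltas):
        prev_end = deltas[n - 1][1] if n > 0 else 0
        next_start = deltas[n + 1][0] if n < last else len(lines)
        gap_before = start - prev_end
        gap_after = next_start - end
        # A gap between two deltas is divided so no context line is shared:
        # the earlier delta takes the larger half, the later one the rest.
        lead_avail = gap_before if n == 0 else gap_before // 2
        trail_avail = gap_after if n == last else gap_after - gap_after // 2
        lo = start - min(lead_avail, max_context)
        hi = end + min(trail_avail, max_context)
        result.append(Hunk(hunk.old_start + old_before[lo],
                           hunk.new_start + new_before[lo],
                           lines[lo:hi]))
    return result
```

Calling `split_hunk(Hunk(10, 10, [" a", "-b", "+B", " c", " d", " e", " f",
"+g", " h"]))` yields two hunks whose headers are `@@ -10,4 +10,4 @@` and
`@@ -14,3 +14,4 @@`. Smaller hunks carry less context each, so `patch` has
a better chance of placing every delta independently.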
This also involves working around some bugs in `patch` itself encountered
along the way, such as occasionally inaccurate line offsets printed out and
spurious fuzzing in certain cases that involve hunks with an unequal number
of pre-context and post-context lines.
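For reference, the offsets and fuzz discussed here come from the per-hunk
report lines that `patch` prints, such as
`Hunk #2 succeeded at 40 with fuzz 1 (offset -3 lines).` The Python sketch
below shows one way such lines could be parsed. It is an illustration only:
`HunkResult` and `parse_patch_output` are hypothetical names, the message
format is assumed to be the usual GNU patch wording, and, given the
inaccuracies noted above, the parsed offset should be treated as a hint
rather than a guarantee.

```python
import re
from typing import Dict, NamedTuple


class HunkResult(NamedTuple):
    """Hypothetical record of what patch reported for one hunk."""
    applied: bool   # True for "succeeded", False for "FAILED"
    line: int       # line number patch reported for the hunk
    fuzz: int       # fuzz used, 0 if none was reported
    offset: int     # line offset, 0 if none was reported


# Assumed GNU patch wording; the fuzz and offset parts are both optional.
_HUNK_RE = re.compile(
    r"^Hunk #(\d+) (succeeded|FAILED) at (\d+)"
    r"(?: with fuzz (\d+))?"
    r"(?: \(offset (-?\d+) lines?\))?"
)


def parse_patch_output(text: str) -> Dict[int, HunkResult]:
    """Map hunk number to the result parsed from patch's output text."""
    results = {}
    for raw in text.splitlines():
        m = _HUNK_RE.match(raw.strip())
        if not m:
            continue  # e.g. "patching file foo.c"
        num, status, line, fuzz, offset = m.groups()
        results[int(num)] = HunkResult(
            applied=(status == "succeeded"),
            line=int(line),
            fuzz=int(fuzz or 0),
            offset=int(offset or 0),
        )
    return results
```

A caller could feed this the captured output of a `patch` run and use the
`offset` field of each applied hunk to decide how far to relocate it, while
hunks with `applied` set to False would be the rejected ones.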
The end result of all of this is a minimal set of real differences in the
context lines of each hunk between the user's provided diffs. Even when
fuzzing results in a faulty patch, the context differences are shown so
there is never a risk of any real deltas getting hidden due to fuzzing.
By default, the fuzz factor used is just the default used in `patch`. The
fuzz factor can be adjusted by the user via appending =N to `--fuzzy` to
specify the maximum number of context lines for `patch` to fuzz.
1 parent 430bbfc commit 98ff6ee
1 file changed
Lines changed: 859 additions & 29 deletions