When diff_match_patch.prototype.patch_addContext_ adds context to a patch, it moves the start and end indices outward by a constant, Patch_Margin = 4. However, because JavaScript's substring operates on UTF-16 code unit indices, the shifted index can land in the middle of a Unicode surrogate pair and split it.
Consider the following example:
import diff_match_patch from "diff-match-patch";
console.log(
  JSON.stringify(
    new diff_match_patch().patch_make("🧮 **a", "🧮 **")[0].diffs[0][1],
  )
);
The output is "\uddee **" (🧮 corresponds to "\ud83e\uddee"), so the context string starts with a lone low surrogate.
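The same split can be reproduced without the library. This is a minimal sketch, assuming the deleted "a" sits at code-unit index 5 and the context start is computed as 5 - Patch_Margin = 1; the variable names are illustrative and not taken from the library.

```javascript
// "🧮" is a single code point (U+1F9EE) but occupies two UTF-16 code units.
const text = "🧮 **a";
const abacusLength = "🧮".length; // 2

// Assumed for illustration: the edit starts at index 5 and the margin is 4,
// so the context window starts at index 1, between the two surrogates.
const editStart = 5;
const margin = 4;
const context = text.substring(editStart - margin, editStart);

// The first code unit of the context is the low surrogate 0xDDEE on its own.
console.log(JSON.stringify(context)); // "\uddee **"
```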
Calling diff_match_patch.patch_obj.prototype.toString on this patch crashes, because encodeURI throws a URIError when its argument contains a lone surrogate.
import diff_match_patch from "diff-match-patch";
const diff = new diff_match_patch();
console.log(
  JSON.stringify(
    diff.patch_toText(diff.patch_make("🧮 **a", "🧮 **")) // URIError: URI malformed
  )
);
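The encodeURI behaviour can also be checked in isolation. In the sketch below, tryEncode is a hypothetical helper written for this illustration only; it returns either the encoded string or the name of the error that was thrown.

```javascript
// Hypothetical helper: report the error name instead of letting it propagate.
function tryEncode(s) {
  try {
    return encodeURI(s);
  } catch (e) {
    return e.name;
  }
}

// A well-formed string is encoded as UTF-8 percent escapes.
const wellFormed = tryEncode("🧮 **");

// A string starting with a lone low surrogate cannot be encoded.
const loneSurrogate = tryEncode("\uddee **");

console.log(wellFormed);    // %F0%9F%A7%AE%20**
console.log(loneSurrogate); // URIError
```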
A straightforward solution might be to add a verification step after applying Patch_Margin to ensure that neither index falls inside a surrogate pair. I can start a PR, but Patch_Margin is used in many places, and I'm unsure about the best way to make the change.
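As a rough sketch of what such a verification step could look like: splitsSurrogatePair, safeStart, and safeEnd below are hypothetical helpers, not part of diff-match-patch, and PATCH_MARGIN only mirrors the library's default of 4. The idea is to widen the context window by one code unit whenever a boundary would separate a high surrogate from its low surrogate.

```javascript
const isHighSurrogate = (c) => c >= 0xd800 && c <= 0xdbff;
const isLowSurrogate = (c) => c >= 0xdc00 && c <= 0xdfff;

// True when cutting `text` at `index` would separate a surrogate pair.
function splitsSurrogatePair(text, index) {
  return (
    index > 0 &&
    index < text.length &&
    isHighSurrogate(text.charCodeAt(index - 1)) &&
    isLowSurrogate(text.charCodeAt(index))
  );
}

// Hypothetical helpers: move a start index back, or an end index forward,
// by one code unit so the whole pair stays inside the context.
function safeStart(text, index) {
  return splitsSurrogatePair(text, index) ? index - 1 : index;
}
function safeEnd(text, index) {
  return splitsSurrogatePair(text, index) ? index + 1 : index;
}

const PATCH_MARGIN = 4; // assumed to match the library default
const text = "🧮 **a";
const editStart = 5; // index of the deleted "a"

const naiveStart = Math.max(0, editStart - PATCH_MARGIN);
const naivePrefix = text.substring(naiveStart, editStart);
const safePrefix = text.substring(safeStart(text, naiveStart), editStart);

console.log(JSON.stringify(naivePrefix)); // "\uddee **"
console.log(JSON.stringify(safePrefix));  // "🧮 **"
```

Widening rather than shrinking keeps at least Patch_Margin code units of context, at the cost of a context that is occasionally one unit longer. Any real change would also need to keep the patch's start and length fields consistent with the adjusted prefix and suffix.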