Speed up algorithm by not considering diagonals that take us off the edge of the graph (#448)

* Speed up algorithm by not considering diagonals that take us off the edge of the graph

* Note deviations from Myers diff in the README

* Add release notes

* Use capitalisation 'jsdiff', which seems most common
ExplodingCabbage authored Dec 29, 2023
1 parent bf5ec4a commit b1b2035
Showing 3 changed files with 39 additions and 3 deletions.
7 changes: 7 additions & 0 deletions README.md
@@ -215,3 +215,10 @@ jsdiff supports all ES3 environments with some known issues on IE8 and below. Un
## License
See [LICENSE](https://github.com/kpdecker/jsdiff/blob/master/LICENSE).
## Deviations from the published Myers diff algorithm
jsdiff deviates from the published algorithm in a couple of ways that don't affect results but do affect performance:

* jsdiff tracks the diff for each diagonal as a linked list of change objects, rather than the historical array of furthest-reaching D-paths on each diagonal contemplated on page 8 of Myers's paper.
* jsdiff skips considering diagonals where the furthest-reaching D-path would go off the edge of the edit graph. This dramatically reduces the time cost (from quadratic to linear) in cases where the new text just appends or truncates content at the end of the old text.
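
The first deviation listed above can be illustrated with a minimal sketch. The names `extendPath`, `toArray`, `last`, and `previous` are hypothetical and chosen for illustration; they are not jsdiff's internals. The sketch assumes each path stores only its most recent change and reaches older changes through a back pointer, so that extending a path never copies its history.

```javascript
// Hypothetical sketch: paths as immutable linked lists of change objects.
// Extending a path allocates one node and shares the rest of the history,
// so it costs O(1) instead of copying an O(n) array of changes.
function extendPath(path, op) {
  const last = path.last;
  if (last && last.op === op) {
    // Same kind of edit as the previous change: build a merged node instead
    // of mutating the old one, because other paths may still point at it.
    return { last: { op, count: last.count + 1, previous: last.previous } };
  }
  return { last: { op, count: 1, previous: last } };
}

// Walk the back pointers once, at the very end, to recover the changes in order.
function toArray(path) {
  const out = [];
  for (let node = path.last; node; node = node.previous) {
    out.push({ op: node.op, count: node.count });
  }
  return out.reverse();
}

const empty = { last: undefined };
const shared = extendPath(extendPath(empty, 'keep'), 'keep');
const withAdd = extendPath(shared, 'add');       // candidate on one diagonal
const withRemove = extendPath(shared, 'remove'); // candidate on a neighbouring diagonal

console.log(toArray(withAdd));
console.log(toArray(withRemove));
```

Both candidates point at the same `keep` node, which is why keeping one candidate per diagonal stays cheap under this representation.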
1 change: 1 addition & 0 deletions release-notes.md
@@ -5,6 +5,7 @@
[Commits](https://github.com/kpdecker/jsdiff/compare/v5.1.0...master)

- [#411](https://github.com/kpdecker/jsdiff/pull/411) Big performance improvement. Previously an O(n) array-copying operation inside the innermost loop of jsdiff's base diffing code increased the overall worst-case time complexity of computing a diff from O(n²) to O(n³). This is now fixed, bringing the worst-case time complexity down to what it theoretically should be for a Myers diff implementation.
- [#448](https://github.com/kpdecker/jsdiff/pull/448) Performance improvement. Diagonals whose furthest-reaching D-path would go off the edge of the edit graph are now skipped, rather than being pointlessly considered as called for by the original Myers diff algorithm. This dramatically speeds up computing diffs where the new text just appends or truncates content at the end of the old text.
- [#351](https://github.com/kpdecker/jsdiff/issues/351) Importing from the lib folder - e.g. `require("diff/lib/diff/word.js")` - will work again now. This had been broken for users on the latest version of Node since Node 17.5.0, which changed how Node interprets the `exports` property in jsdiff's `package.json` file.
- [#344](https://github.com/kpdecker/jsdiff/issues/344) `diffLines`, `createTwoFilesPatch`, and other patch-creation methods now take an optional `stripTrailingCr: true` option which causes Windows-style `\r\n` line endings to be replaced with Unix-style `\n` line endings before calculating the diff, just like GNU `diff`'s `--strip-trailing-cr` flag.

34 changes: 31 additions & 3 deletions src/diff/base.js
@@ -43,9 +43,32 @@ Diff.prototype = {
       return done([{value: this.join(newString), count: newString.length}]);
     }

+    // Once we hit the right edge of the edit graph on some diagonal k, we can
+    // definitely reach the end of the edit graph in no more than k edits, so
+    // there's no point in considering any moves to diagonal k+1 any more (from
+    // which we're guaranteed to need at least k+1 more edits).
+    // Similarly, once we've reached the bottom of the edit graph, there's no
+    // point considering moves to lower diagonals.
+    // We record this fact by setting minDiagonalToConsider and
+    // maxDiagonalToConsider to some finite value once we've hit the edge of
+    // the edit graph.
+    // This optimization is not faithful to the original algorithm presented in
+    // Myers's paper, which instead pointlessly extends D-paths off the end of
+    // the edit graph - see page 7 of Myers's paper which notes this point
+    // explicitly and illustrates it with a diagram. This has major performance
+    // implications for some common scenarios. For instance, to compute a diff
+    // where the new text simply appends d characters on the end of the
+    // original text of length n, the true Myers algorithm will take O(n+d^2)
+    // time while this optimization needs only O(n+d) time.
+    let minDiagonalToConsider = -Infinity, maxDiagonalToConsider = Infinity;
+
     // Main worker method. checks all permutations of a given edit length for acceptance.
     function execEditLength() {
-      for (let diagonalPath = -1 * editLength; diagonalPath <= editLength; diagonalPath += 2) {
+      for (
+        let diagonalPath = Math.max(minDiagonalToConsider, -editLength);
+        diagonalPath <= Math.min(maxDiagonalToConsider, editLength);
+        diagonalPath += 2
+      ) {
         let basePath;
         let removePath = bestPath[diagonalPath - 1],
             addPath = bestPath[diagonalPath + 1];
@@ -81,12 +104,17 @@ Diff.prototype = {

         newPos = self.extractCommon(basePath, newString, oldString, diagonalPath);

-        // If we have hit the end of both strings, then we are done
         if (basePath.oldPos + 1 >= oldLen && newPos + 1 >= newLen) {
+          // If we have hit the end of both strings, then we are done
           return done(buildValues(self, basePath.lastComponent, newString, oldString, self.useLongestToken));
         } else {
-          // Otherwise track this path as a potential candidate and continue.
           bestPath[diagonalPath] = basePath;
+          if (basePath.oldPos + 1 >= oldLen) {
+            maxDiagonalToConsider = Math.min(maxDiagonalToConsider, diagonalPath - 1);
+          }
+          if (newPos + 1 >= newLen) {
+            minDiagonalToConsider = Math.max(minDiagonalToConsider, diagonalPath + 1);
+          }
         }
       }
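
The effect of the two bounds can be seen in a stripped-down, standalone sketch. This is not jsdiff's code: it is a distance-only version of Myers's greedy search, assuming diagonal k means x - y (x indexing the old string, y the new one), with a hypothetical `prune` flag that switches the bounds on and a `visited` counter added purely for illustration.

```javascript
// Illustrative sketch (assumed names, distance only, no edit script).
// V[k + offset] holds the furthest x reached on diagonal k = x - y.
function myersDistance(oldStr, newStr, prune) {
  const N = oldStr.length, M = newStr.length;
  const max = N + M;
  const offset = max + 1;
  const V = new Array(2 * max + 3).fill(0);
  let lo = -Infinity, hi = Infinity; // analogous to min/maxDiagonalToConsider
  let visited = 0;                   // diagonals examined, for comparison only

  for (let D = 0; D <= max; D++) {
    let start = Math.max(-D, lo);
    // Defensive: diagonals reachable with D edits share D's parity.
    if ((start + D) % 2 !== 0) start += 1;
    for (let k = start; k <= Math.min(D, hi); k += 2) {
      visited++;
      let x;
      if (k === -D || (k !== D && V[k - 1 + offset] < V[k + 1 + offset])) {
        x = V[k + 1 + offset];     // step down from diagonal k+1 (an insertion)
      } else {
        x = V[k - 1 + offset] + 1; // step right from diagonal k-1 (a deletion)
      }
      let y = x - k;
      // Follow the run of matching characters along the diagonal.
      while (x < N && y < M && oldStr[x] === newStr[y]) { x++; y++; }
      V[k + offset] = x;
      if (x >= N && y >= M) return { distance: D, visited };
      if (prune) {
        if (x >= N) hi = Math.min(hi, k - 1); // right edge hit: drop higher diagonals
        if (y >= M) lo = Math.max(lo, k + 1); // bottom edge hit: drop lower diagonals
      }
    }
  }
}

// Append-only case: the new text is the old text plus d extra characters.
const n = 1000, d = 50;
const oldText = 'a'.repeat(n);
const newText = oldText + 'b'.repeat(d);
const unpruned = myersDistance(oldText, newText, false);
const pruned = myersDistance(oldText, newText, true);
console.log(unpruned, pruned);
```

In the append-only case both runs report the same distance, but the pruned run examines a single diagonal per edit length, while the unpruned run examines a number that grows with each edit length. That difference is the O(n+d) versus O(n+d^2) gap described in the code comment above.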
