Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Massive performance degradation when truncating large file (20,000x slower than Google's diff-match-patch package) #239

Closed
josephrocca opened this issue Dec 28, 2018 · 5 comments

Comments

@josephrocca
Copy link

josephrocca commented Dec 28, 2018

Hi there, I've been having some performance problems with createPatch and have managed to create a minimal example that replicates the problem. The patch creation takes about 20,000 times longer than it should. It seems to occur when I'm doing large truncations. Here's the code:

const {performance} = require("perf_hooks");
const JsDiff = require('diff');
const diff_match_patch = require("diff-match-patch");
const dmp = new diff_match_patch();

function makePatch(oldText, newText) {
  let t;

  t = performance.now();
  let textPatch1 = JsDiff.createPatch("foo.txt", oldText+"\n", newText+"\n", "", "");
  console.log("JsDiff (ms):", performance.now()-t);

  t = performance.now();
  let textPatch2 = dmp.patch_make(oldText+"\n", newText+"\n");
  console.log("diff_patch_match (ms):", performance.now()-t);
}

// make a string of 25k lines of random numbers:
text1 = new Array(25000).fill(0).map(n => new Array(20).fill(1).map(n => Math.floor(Math.random()*10)).join("")).join("\n");
// grab the first 10,000 characters:
text2 = text1.slice(0, 10000);
// make the patch:
makePatch(text1, text2);

Here's the output I get for that:

JsDiff (ms): 133238.9396559
diff_patch_match (ms): 6.825272083282471

So unless I'm doing something wrong here, there's a huge performance disparity between this package and the diff-patch-match package for patch creation in certain situations. diff-patch-match is normally about 2x-10x faster depending on the size of the strings, but here its a lot faster... so something's obviously going wrong here.

This is what I'm seeing in the profiler:

image

Any idea what's happening here?

@josephrocca josephrocca changed the title Massive performance degradation when truncating large file google's diff-match-patch library Massive performance degradation when truncating large file Dec 28, 2018
@josephrocca josephrocca changed the title Massive performance degradation when truncating large file Massive performance degradation when truncating large file (20,000x slower than Google's diff-patch-match package) Feb 1, 2019
@josephrocca josephrocca changed the title Massive performance degradation when truncating large file (20,000x slower than Google's diff-patch-match package) Massive performance degradation when truncating large file (20,000x slower than Google's diff-match-patch package) Feb 1, 2019
@leoswing
Copy link

Hi there, I've been having some performance problems with createPatch and have managed to create a minimal example that replicates the problem. The patch creation takes about 20,000 times longer than it should. It seems to occur when I'm doing large truncations. Here's the code:

const {performance} = require("perf_hooks");
const JsDiff = require('diff');
const diff_match_patch = require("diff-match-patch");
const dmp = new diff_match_patch();

function makePatch(oldText, newText) {
  let t;

  t = performance.now();
  let textPatch1 = JsDiff.createPatch("foo.txt", oldText+"\n", newText+"\n", "", "");
  console.log("JsDiff (ms):", performance.now()-t);

  t = performance.now();
  let textPatch2 = dmp.patch_make(oldText+"\n", newText+"\n");
  console.log("diff_patch_match (ms):", performance.now()-t);
}

// make a string of 25k lines of random numbers:
text1 = new Array(25000).fill(0).map(n => new Array(20).fill(1).map(n => Math.floor(Math.random()*10)).join("")).join("\n");
// grab the first 10,000 characters:
text2 = text1.slice(0, 10000);
// make the patch:
makePatch(text1, text2);

Here's the output I get for that:

JsDiff (ms): 133238.9396559
diff_patch_match (ms): 6.825272083282471

So unless I'm doing something wrong here, there's a huge performance disparity between this package and the diff-patch-match package for patch creation in certain situations. diff-patch-match is normally about 2x-10x faster depending on the size of the strings, but here its a lot faster... so something's obviously going wrong here.

This is what I'm seeing in the profiler:

image

Any idea what's happening here?

But google diff could not generate a diff like git diff or linux diff tool, so it could not make a beautiful diff, any solutions? Much appriciated

@joyceerhl
Copy link

joyceerhl commented Dec 14, 2021

I profiled createPatch from v5.0.0 of this library after running into this exact issue and wondering why Google's DMP implementation is so much more performant. For my inputs of ~34K LOC, over 60% of the execution time is spent garbage collecting JS objects when I inline the function definition of clonePath here:

basePath = clonePath(removePath);

jsdiff/src/diff/base.js

Lines 230 to 232 in d358a57

function clonePath(path) {
return { newPos: path.newPos, components: path.components.slice(0) };
}

image

@erothmayer
Copy link

I ran into this as well. There are a few spots which are using new objects inefficiently, with clonePath being the worst among them. The current algorithm does a shallow-clone of the entire diagonal every time it takes a step forward, which ends up churning memory really badly on large files. I was unable to diff 2 ~10k LOC files at all, the browser just hangs.

I have a working copy of a local change which makes this a lot better, which I'll PR shortly.

@ExplodingCabbage
Copy link
Collaborator

#448 and #411 have sped jsdiff up on this example to the point that it can generate the patch almost instantly now - about 15-30ms on my machine.

@ExplodingCabbage
Copy link
Collaborator

(Note they're pre-release, but you can test against master if you want to confirm.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants