java-diff-utils icon indicating copy to clipboard operation
java-diff-utils copied to clipboard

Performance Very Poor Compared to Unix Diff Tool

Open DaGeRe opened this issue 3 years ago • 2 comments

Describe the bug I would expect an implementation of the Myers diff algorithm to behave (at least asymptotically) equal. Unfortunately, this is not the case for this implementation: Starting with ~100k lines that are compared, the time consumption growth heavily (at least more than linear).

You can see this in the following graph:

grafik

To Reproduce Steps to reproduce the behavior:

  1. Use https://github.com/DaGeRe/diff-benchmark (which contains 414ae6.zip and afdedc.zip which contain data that can be compared)
  2. Run [runJavaDiff.sh](https://github.com/DaGeRe/diff-benchmark/blob/main/java-difflib-benchmark/runJavaDiff.sh) and runBashDiff.sh to obtain the measurement values.
  3. Copy the measurement data and execute plot.plt to obtain the graph.

Expected behavior The duration of the diff execution should grow linear with count of analyzed lines.

System

  • Java version 1.8
  • Version 4.11

DaGeRe avatar Jun 12 '22 12:06 DaGeRe

Did you use the new Meyer Alg implementation (#134)? Since the unix tools using the proposed variant of Meyers algorithm with linear space and a significant performance gain, I am surprised that DiffTools perform not worse. The default implementation of DiffUtils use the unimproved version of Meyers paper. Look into the the issue and try the new implementation. This one is not yet the default algorithm, that DiffUtils use.

Another point is, but that is my taste, that the original algorithm does provide more natural patches and suits better for inline comparision.

wumpz avatar Jul 09 '22 14:07 wumpz

Any more tests here?

wumpz avatar Sep 08 '22 20:09 wumpz

Stale issue message

github-actions[bot] avatar Jan 05 '23 02:01 github-actions[bot]