java-diff-utils icon indicating copy to clipboard operation
java-diff-utils copied to clipboard

Performance Very Poor When Using LinkedList

Open DaGeRe opened this issue 3 years ago • 3 comments

Describe the bug When comparing two LinkedList-Instances, the performance is very poor. I compared ~1,1 million lines, and VisualVM showed me that several minutes where spent at https://github.com/java-diff-utils/java-diff-utils/blob/bc65f9703c137b3a7d2906eeaf064588dd36c125/java-diff-utils/src/main/java/com/github/difflib/algorithm/myers/MeyersDiff.java#L115. With an ArrayList, this was reduced to a few seconds.

From my point of view, there are two possible solutions:

  1. Create a benchmark, refactor this from .get to Iterator-usages and check whether this improves the performance by the benchmark
  2. Change to always require ArrayList instead just List, and document this in generateDiffRows to the end users.

To Reproduce I'll create a benchmark if someone would try to refactor .get to Iterator-usages.

Expected behavior This should either run fast or it should be clear that an ArrayList is required.

System

  • Java version 1.8, but should happen with every version
  • Version 4.11

DaGeRe avatar Jun 10 '22 14:06 DaGeRe

Hi @DaGeRe, could you teach me how to refactor to iterator? I want to create a fix for it

anaconda875 avatar Jul 04 '22 16:07 anaconda875

Iterators in Java are a widely used concept and you can find tutorials at many places, e.g. https://www.w3schools.com/java/java_iterator.asp. Since the existing source code uses indices, I am not sure how this can be implemented successfully.

DaGeRe avatar Jul 05 '22 09:07 DaGeRe

@DaGeRe You are right about using linked list. Since I wanted not to limit the library to ArrayList, I kept this List interface at the core. However, even with an ArrayList you could provide your own implementation that slows things down.

I could go the way e.g. jgit goes and only work on those Seq implementations.

So I will stick with my implementation and take your advice for better documentation. Where would you place a hint like this?

wumpz avatar Jul 09 '22 14:07 wumpz

Stale issue message

github-actions[bot] avatar Jan 05 '23 02:01 github-actions[bot]