Converting large collections and streams to strings can be more efficient #3123

@etellman

Describe the bug
Converting a large collection with a time-consuming string conversion for each element takes longer than it needs to because AssertJ converts all the elements in the collection to strings but only uses the first and last 500 strings.

This is related to #3065, but for Streams and Iterables instead of arrays.

  • assertj core version: 3.25.0-SNAPSHOT
  • java version: openjdk 17.0.7
  • test framework version: JUnit5
  • os (if relevant): OSX, but probably not relevant

Test case reproducing the bug

Converting this collection to a string takes 27 minutes on my MacBook:

    int elementsPerArray = 1000;
    List<int[]> numbers = new ArrayList<>();
    for (int i = 0; i < 1 << 20; i++) {
      numbers.add(new int[elementsPerArray]);
    }

It's also possible to get an OOM error with more elements in the list and smaller arrays in each element.

A possible fix would be to narrow the stream down to the elements that are actually needed and then convert only those to strings. I think this can be done by changing StandardRepresentation.representElements() to:

    private List<String> representElements(Stream<?> elements, String start, String end, String elementSeparator,
                                           String indentation, Object root) {
      // new
      final PrintingAccumulator accumulator = new PrintingAccumulator(maxElementsForPrinting);
      elements.forEach(accumulator::add);

      // same, but uses the narrowed-down list of elements from the accumulator instead of the original stream
      return accumulator.toList().stream()
                        .map(element -> safeStringOf(element, start, end, elementSeparator, indentation, root))
                        .collect(toList());
    }

PrintingAccumulator only retains the elements that will be used in the final string conversion and discards everything in the middle of the collection.
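To make the idea concrete, here is a minimal sketch of what such an accumulator could look like. It is not the code from the proposed PR. The field names, the even head/tail split, and the use of a LinkedList for the tail are all assumptions made for illustration. Only the class name, the constructor argument, and the add() and toList() methods come from the snippet above.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch of the accumulator described above, not the actual AssertJ class.
// It keeps the first ceil(max/2) and the last floor(max/2) elements it has seen
// and drops everything in between, so memory use is bounded by max.
final class PrintingAccumulator {
  private final int headSize; // assumed split: first half of the budget
  private final int tailSize; // assumed split: second half of the budget
  private final List<Object> head = new ArrayList<>();
  // LinkedList rather than ArrayDeque because ArrayDeque rejects null elements.
  private final Deque<Object> tail = new LinkedList<>();

  PrintingAccumulator(int maxElementsForPrinting) {
    if (maxElementsForPrinting < 0) {
      throw new IllegalArgumentException("maxElementsForPrinting must be >= 0");
    }
    this.headSize = (maxElementsForPrinting + 1) / 2;
    this.tailSize = maxElementsForPrinting / 2;
  }

  // Called once per element, e.g. via elements.forEach(accumulator::add).
  void add(Object element) {
    if (head.size() < headSize) {
      head.add(element);
      return;
    }
    if (tailSize == 0) {
      return; // nothing is retained beyond the head
    }
    if (tail.size() == tailSize) {
      tail.removeFirst(); // evict the oldest tail candidate
    }
    tail.addLast(element);
  }

  // Retained elements in encounter order: head first, then tail.
  List<Object> toList() {
    List<Object> result = new ArrayList<>(head.size() + tail.size());
    result.addAll(head);
    result.addAll(tail);
    return result;
  }
}
```

With this shape, the expensive per-element string conversion runs only on the retained elements, regardless of how many elements the stream produces. A real implementation would probably also need to record whether anything was discarded, so that the caller can still print a marker for the omitted middle section.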

I gave this a quick try and it produced the correct string in around 2 seconds. I can clean this up a bit and attach a PR with the proposed solution.
