Describe the bug
I recently upgraded my C++ project's unit tests by adding a JUnit reporter so I can use the resulting XML reports for test result reporting in GitLab. While the GitLab part works well, I discovered that adding the JUnit reporter increases the runtime of some of the tests massively. What normally takes < 10 s, including all the overhead of managing my environment in the GitLab executor, takes up to around 4 minutes depending on the platform. The only change is the additional --reporter "JUnit::out=/tmp/catch2-xml-report".
Edit:
Additionally, the memory usage also explodes, as I just found out. As I was extending my test cases, my CI test process was killed due to memory usage. I found hints to use GNU time to measure not only time but also memory information. The interesting line is the following, first without the JUnit reporter, then with it (this is from my actual project):
Maximum resident set size (kbytes): 1678916
Maximum resident set size (kbytes): 29153956
For the sample project:
... % /usr/bin/time -v ./sample_test [cpp-playground/catch2-speed-sample/build/Release]
Randomness seeded to: 1267125391
===============================================================================
All tests passed (62914560 assertions in 1 test case)
Command being timed: "./sample_test"
User time (seconds): 1.06
System time (seconds): 0.03
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.10
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 249876
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 294
Voluntary context switches: 1
Involuntary context switches: 11
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
... % /usr/bin/time -v ./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report" --reporter 'console::out=-::colour-mode=ansi'
Randomness seeded to: 3630092844
===============================================================================
All tests passed (62914560 assertions in 1 test case)
Command being timed: "./sample_test --reporter JUnit::out=/tmp/catch2-xml-report --reporter console::out=-::colour-mode=ansi"
User time (seconds): 11.01
System time (seconds): 7.73
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:18.86
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 29087972
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 565154
Voluntary context switches: 1
Involuntary context switches: 250
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Expected behavior
The test should run with only minimal overhead, at least for the success case. In the end the report contains more or less the same information as the command-line output (plus the name of each test and its runtime, but that shouldn't be too costly, right?). The resulting file (in the success case) is just 448 characters.
Reproduction steps
I built a small sample project showing the problem. It uses Conan to get Catch2, but I guess the library could easily be included in other ways; Conan is just what I'm used to now: https://github.com/irieger/cpp-playground/blob/main/catch2-speed-sample/
The relevant file is catch2-speed-sample/src/catch2_many_assertions.cpp. It simulates a common use case, at least for me, where I have a large vector holding image data and compare an input processed through an algorithm against a reference, often with an accepted tolerance.
This results in quite a few million assertions. In this case I simulate a 30-megapixel RGB image. Normally I have much smaller tests, but also quite a few of them, which also add up to some million assertions in total.
Running this either directly (or with a manually set console reporter) vs. with the JUnit reporter shows the difference. Here is the output of running each variant 10 times, showing that the timing is consistent overall.
Console only, very consistent at ~1.2 s total execution time per run. Output is piped to /dev/null because some assertions intentionally fail here:
... % for i in $(seq 10); do time ./sample_test --reporter 'console::out=-::colour-mode=ansi' >/dev/null; done
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null 1.15s user 0.06s system 99% cpu 1.212 total
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null 1.14s user 0.05s system 99% cpu 1.200 total
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null 1.13s user 0.06s system 99% cpu 1.201 total
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null 1.17s user 0.04s system 99% cpu 1.212 total
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null 1.14s user 0.06s system 99% cpu 1.204 total
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null 1.18s user 0.03s system 99% cpu 1.215 total
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null 1.14s user 0.06s system 99% cpu 1.203 total
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null 1.15s user 0.05s system 99% cpu 1.199 total
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null 1.18s user 0.04s system 99% cpu 1.218 total
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null 1.13s user 0.08s system 99% cpu 1.215 total
Run with the JUnit reporter, ~24.4 s per execution:
.. % for i in $(seq 10); do time ./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report"; done
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report" 11.38s user 12.52s system 99% cpu 24.019 total
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report" 11.60s user 13.07s system 99% cpu 24.794 total
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report" 11.90s user 12.85s system 99% cpu 24.884 total
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report" 11.78s user 13.06s system 99% cpu 24.956 total
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report" 11.71s user 12.19s system 99% cpu 24.020 total
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report" 11.48s user 12.88s system 99% cpu 24.484 total
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report" 11.82s user 12.96s system 99% cpu 24.910 total
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report" 11.56s user 12.91s system 99% cpu 24.585 total
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report" 11.73s user 12.99s system 99% cpu 24.851 total
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report" 11.44s user 12.63s system 99% cpu 24.199 total
If I change the test to succeed, it takes ~1.1 s with the console output and ~18 s with the JUnit reporter.
Platform information:
- OS: Arch Linux
- Compiler+version: GCC v14.2.1
- Catch version: v3.6.0
The measurements and system information above are from my workstation. In my CI I can reproduce the behaviour on Linux, macOS (arm64 native as well as x64 cross-compiled running under Rosetta 2) and Windows. The size of the performance hit varies by platform, but in all cases the JUnit reporter is several times slower, roughly 10-30 times in the end.
Additional context
The reporter is a really nice addition and I was happy to find it. But blocking a CI runner for 240 s instead of 10 s is a huge hit. I hope there is a way to prevent that.
If per-assertion reporting is the performance-relevant part, maybe an alternative could be a flag that only reports the number of successes/failures in the XML instead of emitting a line per assertion?