Testcontainers version
0.29.1
Using the latest Testcontainers version?
Yes
Host OS
Linux
Host arch
x86
Go version
1.22
Docker version
Docker info
What happened?
As I was improving some integration tests in our own project, I occasionally noticed failures after we switched to reusing containers. Since I was improving the run-time of the tests, I was executing the ITs several times in a row to make sure code compilation was not included in the timing (while go clean -testcache && make integration-test; do :; done). I had a suspicion that running the tests in quick succession was related to the failures, so I did some further investigation.
From the testcontainers output I saw several times that more than one container was created (despite Reuse: true):
2024/03/21 23:10:58 🐳 Creating container for image testcontainers/ryuk:0.6.0
2024/03/21 23:10:58 ✅ Container created: efa32a26eb2e
2024/03/21 23:10:58 🐳 Starting container: efa32a26eb2e
2024/03/21 23:10:59 ✅ Container started: efa32a26eb2e
2024/03/21 23:10:59 🚧 Waiting for container id efa32a26eb2e image: testcontainers/ryuk:0.6.0. Waiting for: &{Port:8080/tcp timeout:<nil> PollInterval:100ms}
2024/03/21 23:10:59 🔔 Container is ready: efa32a26eb2e
2024/03/21 23:10:59 ✅ Container started: 0c8e757faff1
2024/03/21 23:10:59 🚧 Waiting for container id 0c8e757faff1 image: clickhouse/clickhouse-server:24.2-alpine. Waiting for: &{timeout:0xc000799058 URL:0x13356e0 Driver:clickhouse Port:9000/tcp startupTimeout:60000000000 PollInterval:100ms query:SELECT 1}
2024/03/21 23:10:59 🔔 Container is ready: 0c8e757faff1
2024/03/21 23:10:59 ✅ Container started: 0c8e757faff1
2024/03/21 23:10:59 🚧 Waiting for container id 0c8e757faff1 image: clickhouse/clickhouse-server:24.2-alpine. Waiting for: &{timeout:0xc0005849e8 URL:0x13356e0 Driver:clickhouse Port:9000/tcp startupTimeout:60000000000 PollInterval:100ms query:SELECT 1}
...
{"status":"error","errorType":"bad_data","error":"dial tcp [::1]:32918: connect: connection refused"}
...
2024/03/21 23:11:07 🐳 Creating container for image clickhouse/clickhouse-server:24.2-alpine
2024/03/21 23:11:07 🚧 Waiting for container id 7e037d775014 image: clickhouse/clickhouse-server:24.2-alpine. Waiting for: &{timeout:0xc0046bf638 URL:0x13356e0 Driver:clickhouse Port:9000/tcp startupTimeout:60000000000 PollInterval:100ms query:SELECT 1}
2024/03/21 23:11:08 🔔 Container is ready: 7e037d775014
...
As you can see, within the same test run a new ClickHouse container gets created even though we never call Terminate. Tests start failing because they can no longer connect to the old container (to which they hold a connection based on its mapped port).
My suspicion was that Ryuk was, for some reason, terminating the 'old' ClickHouse container still alive from a previous run.
Looking at the code, this indeed appears to be what is causing it:
- When starting a new test run, because containers are reused based on their name, testcontainers finds the 'old', still-running ClickHouse container (docker.go, line 1177 in c83b93c):

c, err := p.findContainerByName(ctx, req.Name)

- For Ryuk, however, a new container is created based on the SessionID (docker.go, line 1201 in c83b93c):

r, err := reuseOrCreateReaper(context.WithValue(ctx, core.DockerHostContextKey, p.host), sessionID, p)
- The 'old' Ryuk, belonging to the previous test run, never receives a new connection, so after 10s it shuts down, taking the still-running ClickHouse container with it.
To fix this, my proposal would be to somehow also add the SessionID to the 'reusable' container name: either implicitly, via some flag, or by exposing the SessionID so the user can append it themselves (currently it lives in the internal package, so it is not reachable).
I'm happy to work on a fix, if I can get any suggestion for a preferred approach.
Relevant log output
docker ps -a output after the failure, showing two reapers and one CH container:
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
efa32a26eb2e testcontainers/ryuk:0.6.0 "/bin/ryuk" 6 seconds ago Up 5 seconds 0.0.0.0:32920->8080/tcp, :::32822->8080/tcp reaper_39db6ba506d2d713d174270a8ad7aeb95fcdc7e5e13895ae3be33fa70ade946a
0c8e757faff1 clickhouse/clickhouse-server:24.2-alpine "/entrypoint.sh" 17 seconds ago Up 16 seconds 9009/tcp, 0.0.0.0:32919->8123/tcp, 0.0.0.0:32918->9000/tcp otel-clickhouse
cdc2a4e002e6 testcontainers/ryuk:0.6.0 "/bin/ryuk" 17 seconds ago Up 16 seconds 0.0.0.0:32917->8080/tcp, :::32821->8080/tcp reaper_ae0426fe5786540b6ee4155474bd2ccf2d21bbe9dd1e0134154826996e67fd9b
Additional information
No response