Add HEALTHCHECK to Linux Docker image.#128
Add HEALTHCHECK to Linux Docker image.#128emetsger wants to merge 3 commits intotestcontainers:mainfrom
Conversation
HofmeisterAn
left a comment
There was a problem hiding this comment.
The problem is that it the ryuk container can be exposed by the Docker API before the container is ready to accept connections. A testcontainer client can attempt to connect to the container before it is accepting connections.
Out of curiosity and to understand and compare the behavior (issue) with the .NET implementation: Does Go create a single Ryuk instance for all packages? Does one package connect successfully and the others fail? Does it retry to connect if it fails?
windows/Dockerfile
Outdated
| COPY --from=build /bin/ryuk /bin/ryuk | ||
| CMD ["/bin/ryuk"] | ||
| LABEL org.testcontainers.ryuk=true | ||
| HEALTHCHECK --start-period=5s --interval=2s --timeout=2s --retries=10 CMD netstat -ltn | grep -q ":8080" |
There was a problem hiding this comment.
I do not think the CMD will work on Windows (nanoserver).
There was a problem hiding this comment.
Of course, yes it won't. Sorry- I was a bit sleep deprived making this PR. I don't have access to a windows system but I can do some research to find a simple command that works in nanoserver!
There was a problem hiding this comment.
Since I'm not on windows, and am not familiar with nanoserver, I thought it best to revert the changes on windows/Dockerfile. It would be best for someone with Windows experience to implement and test a Windows health check.
Yes, exactly.
One package typically connects successfully. With the other packages, it's a race: if they get the container before it's ready to accept connections, they fail. In my testing, all remaining packages fail. But I can imagine in a larger codebase, some packages would succeed. How does .NET work: is it a single ryuk instance? For example, can you launch two tests in parallel and re-create the behavior (issue)? |
|
With respect to the languages/platforms supported by If other platforms are affected by this same issue, then they'll need a patch similar to testcontainers/testcontainers-go#2508, which actually uses the results of the healthcheck before returning a (healthy) ryuk container. |
Thanks for the quick explanation. .NET behaves different here. It creates a Ryuk instance for each test assembly (test project). Test assemblies are kind of isolated from each other. We do not need to deal with synchronization here. |
|
I did encounter some things that I can't explain during my investigation, so I wanted to add them here for full transparency. TL;DR: I failed to identify which component was denying the connection. I have an overly-simple mental model of how Docker API+Docker Engine work together with their containers, so these issues could just be my ignorance. The error message indicates the connection was denied, not a timeout.So who denied the connection? The Docker engine? It wasn't the I did not observe the denied TCP connectionWhen I was troubleshooting, I used |
hey @HofmeisterAn, yes, Go creates one Ryuk container per test session, which implies some synchronisation logic to check if the ryuk container is in the running state before creating a new one. @emetsger for the repro repository, I cloned it and it works for me with both values of Using all processors |
@mdelapenya I have the same result with testcontainers-go and Docker Desktop, but when I run Docker with Colima I randomly have the issue when TestContainers wants to connect to Ryuk (95% of the time it triggers the issue) |
|
Pinging @eddumelendez and @kiview as this is a cross-lang feature. |
|
Can we decouple the switch to using a @HofmeisterAn I guess we didn't fully sync up on the current approach to test session semantics (and technical measures to achieve them) across languages, back when we introduced this in Go (and AFAIK in Node as well). From a technical PoV, Go and .NET should be in a similar situation. |
|
Closing as per testcontainers/testcontainers-go#2508 (comment) Thanks! |
Summary
I ran across an issue using testcontainers in golang, specifically when testcontainers are used by multiple test packages in a single project. Go builds a test binary for each package and runs them simultaneously.
Example project tree, where test binaries for packages
a,b,c, anddwill run simultaneously, attempting to use a testcontainer:One of the test binaries will win the race to create the ryuk container, and it becomes present in the list of docker containers returned by the Docker API.
The problem is that it the ryuk container can be exposed by the Docker API before the container is ready to accept connections. A testcontainer client can attempt to connect to the container before it is accepting connections.
This results in error messages tests like:
Clone this test case repo and run
go test -count=1 -v ./...and observe the errors.Run
go test -count=1 -p 1 -v ./...and tests should pass.Prior issues
Race conditions (and the
connection refusederror message) have been the topic of previous issues, c.f.But this PR addresses the situation described here by adding a simple healthcheck using
netstat(already present in the alpine image).Future Work
Once healthcheck information is exposed by the
ryukcontainer, the lookupReaperContainer func can retry until the status is healthy.I'm not on a Windows platform, and I'm not familiar with nanoserver and whether it comes with Powershell, and whether a command like
netstat -an | Select-String 8080will work. This PR only adds aHEALTHCHECKfor the linux platform.Unintended Consequences
Adding a healthcheck to ryuk may increase its perceived startup time.
Example Healthcheck Output
Before healthcheck (current state, i.e.
main):After healthcheck (this PR):