I am using podman as a Docker replacement on our gitlab-runner host. The runner has a concurrency limit of 40 containers, and when I start my tests I get DNS resolution errors.
Testing environment:
- 16 vCPUs
- 24GB memory
- CentOS Stream 9
- podman 4.6.1-5
- slirp4netns: slirp4netns-1.2.2-1.el9.x86_64
- aardvark-dns: aardvark-dns-1.7.0-1.el9
While running tests, I get sporadic DNS resolution failures inside containers (actual host replaced with host.example.tld):
Example 1:
Cloning into 'spec/fixtures/modules/yumrepo_core'...
ssh: Could not resolve hostname host.example.tld: Temporary failure in name resolution
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Example 2:
$ bundle install -j $(nproc)
Fetching gem metadata from https://host.example.tld/nexus/repository/GroupRubyGems/..
Fetching gem metadata from https://host.example.tld/nexus/repository/GroupRubyGems/..
Could not find gem 'beaker (~> 5)' in any of the gem sources listed in your
Example 3:
Initialized empty Git repository in /builds/puppet/freeradius/.git/
Created fresh repository.
fatal: unable to access 'https://host.example.tld/puppet/freeradius.git/': Could not resolve host: host.example.tld
Cleaning up project directory and file based variables
This does not happen in every container; it is sporadic and random. If I switch back to the CNI backend, it works without errors.
I tried running up to 8 containers and flooding the DNS server with lookups, but I could not trigger a DNS resolution error. I will try to ramp that up to 30-40 containers and see if I can reproduce it.
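For anyone who wants to try the same flood test, this is roughly the kind of script that could be run inside each container. It is only a sketch: the function names `resolve_once` and `flood_lookups`, the port, and the default counts are made up for illustration, and it assumes nothing in the container caches lookups, so each call really reaches the resolver.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolve_once(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname once; return None on success or the error text on failure."""
    try:
        resolver(hostname, 443)
        return None
    except socket.gaierror as exc:
        # e.g. "[Errno -3] Temporary failure in name resolution"
        return str(exc)


def flood_lookups(hostname, total=1000, workers=40, resolver=socket.getaddrinfo):
    """Run `total` lookups across `workers` threads.

    Returns (number of successful lookups, list of error messages).
    The defaults are arbitrary illustration values, not tuned numbers.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(lambda _: resolve_once(hostname, resolver), range(total))
        )
    failures = [r for r in results if r is not None]
    return total - len(failures), failures
```

Calling `flood_lookups("host.example.tld")` in 30-40 containers at once and printing the returned counts should show whether failures only appear above a certain level of concurrency. The `resolver` parameter exists only so the function can be exercised without a real DNS server.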
If anyone has an idea how to debug this, I will gladly look into it as far as my knowledge allows.