GHA arm socket-tcp*-closing timeout #2704

@kolyshkin

Since commit #2566 was merged in April, tests had been running fine on GHA arm instances (ubuntu-24.04-arm), but starting around Jun 25 some tests began timing out.

Here's the last successful run, from Jun 20: https://github.com/checkpoint-restore/criu/actions/runs/15791341030/job/44517392015

Here's the first failed run, from Jun 25: https://github.com/checkpoint-restore/criu/actions/runs/15887359310/job/44802498433

Quoting that failed job:

################### 3 TEST(S) FAILED (TOTAL 478/SKIPPED 54) ####################
 * zdtm/static/socket-tcp-closing(unknown)
 * zdtm/static/socket-tcp4v6-closing(unknown)
 * zdtm/static/socket-tcp6-closing(unknown)
##################################### FAIL #####################################

More logs for one of the tests (the other two seem to fail in a similar way):

=================== Run zdtm/static/socket-tcp-closing in ns ===================
Start test
./socket-tcp-closing --pidfile=socket-tcp-closing.pid --outfile=socket-tcp-closing.out
Running zdtm/static/socket-tcp-closing.hook(--post-start)
Traceback (most recent call last):
  File "/home/runner/work/criu/criu/test/zdtm.py", line 2968, in <module>
    fork_zdtm()
  File "/home/runner/work/criu/criu/test/zdtm.py", line 2959, in fork_zdtm
    do_run_test(tinfo[0], tinfo[1], tinfo[2], tinfo[3])
  File "/home/runner/work/criu/criu/test/zdtm.py", line 2034, in do_run_test
    t.start()
  File "/home/runner/work/criu/criu/test/zdtm.py", line 532, in start
    self.__make_action('pid', env, self.__flavor.root)
  File "/home/runner/work/criu/criu/test/zdtm.py", line 469, in __make_action
    if s.wait(timeout=self.__timeout):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/subprocess.py", line 1264, in wait
    return self._wait(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/subprocess.py", line 2045, in _wait
    raise TimeoutExpired(self.args, timeout)
subprocess.TimeoutExpired: Command '['make', '--no-print-directory', '-C', 'zdtm/static', 'socket-tcp-closing.pid']' timed out after 30 seconds
Exception ignored in atexit callback: <function clean_tests_root at 0xff9f17fdde40>
Traceback (most recent call last):
  File "/home/runner/work/criu/criu/test/zdtm.py", line 84, in clean_tests_root
    os.rmdir(os.path.join(tests_root[1], "root/root"))
OSError: [Errno 16] Device or resource busy: '/tmp/criu-root-kyb7d62y/root/root'
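
For readers unfamiliar with the failure mode in the traceback: `subprocess.Popen.wait(timeout=...)` raises `subprocess.TimeoutExpired` when the child does not exit in time, and here it propagates uncaught out of `__make_action`. The sketch below is not the zdtm.py code; it is a minimal standalone illustration, with a hypothetical helper name (`wait_for_command`) and a sleeping child standing in for the `make ... socket-tcp-closing.pid` command.

```python
import subprocess
import sys


def wait_for_command(cmd, timeout):
    """Run cmd and wait up to `timeout` seconds.

    Hypothetical helper for illustration only (not part of zdtm.py).
    Returns (True, returncode) if the child exited in time,
    or (False, None) if the wait timed out.
    """
    proc = subprocess.Popen(cmd)
    try:
        rc = proc.wait(timeout=timeout)
        return True, rc
    except subprocess.TimeoutExpired:
        # The child is still running at this point; reap it so it
        # does not linger after the timeout.
        proc.kill()
        proc.wait()
        return False, None


# Stand-ins for the real command: one child exits at once,
# the other sleeps well past the timeout.
fast = [sys.executable, "-c", "pass"]
slow = [sys.executable, "-c", "import time; time.sleep(5)"]

print(wait_for_command(fast, timeout=10))
print(wait_for_command(slow, timeout=0.5))
```

In the log above the timeout was 30 seconds, and the exception was not caught, which is why the run ends with a Python traceback rather than a regular test failure message.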

(I found all this while trying to switch from actuated CI to GHA arm in runc (opencontainers/runc#4844), where I am seeing the same issue: tests time out.)
