Fix MPT flakiness #15131

@mfleming

I've observed that MPT sometimes fails when run with cdt, though usually not on the first run. Since MPT is the foundation for the partition density work, we really need a reliable signal from it.

Initial analysis suggests it's just an issue with stopping rp: the `kill -9` issued from `stop_node` in kgo_repeater_service.py returns a non-zero exit status.

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 82, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/many_partitions_test.py", line 887, in test_many_partitions
    self._test_many_partitions(compacted=False)
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/many_partitions_test.py", line 1038, in _test_many_partitions
    with repeater_traffic(context=self._ctx,
  File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/home/ubuntu/redpanda/tests/rptest/services/kgo_repeater_service.py", line 377, in repeater_traffic
    svc.stop()
  File "/home/ubuntu/redpanda/tests/rptest/services/kgo_repeater_service.py", line 205, in stop
    super().stop(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/services/service.py", line 310, in stop
    self.stop_node(node, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/kgo_repeater_service.py", line 214, in stop_node
    node.account.signal(self._pid, signal.SIGKILL, allow_fail=False)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/cluster/remoteaccount.py", line 418, in signal
    self.ssh(cmd, allow_fail=allow_fail)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/cluster/remoteaccount.py", line 41, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/cluster/remoteaccount.py", line 300, in ssh
    raise RemoteCommandError(self, cmd, exit_status, stderr.read())
ducktape.cluster.remoteaccount.RemoteCommandError: root@ip-172-31-32-195: Command 'kill -9 5973' returned non-zero exit status -1.
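
One possible mitigation, sketched here rather than taken from the repo: instead of failing the test on the first `RemoteCommandError`, retry the kill and treat "process is no longer alive" as success. This assumes the failure is transient; the exit status of -1 alone does not say why the command failed. `kill_with_retry` and `FakeAccount` are hypothetical names used only for illustration. The local `RemoteCommandError` is a stand-in for ducktape's exception so the snippet runs on its own. The account is assumed to expose `signal(pid, sig, allow_fail=...)` as in the traceback, plus an `alive(pid)` liveness check.

```python
import signal


class RemoteCommandError(Exception):
    """Local stand-in for ducktape's RemoteCommandError (assumption for this sketch)."""


class FakeAccount:
    """Hypothetical stub with the two account methods the helper relies on."""

    def __init__(self, failures_before_success):
        self.failures_left = failures_before_success
        self.running = True

    def signal(self, pid, sig, allow_fail=False):
        # Fail the first N calls to imitate a flaky remote command.
        if self.failures_left > 0:
            self.failures_left -= 1
            raise RemoteCommandError(f"kill -{int(sig)} {pid} failed")
        self.running = False

    def alive(self, pid):
        return self.running


def kill_with_retry(account, pid, attempts=3):
    """Send SIGKILL up to `attempts` times; return True once pid is gone."""
    for _ in range(attempts):
        try:
            account.signal(pid, signal.SIGKILL, allow_fail=False)
        except RemoteCommandError:
            # The command may have failed in transit, or the process may
            # already have exited. Either way, decide based on liveness.
            pass
        if not account.alive(pid):
            return True
    return False


if __name__ == "__main__":
    flaky = FakeAccount(failures_before_success=1)
    print(kill_with_retry(flaky, 5973))
```

A caller such as `stop_node` could raise only when the helper returns `False`, so a genuinely stuck process still fails the test while a single flaky `kill` does not.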
