Improve valgrind support for cluster tests by oranagra · Pull Request #7725 · redis/redis

oranagra · 2020-08-30T10:43:40Z

redirect valgrind reports to a dedicated file rather than console
try to avoid killing instances with SIGKILL so that we get the memory
leak report (killing with SIGTERM before resorting to SIGKILL)
search for valgrind reports when done, print them and fail the tests
add --dont-clean option to keep the logs on exit
fix exit error code when crash is found (would have exited with 0)

changes that affect the normal redis test suite:

refactor check_valgrind_errors into two functions one to search and
one to report
move the search half into util.tcl to serve the cluster tests too
ignore "address range perms" valgrind warnings which seem non relevant.

oranagra · 2020-08-30T10:44:09Z

Some tests currently fail (probably on timing issues), so we cannot yet use it in daily CI.

oranagra · 2020-08-30T11:09:58Z

did this to try to catch the problem in #7719 but no luck.

- redirect valgrind reports to a dedicated file rather than console - try to avoid killing instances with SIGKILL so that we get the memory leak report (killing with SIGTERM before resorting to SIGKILL) - search for valgrind reports when done, print them and fail the tests - add --dont-clean option to keep the logs on exit - fix exit error code when crash is found (would have exited with 0) changes that affect the normal redis test suite: - refactor check_valgrind_errors into two functions one to search and one to report - move the search half into util.tcl to serve the cluster tests too - ignore "address range perms" valgrind warnings which seem non relevant.

yossigo · 2020-09-03T19:13:01Z

tests/instances.tcl

+    set errfile [file join $dirname err.txt]
    if {$::valgrind} {
-        set pid [exec valgrind --track-origins=yes --suppressions=../../../src/valgrind.sup --show-reachable=no --show-possibly-lost=no --leak-check=full ../../../src/${prgname} $cfgfile &]
+        set pid [exec valgrind --track-origins=yes --suppressions=../../../src/valgrind.sup --show-reachable=no --show-possibly-lost=no --leak-check=full ../../../src/${prgname} $cfgfile 2>> $errfile &]


@oranagra Is there a particular reason you're redirecting stderr rather than use the valgrind --log-file option?

It may not matter if we don't expect $prgname to use stderr but that may not always be the case and error messages that were printed to console in the past would no longer be visible at real time.

@yossigo I followed the same technique that is used by the main redis tests.
The main redis tests redirect stdout to one file and stderr to another.
The cluster tests are using the logfile config instead of redirecting the stdout.

Anyway, redis doesn't normally outputs anything to stderr, and if we want to change the valgrind log approach, it's more important to change the main redis test suite too.
Although come to think of it, both run multiple servers in parallel, so putting things, to stderr rather than a file may be confusing.

Let me know if you still think this should be changed and if you wanna change both tests infra or just the one

- redirect valgrind reports to a dedicated file rather than console - try to avoid killing instances with SIGKILL so that we get the memory leak report (killing with SIGTERM before resorting to SIGKILL) - search for valgrind reports when done, print them and fail the tests - add --dont-clean option to keep the logs on exit - fix exit error code when crash is found (would have exited with 0) changes that affect the normal redis test suite: - refactor check_valgrind_errors into two functions one to search and one to report - move the search half into util.tcl to serve the cluster tests too - ignore "address range perms" valgrind warnings which seem non relevant. (cherry picked from commit 2b998de)

- redirect valgrind reports to a dedicated file rather than console - try to avoid killing instances with SIGKILL so that we get the memory leak report (killing with SIGTERM before resorting to SIGKILL) - search for valgrind reports when done, print them and fail the tests - add --dont-clean option to keep the logs on exit - fix exit error code when crash is found (would have exited with 0) changes that affect the normal redis test suite: - refactor check_valgrind_errors into two functions one to search and one to report - move the search half into util.tcl to serve the cluster tests too - ignore "address range perms" valgrind warnings which seem non relevant.

- redirect valgrind reports to a dedicated file rather than console - try to avoid killing instances with SIGKILL so that we get the memory leak report (killing with SIGTERM before resorting to SIGKILL) - search for valgrind reports when done, print them and fail the tests - add --dont-clean option to keep the logs on exit - fix exit error code when crash is found (would have exited with 0) changes that affect the normal redis test suite: - refactor check_valgrind_errors into two functions one to search and one to report - move the search half into util.tcl to serve the cluster tests too - ignore "address range perms" valgrind warnings which seem non relevant. (cherry picked from commit 2b998de)

When stopping an instance in the cluster tests, disable appendonly first, so that SIGTERM won't be ignored. Recently in #9679 i change the test infra to use SIGSEGV to kill servers that refuse the SIGTERM rather than do SIGKILL directly. This surfaced an issue that i've added in #7725 which changed SIGKILL to SIGTERM (to resolve valgrind issues). So the current situation in the past months was that sometimes servers refused the SIGTERM and waited 10 seconds for the SIGKILL, and this commit resolves that (faster termination).

oranagra requested review from guybe7 and yossigo September 3, 2020 14:54

guybe7 approved these changes Sep 3, 2020

View reviewed changes

yossigo reviewed Sep 3, 2020

View reviewed changes

yossigo approved these changes Sep 6, 2020

View reviewed changes

oranagra merged commit 2b998de into redis:unstable Sep 6, 2020

oranagra deleted the cluster-valgrind-tests branch September 6, 2020 08:11

oranagra mentioned this pull request Oct 28, 2021

fix failing cluster tests #9695

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve valgrind support for cluster tests#7725

Improve valgrind support for cluster tests#7725
oranagra merged 1 commit intoredis:unstablefrom
oranagra:cluster-valgrind-tests

oranagra commented Aug 30, 2020 •

edited

Loading

Uh oh!

oranagra commented Aug 30, 2020

Uh oh!

oranagra commented Aug 30, 2020

Uh oh!

yossigo Sep 3, 2020

Uh oh!

oranagra Sep 4, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

oranagra commented Aug 30, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oranagra commented Aug 30, 2020

Uh oh!

oranagra commented Aug 30, 2020

Uh oh!

yossigo Sep 3, 2020

Choose a reason for hiding this comment

Uh oh!

oranagra Sep 4, 2020

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

oranagra commented Aug 30, 2020 •

edited

Loading