Skip to content

Improve valgrind support for cluster tests#7725

Merged
oranagra merged 1 commit intoredis:unstablefrom
oranagra:cluster-valgrind-tests
Sep 6, 2020
Merged

Improve valgrind support for cluster tests#7725
oranagra merged 1 commit intoredis:unstablefrom
oranagra:cluster-valgrind-tests

Conversation

@oranagra
Copy link
Member

@oranagra oranagra commented Aug 30, 2020

  • redirect valgrind reports to a dedicated file rather than console
  • try to avoid killing instances with SIGKILL so that we get the memory
    leak report (killing with SIGTERM before resorting to SIGKILL)
  • search for valgrind reports when done, print them and fail the tests
  • add --dont-clean option to keep the logs on exit
  • fix exit error code when crash is found (would have exited with 0)

changes that affect the normal redis test suite:

  • refactor check_valgrind_errors into two functions one to search and
    one to report
  • move the search half into util.tcl to serve the cluster tests too
  • ignore "address range perms" valgrind warnings which seem non relevant.

@oranagra
Copy link
Member Author

Some tests currently fail (probably on timing issues), so we cannot yet use it in daily CI.

@oranagra
Copy link
Member Author

did this to try to catch the problem in #7719 but no luck.

- redirect valgrind reports to a dedicated file rather than console
- try to avoid killing instances with SIGKILL so that we get the memory
  leak report (killing with SIGTERM before resorting to SIGKILL)
- search for valgrind reports when done, print them and fail the tests
- add --dont-clean option to keep the logs on exit
- fix exit error code when crash is found (would have exited with 0)

changes that affect the normal redis test suite:
- refactor check_valgrind_errors into two functions one to search and
  one to report
- move the search half into util.tcl to serve the cluster tests too
- ignore "address range perms" valgrind warnings which seem non relevant.
@oranagra oranagra requested review from guybe7 and yossigo September 3, 2020 14:54
set errfile [file join $dirname err.txt]
if {$::valgrind} {
set pid [exec valgrind --track-origins=yes --suppressions=../../../src/valgrind.sup --show-reachable=no --show-possibly-lost=no --leak-check=full ../../../src/${prgname} $cfgfile &]
set pid [exec valgrind --track-origins=yes --suppressions=../../../src/valgrind.sup --show-reachable=no --show-possibly-lost=no --leak-check=full ../../../src/${prgname} $cfgfile 2>> $errfile &]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oranagra Is there a particular reason you're redirecting stderr rather than use the valgrind --log-file option?

It may not matter if we don't expect $prgname to use stderr but that may not always be the case and error messages that were printed to console in the past would no longer be visible at real time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yossigo I followed the same technique that is used by the main redis tests.
The main redis tests redirect stdout to one file and stderr to another.
The cluster tests are using the logfile config instead of redirecting the stdout.

Anyway, redis doesn't normally outputs anything to stderr, and if we want to change the valgrind log approach, it's more important to change the main redis test suite too.
Although come to think of it, both run multiple servers in parallel, so putting things, to stderr rather than a file may be confusing.

Let me know if you still think this should be changed and if you wanna change both tests infra or just the one

@oranagra oranagra merged commit 2b998de into redis:unstable Sep 6, 2020
@oranagra oranagra deleted the cluster-valgrind-tests branch September 6, 2020 08:11
oranagra added a commit that referenced this pull request Sep 10, 2020
- redirect valgrind reports to a dedicated file rather than console
- try to avoid killing instances with SIGKILL so that we get the memory
  leak report (killing with SIGTERM before resorting to SIGKILL)
- search for valgrind reports when done, print them and fail the tests
- add --dont-clean option to keep the logs on exit
- fix exit error code when crash is found (would have exited with 0)

changes that affect the normal redis test suite:
- refactor check_valgrind_errors into two functions one to search and
  one to report
- move the search half into util.tcl to serve the cluster tests too
- ignore "address range perms" valgrind warnings which seem non relevant.

(cherry picked from commit 2b998de)
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Nov 4, 2020
- redirect valgrind reports to a dedicated file rather than console
- try to avoid killing instances with SIGKILL so that we get the memory
  leak report (killing with SIGTERM before resorting to SIGKILL)
- search for valgrind reports when done, print them and fail the tests
- add --dont-clean option to keep the logs on exit
- fix exit error code when crash is found (would have exited with 0)

changes that affect the normal redis test suite:
- refactor check_valgrind_errors into two functions one to search and
  one to report
- move the search half into util.tcl to serve the cluster tests too
- ignore "address range perms" valgrind warnings which seem non relevant.
jschmieg pushed a commit to memKeyDB/memKeyDB that referenced this pull request Nov 6, 2020
- redirect valgrind reports to a dedicated file rather than console
- try to avoid killing instances with SIGKILL so that we get the memory
  leak report (killing with SIGTERM before resorting to SIGKILL)
- search for valgrind reports when done, print them and fail the tests
- add --dont-clean option to keep the logs on exit
- fix exit error code when crash is found (would have exited with 0)

changes that affect the normal redis test suite:
- refactor check_valgrind_errors into two functions one to search and
  one to report
- move the search half into util.tcl to serve the cluster tests too
- ignore "address range perms" valgrind warnings which seem non relevant.

(cherry picked from commit 2b998de)
oranagra added a commit that referenced this pull request Oct 28, 2021
When stopping an instance in the cluster tests, disable appendonly first, so that SIGTERM won't be ignored.

Recently in #9679 i change the test infra to use SIGSEGV to kill servers that refuse
the SIGTERM rather than do SIGKILL directly.

This surfaced an issue that i've added in #7725 which changed SIGKILL to SIGTERM (to resolve valgrind issues).
So the current situation in the past months was that sometimes servers refused the
SIGTERM and waited 10 seconds for the SIGKILL, and this commit resolves that (faster termination).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants