
Attempt to fix stress test #13980

Merged
alexey-milovidov merged 5 commits into master from fix-stress-test-2
Aug 24, 2020


@alexey-milovidov
Member

alexey-milovidov commented Aug 23, 2020

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Possible reason 1:
clickhouse-server may not stop within 120 seconds because the OS takes a long time to tear down the process, so it remains in the process table after _Exit. We then cannot start it again and get the "Already running" message.
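
One way a test script could cope with this is to wait until the PID has actually left the process table before trying to start the server again. The following is only a minimal sketch under assumptions, not the change made in this PR: the function name wait_for_exit and its arguments are hypothetical, and it assumes the caller is allowed to signal the process (for example, it runs as root).

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): wait until PID is gone from the
# process table, giving up after TIMEOUT seconds.
# Assumption: the caller has permission to signal PID; otherwise `kill -0`
# fails even though the process still exists.
wait_for_exit() {
    pid=$1
    timeout=$2
    elapsed=0
    # `kill -0` sends no signal; it only checks that the PID still exists.
    # It keeps succeeding while the kernel is still tearing the process down.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1    # still in the process table after TIMEOUT seconds
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0            # PID has disappeared; a restart should be safe
}

# Usage (illustrative only, left commented out):
#   wait_for_exit "$(pidof clickhouse-server)" 60 || echo "server still present"
```

Polling once per second keeps the sketch simple; a real script might poll more often or log how long the teardown took.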

Possible reason 2:
Our SysV init script is deficient: it only checks whether a process with the recorded PID is running. But if the server was killed, the PID can be reused by another process. We need to check that the running process is clickhouse-server and nothing else.
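
The check described above could look roughly like the sketch below. It is an illustration under assumptions rather than the actual init script code: the helper name pid_belongs_to is hypothetical, and it assumes the server's argv[0] ends in the expected name (for example /usr/bin/clickhouse-server). On Linux it reads /proc/PID/cmdline and falls back to ps elsewhere.

```shell
#!/bin/sh
# Hypothetical helper (not from the init script): succeed only if PID is
# alive AND the basename of its argv[0] equals NAME.
# Assumption: the process was started under the expected name.
pid_belongs_to() {
    pid=$1
    name=$2
    if [ -r "/proc/$pid/cmdline" ]; then
        # /proc/PID/cmdline is NUL-separated; the first field is argv[0].
        # It is empty for a zombie, so a zombie never matches.
        argv0=$(tr '\0' '\n' < "/proc/$pid/cmdline" | head -n 1)
    else
        # Fallback without /proc: take the first word of the command line.
        argv0=$(ps -p "$pid" -o args= 2>/dev/null) || return 1
        argv0=${argv0%% *}
    fi
    # Strip any directory prefix before comparing.
    [ "${argv0##*/}" = "$name" ]
}

# Usage (illustrative only, left commented out; the pid file path is a guess):
#   pid_belongs_to "$(cat /path/to/clickhouse-server.pid)" clickhouse-server \
#       && echo "Already running"
```

With a check like this, a stale pid file whose PID was reused by an unrelated process would no longer be reported as a running server.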

@alexey-milovidov alexey-milovidov added the testing Special issue with list of bugs found by CI label Aug 23, 2020
@robot-clickhouse robot-clickhouse added pr-not-for-changelog This PR should not be mentioned in the changelog labels Aug 23, 2020
@alexey-milovidov
Member Author

alexey-milovidov commented Aug 24, 2020

The first reason is real. In this log the service reports a successful stop, yet the process is still found and has to be killed:

+ stop
+ timeout 120 service clickhouse-server stop
Stop clickhouse-server service: DONE
++ pidof clickhouse-server
+ kill -9 354
Killed clickhouse-server
+ echo 'Killed clickhouse-server'

@alexey-milovidov
Member Author

Looks like the stress test is fixed, but something strange is happening with the stateful tests...

@alexey-milovidov
Member Author

AST Fuzzer:

SELECT equals(countEqual(materialize([NULL AS x, x]), materialize(x)))

Fixed in #12550

@alexey-milovidov alexey-milovidov merged commit ad64cea into master Aug 24, 2020
@alexey-milovidov alexey-milovidov deleted the fix-stress-test-2 branch August 24, 2020 19:03
@alexey-milovidov
Member Author

alexey-milovidov commented Aug 24, 2020

All stress tests have now finished correctly.
I checked the logs and found that in 3 of 4 runs the server was still in the process list after a successful STOP and could not be started immediately. But our tweaks helped.

