
Fix the bug when NOTHING_TO_DO events wrongly increment count_no_work_done #15987

Merged
akuzm merged 1 commit into ClickHouse:master from filimonov:background-pool-count_no_work_done-bug on Oct 15, 2020

@filimonov
Contributor

@filimonov filimonov commented Oct 14, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Bug Fix

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Prevent a replica from hanging for 5-10 minutes when a replication error happens after a period of inactivity.

Detailed description / Documentation draft:
Fixes #15955
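
For context, the title and changelog entry suggest a backoff scheme in which a shared counter of unproductive task runs drives the delay before the next attempt. The following is a minimal Python sketch of that idea, not the actual ClickHouse C++ code: the `BackoffState` class, the `SUCCESS` and `ERROR` result names, and the numeric settings (10 s minimum, 600 s maximum, multiplier 1.1) are assumptions chosen for illustration. Only `NOTHING_TO_DO` and `count_no_work_done` come from the PR title.

```python
from enum import Enum


class TaskResult(Enum):
    SUCCESS = 1
    ERROR = 2
    NOTHING_TO_DO = 3


class BackoffState:
    """Hypothetical model of a pool-wide backoff counter (not ClickHouse code)."""

    def __init__(self, count_idle_runs, sleep_min=10.0, sleep_max=600.0, multiplier=1.1):
        # count_idle_runs=True models the behaviour described as a bug in the title:
        # NOTHING_TO_DO results also increment the counter.
        self.count_idle_runs = count_idle_runs
        self.sleep_min = sleep_min        # assumed value, seconds
        self.sleep_max = sleep_max        # assumed value, seconds
        self.multiplier = multiplier      # assumed value
        self.count_no_work_done = 0

    def record(self, result):
        if result is TaskResult.SUCCESS:
            self.count_no_work_done = 0
        elif result is TaskResult.ERROR or self.count_idle_runs:
            self.count_no_work_done += 1

    def error_delay(self):
        # Capped exponential backoff; the exponent is clamped to avoid float overflow.
        exponent = min(self.count_no_work_done, 1000)
        return min(self.sleep_max, self.sleep_min * self.multiplier ** exponent)


def delay_after_idle_then_error(count_idle_runs, idle_runs=1000):
    """Simulate a quiet period followed by a single error and return the resulting delay."""
    state = BackoffState(count_idle_runs)
    for _ in range(idle_runs):
        state.record(TaskResult.NOTHING_TO_DO)
    state.record(TaskResult.ERROR)
    return state.error_delay()
```

Under these assumed settings, `delay_after_idle_then_error(True)` hits the 600-second cap, while `delay_after_idle_then_error(False)` stays at about 11 seconds. A cap of that order would be consistent with the 5-10 minute hang mentioned in the changelog entry.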

@robot-clickhouse robot-clickhouse added the pr-bugfix Pull request with bugfix, not backported by default label Oct 14, 2020
@akuzm akuzm self-assigned this Oct 15, 2020
@akuzm
Contributor

akuzm commented Oct 15, 2020

Perf test broken in master.

Stress test as well #14414 (comment)

@akuzm akuzm merged commit f366b36 into ClickHouse:master Oct 15, 2020
@alexey-milovidov
Member

alexey-milovidov commented Oct 15, 2020

@akuzm @filimonov I bet it is the intended behaviour.
Try creating 100 000 MergeTree tables and see what the CPU usage will be.

robot-clickhouse pushed a commit that referenced this pull request Oct 15, 2020
robot-clickhouse pushed a commit that referenced this pull request Oct 15, 2020
robot-clickhouse pushed a commit that referenced this pull request Oct 15, 2020
robot-clickhouse pushed a commit that referenced this pull request Oct 15, 2020
@filimonov
Contributor Author

filimonov commented Oct 15, 2020

I've tested 20.11.1.4925 (that PR) vs 20.11.1.4924 (the parent commit) on 250K empty tables (the load from 100K empty tables was too low on my laptop):

apt-get update && apt-get install sysstat
clickhouse-client --query="CREATE DATABASE multiple_mergetrees"
time bash -c "seq 1 250000 | sed -r -e 's/^.+$/CREATE TABLE multiple_mergetrees.table\0 (x UInt8) ENGINE = MergeTree ORDER BY x;/' | clickhouse-benchmark -c 32 -i 250000"

That PR:

root@79f396d2840f:/# pidstat 5 -h -u -p $(pidof -s clickhouse-server)
Linux 5.4.0-51-generic (79f396d2840f) 	10/15/2020 	_x86_64_	(12 CPU)

# Time        UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
06:34:20 PM   101         1    6.60    4.60    0.00    0.00   11.20     5  clickhouse-serv
06:34:25 PM   101         1    7.80    4.60    0.00    0.00   12.40     5  clickhouse-serv

Parent commit:

pidstat 5 -h -u -p $(pidof -s clickhouse-server)
Linux 5.4.0-51-generic (ec31acb0d016) 	10/15/2020 	_x86_64_	(12 CPU)

# Time        UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
06:59:30 PM   101         1   10.60    3.80    0.00    0.00   14.40    11  clickhouse-serv
06:59:35 PM   101         1    5.80    4.00    0.00    0.00    9.80    11  clickhouse-serv

So no visible change; the load is negligible.
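
As a rough cross-check of the comparison above, the `%CPU` samples from the two `pidstat` runs can be averaged. This is only a sketch: the dictionary keys and the `summarize` helper are names introduced here, and the numbers are copied from the output above.

```python
from statistics import mean

# %CPU samples copied from the two pidstat runs above (5-second intervals).
cpu_percent = {
    "PR (20.11.1.4925)": [11.20, 12.40],
    "parent (20.11.1.4924)": [14.40, 9.80],
}


def summarize(samples):
    """Return the mean %CPU per build, rounded to two decimals."""
    return {name: round(mean(values), 2) for name, values in samples.items()}


averages = summarize(cpu_percent)
for name, value in averages.items():
    print(f"{name}: {value:.2f} %CPU")
```

With only two samples per build, the gap between the two averages is smaller than the spread between the two samples of the parent commit alone, so these figures do not show a measurable difference.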

akuzm added a commit that referenced this pull request Oct 19, 2020
Backport #15987 to 20.10: Fix the bug when NOTHING_TO_DO events wrongly increment count_no_work_done
akuzm added a commit that referenced this pull request Oct 19, 2020
Backport #15987 to 20.9: Fix the bug when NOTHING_TO_DO events wrongly increment count_no_work_done
alexey-milovidov added a commit that referenced this pull request Oct 24, 2020
Backport #15987 to 20.8: Fix the bug when NOTHING_TO_DO events wrongly increment count_no_work_done
alexey-milovidov added a commit that referenced this pull request Oct 24, 2020
Backport #15987 to 20.7: Fix the bug when NOTHING_TO_DO events wrongly increment count_no_work_done

Successfully merging this pull request may close these issues.

How can I restore a blocked replication faster?
