*: support read only and recovery when disk full #10264

tier-cap · 2021-05-28T10:05:05Z

What problem does this PR solve?

closes #10597

Solve the availability problem when the disk is full, no matter whether one store, two stores, a majority, or all stores are affected.

What is changed and how it works?

Regularly monitor disk usage
Intercept business write traffic when the disk threshold is triggered
Rely on PD's existing logic to reclaim free space
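
A minimal sketch of the first two points, not the code in this PR: a periodic monitor compares available space against a reserve threshold and publishes the result through an atomic flag, and the write path consults that flag. `DiskStats`, `exceeds_threshold`, `update_disk_status`, `check_write` and the `reserve_ratio` parameter are hypothetical names used only for illustration; `DISK_FULL` and `is_disk_full` mirror the names that appear later in this review.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Process-wide flag; `true` means business writes should be rejected.
static DISK_FULL: AtomicBool = AtomicBool::new(false);

/// Hypothetical snapshot of disk statistics, in bytes.
pub struct DiskStats {
    pub capacity: u64,
    pub available: u64,
}

/// Returns true when available space drops below `reserve_ratio` of capacity.
pub fn exceeds_threshold(stats: &DiskStats, reserve_ratio: f64) -> bool {
    (stats.available as f64) < (stats.capacity as f64) * reserve_ratio
}

/// Called by the periodic monitor on every tick; it overwrites the flag,
/// so the flag also clears automatically once space has been reclaimed.
pub fn update_disk_status(stats: &DiskStats, reserve_ratio: f64) {
    DISK_FULL.store(exceeds_threshold(stats, reserve_ratio), Ordering::Release);
}

pub fn is_disk_full() -> bool {
    DISK_FULL.load(Ordering::Acquire)
}

/// Guard for the write path: reads stay available, writes are refused.
pub fn check_write() -> Result<(), &'static str> {
    if is_disk_full() {
        Err("disk full")
    } else {
        Ok(())
    }
}
```

Because the monitor rewrites the flag on every tick, recovery needs no separate code path in this sketch: once space is freed, the next tick clears the flag and writes resume.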

Check List

Test

Release note

not available

ti-chi-bot · 2021-05-28T10:05:06Z

@tier-cap: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ti-chi-bot · 2021-05-28T10:05:13Z

Welcome @tier-cap!

It looks like this is your first PR to tikv/tikv 🎉.

I'm the bot to help you request reviewers, add labels and more, See available commands.

We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to tikv/tikv. 😃

components/server/src/server.rs

components/raftstore/src/store/snap.rs

components/server/src/server.rs

src/server/service/kv.rs

hicqu · 2021-06-03T07:03:14Z

store_heartbeat_tick_interval is 10s by default. It's too large for disk stat. You can add a new tick type to do disk stat per second.

components/raftstore/src/store/peer.rs

components/server/src/server.rs

hicqu · 2021-06-07T12:14:42Z

tests/failpoints/cases/test_disk_full.rs

Better to do the checks in the Callback. In the current implementation, we can't tell whether the split failed or is just waiting to be executed.
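
A rough sketch of what checking in the callback could look like, using only the standard library. The `Callback` alias, `SplitOutcome`, `split_callback` and `wait_outcome` are hypothetical and do not correspond to raftstore's actual callback types; the point is only that a result delivered through the callback lets the test distinguish "failed" from "not executed yet".

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum SplitOutcome {
    Succeeded,
    Failed(String),
    /// The callback has not fired yet: the request is still queued.
    Pending,
}

/// Hypothetical callback type: invoked once with the result of the split.
pub type Callback = Box<dyn FnOnce(Result<(), String>) + Send>;

/// Builds a callback plus the receiver the test uses to observe its result.
pub fn split_callback() -> (Callback, Receiver<Result<(), String>>) {
    let (tx, rx) = mpsc::channel();
    let cb: Callback = Box::new(move |res| {
        // Ignore send errors: the test may already have dropped the receiver.
        let _ = tx.send(res);
    });
    (cb, rx)
}

/// Waits up to `timeout` and reports what the callback delivered, if anything.
pub fn wait_outcome(rx: &Receiver<Result<(), String>>, timeout: Duration) -> SplitOutcome {
    match rx.recv_timeout(timeout) {
        Ok(Ok(())) => SplitOutcome::Succeeded,
        Ok(Err(e)) => SplitOutcome::Failed(e),
        Err(RecvTimeoutError::Timeout) => SplitOutcome::Pending,
        Err(RecvTimeoutError::Disconnected) => {
            SplitOutcome::Failed("callback dropped without being called".to_string())
        }
    }
}
```

With this shape a test can assert on the delivered error itself instead of inferring failure from the absence of a new region.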

hicqu · 2021-06-07T12:15:17Z

tests/failpoints/cases/test_disk_full.rs

Better to extract and test the error code.

hicqu · 2021-06-07T12:16:04Z

tests/failpoints/cases/test_disk_full.rs

Better to rename leader, follower_1 to peer_1, peer_2, ...

hicqu · 2021-06-07T12:17:42Z

tests/failpoints/cases/test_disk_full.rs

Why not call unwrap to simplify the code?

Signed-off-by: tier-cap <[email protected]>

NingLin-P

Rest LGTM

Also, I found this PR didn't cover snapshot generating and receiving, those could also consume disk space.

NingLin-P · 2021-06-22T13:37:22Z

components/raftstore/src/store/fsm/peer.rs

+        let msg_type = msg.get_message().get_msg_type();
+        let store_id = self.ctx.store_id();
+
+        if disk::disk_full_precheck(store_id) || disk::is_disk_full() {


Checking an atomic for every raft message and proposal may bring extra cost.

Maybe we can reduce the cost by checking the atomic only once for each batch of peers: set a flag like poller_ctx.is_disk_full at PollHandler::begin, then check poller_ctx.is_disk_full for each raft message and proposal instead.

This may add a small delay before disk full is detected, but I think the delay will be much shorter than the disk stats monitor interval (1s).

> Also, I found this PR didn't cover snapshot generating and receiving, those could also consume disk space.

Yes. Currently this PR does not limit these either: snapshots, running logs, RocksDB compaction, the WAL, and so on.
A good strategy may be to solve the main problem first and then observe how it behaves in production.

> Checking an atomic for every raft message and proposal may bring extra cost.
>
> Maybe we can reduce the cost by checking the atomic only once for each batch of peers: set a flag like poller_ctx.is_disk_full at PollHandler::begin, then check poller_ctx.is_disk_full for each raft message and proposal instead.
>
> This may add a small delay before disk full is detected, but I think the delay will be much shorter than the disk stats monitor interval (1s).

The performance impact of reading an atomic variable should not be that large. We can add a performance test to observe it.

It is not just about performance but also about resource cost (such as CPU). The impact may not be big enough to be noticeable, but I think it is an unnecessary cost.
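
The per-batch check proposed in this thread could look roughly like the sketch below. It is an illustration under assumptions, not raftstore's code: `PollContext`, `begin`, `should_reject_write`, `set_disk_full` and `handle_batch` are hypothetical stand-ins for the poller context and for `PollHandler::begin`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static DISK_FULL: AtomicBool = AtomicBool::new(false);

/// Stand-in for the disk stats monitor publishing its latest result.
pub fn set_disk_full(full: bool) {
    DISK_FULL.store(full, Ordering::Release);
}

/// Hypothetical per-poller context holding a cached copy of the flag.
pub struct PollContext {
    pub is_disk_full: bool,
}

impl PollContext {
    pub fn new() -> Self {
        PollContext { is_disk_full: false }
    }

    /// Mirrors the idea of `PollHandler::begin`: one atomic load per batch.
    pub fn begin(&mut self) {
        self.is_disk_full = DISK_FULL.load(Ordering::Acquire);
    }

    /// Per-message check: reads a plain bool, no atomic access.
    pub fn should_reject_write(&self) -> bool {
        self.is_disk_full
    }
}

/// Processes one batch and returns how many messages were accepted.
pub fn handle_batch(ctx: &mut PollContext, batch_len: usize) -> usize {
    ctx.begin();
    (0..batch_len)
        .filter(|_| !ctx.should_reject_write())
        .count()
}
```

The cached value can be stale for at most one batch, which is the small detection delay mentioned above.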

Signed-off-by: tier-cap <[email protected]>

NingLin-P · 2021-06-23T06:30:48Z

components/tikv_util/src/sys/disk.rs

+pub fn is_disk_full() -> bool {
+    DISK_FULL.load(Ordering::Acquire)
+}
+
+pub fn disk_full_precheck(_store_id: u64) -> bool {
+    fail_point!("disk_full_peer_1", _store_id == 1, |_| true);
+    fail_point!("disk_full_peer_2", _store_id == 2, |_| true);
+    false
+}


It looks like disk_full_precheck is always used as disk_full_precheck || is_disk_full. How about combining the two into one, like:

pub fn is_disk_full() -> bool {
    fail_point!("disk_full_peer_1", _store_id == 1, |_| true);
    fail_point!("disk_full_peer_2", _store_id == 2, |_| true);
    DISK_FULL.load(Ordering::Acquire)
}

Maybe not. disk_full_precheck is only used by unit tests, and if we merge the two, the disk stats worker will periodically clear the state.

The logic would also become more obscure: in other places where the precheck is not needed, you would still have to pass a store_id to the merged function.
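
A sketch of how the two checks stay independent when kept separate. The real code uses the `fail_point!` macro; here a plain atomic (`FORCED_FULL_STORE`, hypothetical) stands in for it so the example is self-contained, and `stats_worker_tick`, `force_full` and `should_reject` are likewise hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

/// Real state, rewritten by the disk stats worker on every tick.
static DISK_FULL: AtomicBool = AtomicBool::new(false);
/// Test-only stand-in for the fail points: the store id forced to look full
/// (0 means no store is forced).
static FORCED_FULL_STORE: AtomicU64 = AtomicU64::new(0);

pub fn is_disk_full() -> bool {
    DISK_FULL.load(Ordering::Acquire)
}

/// Test hook, independent of the real flag and keyed by store id.
pub fn disk_full_precheck(store_id: u64) -> bool {
    let forced = FORCED_FULL_STORE.load(Ordering::Acquire);
    forced != 0 && forced == store_id
}

/// Stand-in for the periodic worker publishing the measured state.
pub fn stats_worker_tick(measured_full: bool) {
    DISK_FULL.store(measured_full, Ordering::Release);
}

/// Test helper that simulates enabling a fail point for one store.
pub fn force_full(store_id: u64) {
    FORCED_FULL_STORE.store(store_id, Ordering::Release);
}

/// Call sites that know their store id combine both checks.
pub fn should_reject(store_id: u64) -> bool {
    disk_full_precheck(store_id) || is_disk_full()
}
```

In this shape a worker tick never touches the injected state, and callers that do not need the precheck can keep calling `is_disk_full()` without a store id.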

NingLin-P

LGTM

ti-chi-bot · 2021-06-23T07:02:06Z

[REVIEW NOTIFICATION]

This pull request has been approved by:

NingLin-P
hicqu

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Details

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

tier-cap · 2021-06-23T07:06:56Z

/test

tier-cap · 2021-06-23T07:07:31Z

/run-integration-common-test

hicqu · 2021-06-23T07:30:28Z

/merge

ti-chi-bot · 2021-06-23T07:30:29Z

@hicqu: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot · 2021-06-23T07:30:31Z

This pull request has been accepted and is ready to merge.

Details

Commit hash: 3b8e470

hicqu · 2021-06-23T08:55:16Z

/merge

ti-chi-bot · 2021-06-23T08:55:16Z

@hicqu: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot · 2021-07-20T13:47:44Z

@tier-cap: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

* [feature] support read only and recovery when disk full.
  1. when disk full, all the business write traffic will not be allowed.
  2. no mather minority or majority or all servers disk full happen, you can recove by adding machines or storage, then service normally.
  Signed-off-by: tier-cap <[email protected]>
* [feature] support read only and recovery when disk full. adjust impl on reviews.
  Signed-off-by: tier-cap <[email protected]>
* [feature] support read only and recovery when disk full. opt one test case taking long time prbs.
  Signed-off-by: tier-cap <[email protected]>
* [feature] support read only and recovery when disk full. change ut impl on reviews.
  Signed-off-by: tier-cap <[email protected]>
* [feature] support read only and recovery when disk full. change ut impl on reviews.
  Signed-off-by: tier-cap <[email protected]>
* [feature] support read only and recovery when disk full. remote the unused code and commend
  Signed-off-by: tier-cap <[email protected]>
* clean up tests and fix a bug about transfer leader
  Signed-off-by: qupeng <[email protected]>
* a little fix
  Signed-off-by: qupeng <[email protected]>
* disk full recovery change 3 details
  1. config change allowed
  2. campaign success log allowed
  3. add the raft engine size stats.
  Signed-off-by: tier-cap <[email protected]>
* fix one bug
  Signed-off-by: tier-cap <[email protected]>
* change details by review comment.
  Signed-off-by: tier-cap <[email protected]>
* simplify some impls by comments.
  Signed-off-by: tier-cap <[email protected]>
* change the atomic var access mode by review comments
  Signed-off-by: tier-cap <[email protected]>

Co-authored-by: tier-cap <[email protected]>
Co-authored-by: qupeng <[email protected]>
Signed-off-by: tiancaiamao <[email protected]>

ti-chi-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 28, 2021

tier-cap changed the title ~~[TiKV] support read only and recovery when disk full.~~ *: support read only and recovery when disk full May 28, 2021