Skip to content

Conversation

@tier-cap
Copy link
Collaborator

@tier-cap tier-cap commented May 28, 2021

What problem does this PR solve?

closes #10597

Solve the availability problem when the disk is full, no mather one or two or the majority or all.

What is changed and how it works?

  1. Regularly monitor disk usage
  2. Intercept business write traffic when the disk threshold is triggered
  3. Space reclamation relies on PD for free space reclamation logic

Check List

Test

Release note

  • not available

@ti-chi-bot
Copy link
Member

@tier-cap: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot ti-chi-bot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. contribution This PR is from a community contributor. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. labels May 28, 2021
@ti-chi-bot
Copy link
Member

Welcome @tier-cap!

It looks like this is your first PR to tikv/tikv 🎉.

I'm the bot to help you request reviewers, add labels and more, See available commands.

We want to make sure your contribution gets all the attention it needs!



Thank you, and welcome to tikv/tikv. 😃

@ti-chi-bot ti-chi-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 28, 2021
@tier-cap tier-cap changed the title [TiKV] support read only and recovery when disk full. *: support read only and recovery when disk full May 28, 2021
@ti-chi-bot ti-chi-bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 31, 2021
@hicqu
Copy link
Contributor

hicqu commented Jun 3, 2021

store_heartbeat_tick_interval is 10s by default. It's too large for disk stat. You can add a new tick type to do disk stat per second.

@tier-cap tier-cap requested a review from hicqu June 3, 2021 11:34
@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 5, 2021
@ti-chi-bot ti-chi-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 7, 2021
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to do tests in Callback. In the current implementation, we can't know whether split fails or it just is waitting for executing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to extract and test the error code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to rename leader, follower_1 to peer_1, peer_2, ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not call unwrap to simplify the code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ti-chi-bot ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 22, 2021
Copy link
Member

@NingLin-P NingLin-P left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

Also, I found this PR didn't cover snapshot generating and receiving, those could also consume disk space.

let msg_type = msg.get_message().get_msg_type();
let store_id = self.ctx.store_id();

if disk::disk_full_precheck(store_id) || disk::is_disk_full() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking an atomic for every raft message and proposal may bring extra cost.

Maybe we can reduce the cost by only check the atomic once for each batch of peer, more specifically setting a flag like poller_ctx.is_disk_full at PollHandler::begin and then checking the poller_ctx.is_disk_full for each raft message and proposal instead.

This may bring little delay to detect disk full but I think the delay will be much shorter than the disk stats monitor interval (1s).

Copy link
Collaborator Author

@tier-cap tier-cap Jun 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I found this PR didn't cover snapshot generating and receiving, those could also consume disk space.

Yes, currently, these are also not limited in this PR: snapshot, running log, rocksdb compaction and wal, etc.
Maybe a good strategy is to solve the main problem first, and then observe the online status.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking an atomic for every raft message and proposal may bring extra cost.

Maybe we can reduce the cost by only check the atomic once for each batch of peer, more specifically setting a flag like poller_ctx.is_disk_full at PollHandler::begin and then checking the poller_ctx.is_disk_full for each raft message and proposal instead.

This may bring little delay to detect disk full but I think the delay will be much shorter than the disk stats monitor interval (1s).

The impact of access to atomic variables on performance should not be so big, we can add a performance test observation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not just performance but also resource cost (like CPU), the impact may not big enough to be noticeable but I think it is an unnecessary cost though.

@tier-cap tier-cap requested review from NingLin-P and hicqu June 23, 2021 03:34
Comment on lines +14 to +22
pub fn is_disk_full() -> bool {
DISK_FULL.load(Ordering::Acquire)
}

pub fn disk_full_precheck(_store_id: u64) -> bool {
fail_point!("disk_full_peer_1", _store_id == 1, |_| true);
fail_point!("disk_full_peer_2", _store_id == 2, |_| true);
false
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like disk_full_precheck is always use as disk_full_precheck || is_disk_full, how about combining these two into one like:

pub fn is_disk_full() -> bool {
    fail_point!("disk_full_peer_1", _store_id == 1, |_| true);
    fail_point!("disk_full_peer_2", _store_id == 2, |_| true);
    DISK_FULL.load(Ordering::Acquire)
}

Copy link
Collaborator Author

@tier-cap tier-cap Jun 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not. Because disk_full_precheck only used to ut, if we merge both, then the disk stats worker will clean it up periodically.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And there will be a more obscure logic, that is, in other places where precheck is not needed, you have to pass a store_id to this merged function.

Copy link
Member

@NingLin-P NingLin-P left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot
Copy link
Member

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • NingLin-P
  • hicqu

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Details

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jun 23, 2021
@tier-cap
Copy link
Collaborator Author

/test

@tier-cap
Copy link
Collaborator Author

/run-integration-common-test

@hicqu
Copy link
Contributor

hicqu commented Jun 23, 2021

/merge

@ti-chi-bot
Copy link
Member

@hicqu: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

DetailsCommit hash: 3b8e470

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Jun 23, 2021
@hicqu hicqu removed the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Jun 23, 2021
@hicqu
Copy link
Contributor

hicqu commented Jun 23, 2021

/merge

@ti-chi-bot
Copy link
Member

@hicqu: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot merged commit 46e539a into tikv:master Jun 23, 2021
@tier-cap tier-cap mentioned this pull request Jul 9, 2021
22 tasks
@ti-chi-bot
Copy link
Member

@tier-cap: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot ti-chi-bot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Jul 20, 2021
tiancaiamao pushed a commit to tiancaiamao/tikv that referenced this pull request Aug 11, 2021
* [feature] support read only and recovery when disk full.

1. when disk full, all the business write traffic will not be allowed.
2. no mather minority or majority or all servers disk full happen, you
	can recove by adding machines or storage, then service normally.

Signed-off-by: tier-cap <[email protected]>

* [feature] support read only and recovery when disk full.

adjust impl on reviews.

Signed-off-by: tier-cap <[email protected]>

* [feature] support read only and recovery when disk full.

opt one test case taking long time prbs.

Signed-off-by: tier-cap <[email protected]>

* [feature] support read only and recovery when disk full.

change ut impl on reviews.

Signed-off-by: tier-cap <[email protected]>

* [feature] support read only and recovery when disk full.

change ut impl on reviews.

Signed-off-by: tier-cap <[email protected]>

* [feature] support read only and recovery when disk full.

remote the unused code and commend

Signed-off-by: tier-cap <[email protected]>

* clean up tests and fix a bug about transfer leader

Signed-off-by: qupeng <[email protected]>

* a little fix

Signed-off-by: qupeng <[email protected]>

* disk full recovery change 3 details
1. config change allowed
2. campaign success log allowed
3. add the raft engine size stats.

Signed-off-by: tier-cap <[email protected]>

* fix one bug

Signed-off-by: tier-cap <[email protected]>

* change details by review comment.

Signed-off-by: tier-cap <[email protected]>

* simplify some impls by comments.

Signed-off-by: tier-cap <[email protected]>

* change the atomic var access mode by review comments

Signed-off-by: tier-cap <[email protected]>

Co-authored-by: tier-cap <[email protected]>
Co-authored-by: qupeng <[email protected]>
Signed-off-by: tiancaiamao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contribution This PR is from a community contributor. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TiKV won't panic for disk full

5 participants