view: revert cleanup filter that doesn't work with tablets#16670
scylladb-promoter merged 2 commits into scylladb:master
Conversation
🔴 CI State: FAILURE ❌ - Build Details:
The self-contained-header check for dht/partition_filter.hh failed. I'll need to fix it and push a new version.
🔴 CI State: FAILURE ❌ - Build Failed Tests (6/23683):
Build Details:
Known flaky test: #16420 :-(
🟢 CI State: SUCCESS ✅ - Build Details:
@bhalevy please review, as it is your code which I'm reverting here.
This PR title is confusing, since it does not fix the dtest code; rather, it fixes Scylla so the dtest will stop failing.
denesb left a comment:
Maybe we need a test which confirms that MV update generation indeed ignores data not owned by the current node.
    // have had other updates.
    std::move(sstables.begin(), sstables.end(), std::back_inserter(_sstables_with_tables[t]));
    // Sleep a bit, to avoid a tight loop repeatedly spamming the log with the same message.
    seastar::sleep(std::chrono::seconds(1)).get();
Maybe not in this PR, but we should make this an exponential back-off, with a ceiling of, say, 1 minute.
I don't know if it's important. Basically, if the generator fails after a long time (e.g., after working for 10 minutes), it doesn't matter if we sleep for 1 second or for 60 seconds. If it fails after a short time (in this case, it failed after a fraction of a millisecond), it again doesn't matter whether we sleep for 1 or 60 seconds - both are negligible performance-wise, and with both the log spam is at a manageable level.
    // have had other updates.
    std::move(sstables.begin(), sstables.end(), std::back_inserter(_sstables_with_tables[t]));
    // Sleep a bit, to avoid a tight loop repeatedly spamming the log with the same message.
    seastar::sleep(std::chrono::seconds(1)).get();
I think that @kostja wanted to standardize on 0.1 seconds (100ms) sleeps.
Regarding the log spam, shouldn't we use logger::rate_limit?
That said, if you think that 1s sleep is more appropriate for this use case, I'm good with that.
Where did he want to standardize on this?
In my response to Botond above, I tried to explain why I think it doesn't matter what the sleep is (within reason); I don't think it needs to be smart or even standardized. It's not something that a user will ever encounter (it's not a user-facing timeout or delay). The only purpose of this sleep is to reduce log spam - from a million lines per second to just 1 per second - and 0.1 or 10 lines per second would be just as acceptable.
    inject_failure("view_update_generator_consume_staging_sstable");
-   auto result = staging_sstable_reader.consume_in_thread(view_updating_consumer(*this, s, std::move(permit), *t, sstables, _as, staging_sstable_reader_handle),
-           dht::incremental_owned_ranges_checker::make_partition_filter(_db.get_keyspace_local_ranges(s->ks_name())));
+   auto result = staging_sstable_reader.consume_in_thread(view_updating_consumer(*this, s, std::move(permit), *t, sstables, _as, staging_sstable_reader_handle));
nit: s/with tables/with tablets/ in commit message.
We already have a bunch of dtests that check multi-node MV operations. The specific dtest that this PR fixes is one of these tests - it involves view building and concurrent range ownership movements (tablets moving around). To be honest, if I thought we had a problem, I would write a new test to confirm it, but in this case I don't think we have a problem: if a base replica does not own a range, it simply has no way to send the view update to anyone - if it's the Nth base replica it sends the update to the Nth view replica, but if it's not a base replica at all, it has no one to send anything to. So I don't think such a problem can even exist.
The "view update generator" is responsible for generating view updates for staging sstables (such as those coming from repair). If the processing fails, the code retries - immediately. If there is some persistent bug, such as issue scylladb#16598, we will have a tight loop of error messages, potentially a gigabyte of identical messages every second.

In this patch we simply add a sleep of one second after view update generation fails, before retrying. We can still get many identical error messages if there is some bug, but not more than one per second.

Refs scylladb#16598.

Signed-off-by: Nadav Har'El <[email protected]>
The title of the actual patch (the second patch) was exactly like you wanted: "view: revert cleanup filter that doesn't work with tablets". I just had a silly title for the merge commit. I'll change it.
This patch reverts commit 10f8f13 from November 2022. That commit added to the "view update generator" (the code which builds view updates for staging sstables) a filter that ignores ranges that do not belong to this node. However:

1. I believe this filter was never necessary, because the view update code already silently ignores base updates which do not belong to this replica (see get_view_natural_endpoint()). After all, the view update code needs to know that this replica is the Nth owner of the base update to send its update to the Nth view replica, but if no such N exists, no view update is sent.

2. The code introduced for that filter used a per-keyspace replication map, which was ok for vnodes but no longer works for tablets, and causes the operation using it to fail.

3. The filter was used every time the "view update generator" was used, regardless of whether any cleanup was necessary or not, so every such operation would fail with tablets.

So, for example, the dtest test_mvs_populating_from_existing_data fails with tablets:

* This test has view building in parallel with automatic tablet movement.
* Tablet movement is done by streaming.
* When streaming happens before view building has finished, the "view update generator" is run on the streamed sstables. This causes the problematic code to be called.

Before this patch, the dtest test_mvs_populating_from_existing_data fails when tablets are enabled. After this patch, it passes.

Fixes scylladb#16598

Signed-off-by: Nadav Har'El <[email protected]>
Rebased, and fixed a typo in one commit message and in the title of the pull request. I did not change the 1-second sleep in the first patch. This first patch wasn't even the main point of this PR - I just had a hard time debugging this issue because, before the fix in the second patch, I would get millions of log messages which completely overwhelmed test.py. So the first patch is just to make this manageable - a failure in view_update_generator will now be logged just once a second, not a million times a second.
🔴 CI State: FAILURE ❌ - Build Failed Tests (2/23733): Build Details:
This test didn't change in this PR. It seems to be flakiness in the topology test framework, in the concurrent server creation function. I'll retry the CI. You can see the failure in https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/5759/testReport/junit/(root)/non-boost%20tests/Tests___Unit_Tests___topology_experimental_raft_test_mv_tablets_debug_1/ :
🟢 CI State: SUCCESS ✅ - Build Details:
There are no known issues with this that haven't been fixed already. The failure report points to one of the servers failing to boot. Logs of this server: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/5759/artifact/testlog/x86_64/debug/scylla-1410.log
Curious that it happened during repair. Node 127.54.45.48 (https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/5759/artifact/testlog/x86_64/debug/scylla-1409.log) didn't report anything interesting in its logs during this repair time.
I also recommend, in general, doing a shallow 5-minute investigation (like I did above) before concluding that this is a known issue --- there can be trillions of different reasons why bootstrapping a server might fail, and different root causes should be different issues IMO ("adding servers fails" is too generic). I haven't seen this
You're making a mountain out of a molehill. I was trying to babysit this PR, which contains a real fix which really is about work assigned to me, not to debug random unrelated issues in areas of the code I'm not familiar with. I actually did spend 5 minutes, found that add_servers failure, and "showed my work" above. This fit with what you told me about a known bug in add_servers (which I didn't know had already been fixed). The insistence of the topology test developers on not putting all the evidence of failures in a single file (which I insisted on when I wrote test/cql-pytest/run) meant I obviously missed some of the evidence in other files, which it seems you are more experienced at finding than I was. I don't think I should be executed for that :-)
Opened a new issue #16821 on the CI failure that we saw above (and @kbr-scylla analyzed).
The goal of this PR is to fix Scylla so that the dtest test_mvs_populating_from_existing_data, which starts to fail when tablets are enabled, will pass.
The main fix (the second patch) is reverting code which doesn't work with tablets, and I explain why I think this code was not necessary in the first place.
Fixes #16598