Add support for tablets in Alternator by nyh · Pull Request #16353 · scylladb/scylladb

nyh · 2023-12-10T13:21:44Z

The pull requests adds support for tablets in Alternator, and particularly focuses in getting Alternator's GSI and LSI (i.e., materialized views) to work.

After this series support for tablets in Alternator mostly work, but not completely:

CDC doesn't yet work with tablets, and Alternator needs to provide CDC (known as "DynamoDB Streams").
Alternator's TTL feature was not tested with tablets, and probably doesn't work because it assumes the replication map belongs to a keyspace.

Because of these reasons, Alternator does not yet use tablets by default and it needs to be enabled explicitly be adding an experimental tag to the new table. This will allow us to test Alternator with tablets even before it is ready for the limelight.

Fixes #16203
Fixes #16313

scylladb-promoter · 2023-12-10T15:49:51Z

🟢 CI State: SUCCESS

✅ - Build
✅ - dtest
✅ - Unit Tests

Build Details:

Duration: 2 hr 27 min
Builder: spider4.cloudius-systems.com

avikivity · 2023-12-10T17:09:00Z

+    auto ksm = create_keyspace_metadata(keyspace_name, sp, gossiper, ts, tags_map);
    try {
-        schema_mutations = service::prepare_new_keyspace_announcement(sp.local_db(), ksm, ts);
+       schema_mutations = service::prepare_new_keyspace_announcement(sp.local_db(), ksm, ts);


This is a code style regression

Sorry, I don't know what happened here, I'll fix it.

disk corruption?

More likely, a cat on my keyboard + me too lazy to read the patch before sending it :-(

avikivity · 2023-12-10T17:09:46Z

    auto opts = get_network_topology_options(sp, gossiper, rf);

+    // Tablets are not yet enabled by default on Alternator tables, because of
+    // missing support for CDC (see issue #16313). Until then, allow


Because they're an experimental feature, too, and incomplete.

When I wrote "not enabled by default", I meant not even if the tablets experimental features is turned on. Even with the tablets experimental feature, I didn't want to enable tablets on Alternator by default. By the way, the same is currently the case in CQL - it's not enough to enable the experimental feature, you need to enable it per table.

avikivity · 2023-12-10T17:16:50Z

-                    tracing::trace(tr_state, "Failed to apply view update for {} and {} remote endpoints",
+            // The "delay_before_remote_view_update" injection point can be
+            // used to add a short delay (currently 0.5 seconds) before a base
+            // replica sends its update to the remove view replica.


Isn't using a delay asking for a failure on a busy system? In the case of tests, probably false negatives.

Probably, such a fault injection point should pause to coordinate on some named http endpoint. For example:

test setup registers an http endpoint (provided by the test script) with the API server

inject() calls the http endpoint with the input string

the http server does something and returns a response

based on the response, the injector continues normally or throws an exception

This allows the test system to inject any action, not just a delay.

We won't do this now, but maybe we'll find the time to reorganize this one day.

You're absolutely right - a fixed delay may add false negatives - if the test machine is so slow that it can't perform a single read in 0.5 seconds, it may miss a bug. But if there is a bug, in some of the runs it won't be missed, and that's enough. Moreover, during the development process - when I wanted to reproduced the bug - I ran it on an idle machine.

Please note that this test really needs a finite delay - and not a very long delay or an infinite pause:

When synchronous_update=false (the bug), the delay delays the view updates, and the immediate read doesn't see the data.

When synchronous_update=true (the fix), the delay delays the view updates, but the write waits for it, and when the write returns, it sees the data.

Maybe there's a better way to do it, but I didn't think of a different approach that is reasonable. I also tried to work within what the injection framework (that is the first time I ever used...) gives me and not to reinvent it.

nyh · 2023-12-11T14:40:58Z

#16364 conflicts with this PR - after that other PR the way to enable or disable tablets will be "tablets: true" / "tablets: false", not via initial_tablets. So if that PR goes in first, I'll need to redo parts of this PR.

wmitros

Looks good to me, however, reading this patch set has been mostly a learning experience for me as I'm not that familiar with the Alternator implementation, and the changes were good for that as they included an abundance of comments.

eliransin

I also think that eventually our injection should be more precise that just injecting some delays but it is not something for now 🙂

eliransin · 2023-12-13T10:45:15Z

+# an async wrapper to the boto3 functions ourselves (e.g., run them in a
+# separate thread) ourselves.
+@pytest.mark.asyncio
+async def test_tablet_alternator_lsi_consistency(manager: ManagerClient):


Is there any point running this on release mode?
In release mode the injection will not do anything so it is kind of pointless to run it (unless you are counting on accidentally catching this bug without the delay.

I am not strongly opinionated about this so maybe we can mark this test (and probably should do it for all other tests that uses injection) somehow in a follow up.

@eliransin I don't know that injection is ignored in release mode. Why?
It means that any test that uses injection cannot be run on release mode - just on dev mode. Are we fine with that? CC @avikivity

The way I would do this in the cql-pytest framework is to skip a test when it cannot be done on this build mode - the inject function can detect it's not supported and skip the calling test - with no need for any separate marks. I can do this here too but am still trying to figure out the conventions of the topology test framework. Note that my absolute last preference would be to manually list this test in some yaml file of excluded tests as I've seen people doing - I think this approach is a mistake.

@nyh This is how you skip a test in release mode:

@pytest.mark.asyncio @skip_mode('release', 'error injections are not supported in release mode') async def test_table_dropped_during_streaming(manager: ManagerClient):

If you skip the test before invoking the injection API we save some run time because we avoid cluster bootstrap.

Thanks, I'll do this, though I don't love the need to remember to add this marker.
I still have an uneasy feeling about the principle though - just a few days we had a big discussion about needed more "testability", more injection points and so on that will make it easier to write reliable and shorter tests. But if we say that these injection points don't work in release mode, isn't this a problem? Doesn't it mean we should avoid using injection points in tests, or at least most tests, so that we'll still have most testings be able to run on "release mode" (which is the most important binary to be testing!!!!)
CC @avikivity I wonder what your thoughts on this are.

I don't see a problem. We're testing the algorithm here.

There's a chance that there's an additional bug that is hidden by both dev and debug modes and is exposed by release mode, but the probability is low, and together with the low probability of the injected failure happening in production, I think it's okay.

I do this skip.

tgrabiec · 2023-12-13T12:35:49Z

I also think that eventually our injection should be more precise that just injecting some delays but it is not something for now 🙂

It's possible to control when the injection should stop sleeping from the test, see 7d0f4c1. Injection points can wait on messages.

nyh · 2023-12-13T12:54:03Z

I also think that eventually our injection should be more precise that just injecting some delays but it is not something for now 🙂

It's possible to control when the injection should stop sleeping from the test, see 7d0f4c1. Injection points can wait on messages.

I don't think it's relevant to this test, unfortunately. The whole point of this LSI consistency check is to verify that if the view update takes a long time (e.g., 0.5 seconds) then the base write waits for it (i.e., it is a "synchronous view update"). So we need the view update to actually wait for some time - I don't know if we should use 0.5 seconds or 50 seconds, but it needs to be some sort of fixed wait. It's not that the test needs to tell the server when to stop sleeping - the server doesn't know when it wants to stop sleeping.

Again I want to emphasize that a short sleep is important for keeping the test short, and the only downside of a too-short sleep is that it can cause a false-negative - i.e., a failed test will not be recognized on a very slow and overloaded test machine because it took more than 0.5 seconds for the test to even try to read. Even in that case, if the code is buggy, it will sometimes fail (if the machine happens to not be super-slow), which may be good enough. I can increase 0.5 seconds to 30 seconds, but it means that a successful test will always take 30 seconds. In dtest I wouldn't think twice about this, but here I do.

nuivall · 2023-12-13T15:57:34Z

-            view_ptr(view_builder.build()), ts, true, schema_mutations);
+            view, ts, true, schema_mutations);
+        // add_table_or_view_to_schema_mutation() is a low-level function that
+        // doesn't call the callbacks that prepare_new_view_announcement()


is there a reason to not use prepare_new_view_announcement here?

Unfortunately, yes. Each of these functions makes some very specific assumptions - for example prepare_new_view_announcement works like CREATE VIEW - adding a view to an existing table. Alternator creates a keyspace, base-table and view all in one operation.

nuivall · 2023-12-13T16:10:57Z

+            // LSIs have no tags, but Scylla's "synchronous_updates" feature
+            // (which an LSIs need), is actually implemented as a tag so we
+            // need to add it here:
+             std::map<sstring, sstring> tags_map = {{db::SYNCHRONOUS_VIEW_UPDATES_TAG_KEY, "true"}};


nit: extra space before std?

Eeek. You're right. I'll fix that.

nuivall · 2023-12-13T16:12:22Z

+            // (which an LSIs need), is actually implemented as a tag so we
+            // need to add it here:
+             std::map<sstring, sstring> tags_map = {{db::SYNCHRONOUS_VIEW_UPDATES_TAG_KEY, "true"}};
+            view_builder.add_extension(db::tags_extension::NAME, ::make_shared<db::tags_extension>(tags_map));


is tags_map copied here? maybe move or construct in place

The tags_extension constructor takes a const std::map<sstring, sstring>& tags so it copies, indeed. I'll try to remember to follow this up with a fix for this class and all its callers to avoid this copy, although I think this is completely negligible - we're talking here about one of the slowest operations in Scylla - creating a table with a view (it's even worse in DynamoDB - taking many seconds), copying an in-memory map with one item isn't a large part of it.

nuivall · 2023-12-13T16:19:48Z

+
+# Alternator convenience function for fetching the entire result set of a
+# query into an array of items.
+def full_query(table, ConsistentRead=True, **kwargs):


wouldn't be better to import those functions from alternator/util? deduplication later is harder, especially in case the code diverges

I would vote no. We should minimize the number of convenience functions, and it's not always possible to share them between suites because of completely different assumptions and coding styles.
Sometimes copying 7 lines of code is better than turning the world up-side-down just to be able to reuse a 7 line function. It's also much easier for reviewers and future readers of this test to see in front of them the 7-line convenience function, instead of trying to inspect a huge library what this function really does (if you worked on dtest, you'll know what I mean and what I'm trying to avoid).

Duplicating functions doesn't minimize it's number unless you count differently. My main point is that different suites should not be treated as separate projects, that's unfortunate. The coding style should converge instead of allowing more more differgence.

I meant we should minimize the number of global convenience functions. I don't know how many dtests you wrote, but their global "convenience" functions are everything that I want to avoid: I never understand them when I read a test, there are a hundred of them and each one takes a hundred different parameters. It's something I don't want to duplicate in my own tests.

I don't know if "suites should be treated as separate projects" but in reality they are. And to be honest, if I need to choose between duplicating a 7-line function and enforcing irrelevant standards across the board (e.g., you must use async in test/alternator because we wanted to use it in test/topology!), I prefer duplicating a 7-line function.

Specifically, I noted above in the comment that according to the test/topology_experimental_raft coding standard, these Alternator tests will eventually need to be moved to using some sort of async library, not boto3. This full_query implementation will need to become async as well. The one in Alternator won't. I don't want to make 800 tests in test/alternator harder to read and write just to make 7 tests in test/topology_experimental_raft happy.

Finally, my philosophy of these convenience functions is that they should be very short and very simple - the exact opposite of dtest's functions that take 10 different parameters to suite 10 different needs. Such short and simple functions should not "diverge" over time. They shouldn't even change (or change very rarely) - my whole point is that I don't want people to start adding more and more features to the existing functions.

The problem you're describing falls more into good library api design category than duplicate vs code reuse. Comparing bad design to different bad design doesn't prove much in absolute terms. Agree that one should not create functions like
def full_scan(table, ConsistentRead=True, Count=0, WithRestart=false, OnlyEven=true, **kwargs)
but something like
def full_scan(table)
def full_scan_and_count(table)
but that's orthogonal to code duplication. It's not very bad though as you said function is very small. Just for the future I'd expect alternator part of topology tests to grow - ideally alternator should have some more system testing not only single node api tests.

I am not fully grasping async thing in python, mostly I think it's a distraction currently. I don't see why we should have both async and non async functions (see coroutines vs continuations pattern). Either this or that, mixed state should be seen as temporary not the goal.

Let's reconsider the library issue when it's clearer what the library should contain. My experience with dtest's collection of convenience functions is terrible (I don't think that this review is the place to discuss it), whereas in cql-pytest it mostly worked because the "library" is so small, deliberately.

Regarding async - the topology guys decided to use it, but it's not used in test/alternator. So I don't understand what you are proposing. Should we edit 800 Alternator test functions, almost 20,000 lines of code, to make everything async? Just so we could reuse a 7-line function (or maybe five such functions) between the projects? I'm not convinced.

nuivall · 2023-12-13T16:32:26Z

so in theory there is no test checking if enabling tablets is doing anything in alternator?

nuivall

LGTM (haven't looked closely at db/view/*)

As explained in issue scylladb#16203, we cannot yet enable tablets on Alternator keyspaces by default, because support for some of the features that Alternator needs, such as CDC, is not yet available. Nevertheless, to start testing Alternator integration with tablets, we want to provide a way to enable tablets in Alternator for tests. In this patch we add support for a tag, 'experimental:initial_tablets', which if added on a table during creation, uses tablets for its keyspace. The value of this tag is a numeric string, and it is exactly analogous to the 'initial_tablets' property we have in CQL's NetworkTopologyStrategy. We name this tag with the "experimental:" prefix to emphesize that it is experimental, and the way to enable or disable tablets will probably change later. The new tag only has effect when added while *creating* a table. Adding, deleting or changing it later on an existing table will have no effect. A later patch will have tests that use this tag to test Alternator with tablets. Refs scylladb#16203. Signed-off-by: Nadav Har'El <[email protected]>

In commit 88a5dda, we fixed materialized view creation to support tablets. We added to the function called to create materialized views in CQL, prepare_new_view_announcement() a missing call to the on_before_create_column_family() notifier that creates tablets for this new view. We have the same problem in Alternator when creating a view (GSI or LSI). The Alternator code does not use prepare_new_view_announcement(), and instead uses the lower-level function add_table_or_view_to_schema_mutation() so it didn't get the call to the notifier, so we must add it here too. Before this patch, creating an Alternator table with tablets (which has become possible after the previous patch) fails with "Tablet map not found for table <uuid>". With this patch, it works. A test for materialized views in Alternator will come in a following patch, and will test everything together - the CreateTable tag to use tablets (from the previous patch), the LSI/GSI creation (fixed in this patch) and the correct consistency of the LSI (fixed in the next patch). Signed-off-by: Nadav Har'El <[email protected]>

nyh · 2023-12-17T18:13:16Z

so in theory there is no test checking if enabling tablets is doing anything in alternator?

Good question. In the later patches, I use the tablet-enabling tag to enable tablets and then (before fixing the bugs) obvious bugs appeared, so I know that this tag had an effect. However, you're right that we should have a regression test that the tablet-enabling tag actually does something and continues to do something (and isn't, say, just outright ignored).

I'll add such a test in a followup PR.

DynamoDB's *local* secondary index (LSI) allows strongly-consistent reads from the materialized view, which must be able to read what was previously written to the base. To support this, we need the view to use the "synchronous_updates". Previously, with vnodes, there was no need for using this option explicitly, because an LSI has the same partition key as the base table so the base and view replicas are the same, and the local writes are done synchronously. But with tablets, this changes - there is no longer a guarantee that the base and view tablets are located on the same node. So to restore the strong consistency of LSIs when tablets are enabled, this patch explicitly adds the "synchronous_updates" option to views created by Alternator LSIs. We do *not* add this option for GSIs - those do not support strongly-consistent reads. This fix was tested by a test that will be introduced in the following patches. The test showed that before this patch, it was possible that reading with ConsistentRead=True from an LSI right after the base was written would miss the new changes, but after this patch, it always sees the new data in the LSI. Fixes scylladb#16313. Signed-off-by: Nadav Har'El <[email protected]>

It's difficult to write a test (as we plan to do in to in the next patch) that verifies that synchronous view updates are indeed synchronous, i.e., that write with CL=QUORUM on the base-table write returns only after CL=QUORUM was also achieved in the view table. The difficulty is that in a fast test machine, even if the synchronous-view-update is completely buggy, it's likely that by the time the test reads from the view, all view updates will have been completed anyway. So in this patch we introduce an injection point, for testing, named "delay_before_remote_view_update", which adds a delay before the base replica sends its update to the remote view replica (in case the view replica is indeed remote). As usual, this injection point isn't configurable - when enabled it adds a fixed (0.5 second) delay, on all view updates on all tables. The existing code used continuation-style Seastar programming, and the addition of the injection point in this patch made it even uglier, so in the next patch we will coroutine-ize this code. Signed-off-by: Nadav Har'El <[email protected]>

In the previous patch we added a delay injection point (for testing) in the view update code. Because the code was using continuation style, this resulted in increased indentation and ugly repetition of captures. So in this patch we coroutinize the code that waits for remote view updates, making it simpler, shorter, and less indented. Note that this function still uses continuations in one place: The remote view update is still composed of two steps that need to happen one after another, but we don't necessarily need to wait for them to happen. This is easiest to do with chaining continuations, and then either waiting or not waiting for the resulting future. Signed-off-by: Nadav Har'El <[email protected]>

nyh · 2023-12-17T18:19:45Z

Pushed a new version of this PR, addressing a few minor comments but not all of them (see my replies to your comments).

I already have a house of cards built on top of this PR (e.g., fixing secondary indexes on tablets) so would like to get this PR in to build on it, and will continue to improve using some more suggestions from this review.

Please note that in particular, the test I have here for Alternator LSI consistency is awful, and depending on a 10-node cluster's random tablet placement to reproduce - sometimes - the inconsistency. In followup patches (for CQL LSIs) I developed a much better test setup, using just 2 nodes and deliberately moving tablets (so the base's tablets are on one node, the view's on the other nodes) for 100% reliable test failure, so later I'll send a followup to this PR to replace its test by a better one. But first I'd like to get this PR in before I get a stack overflow in my brain :-)

scylladb-promoter · 2023-12-17T20:10:29Z

🟢 CI State: SUCCESS

✅ - Build
✅ - dtest
✅ - Unit Tests

Build Details:

Duration: 1 hr 52 min
Builder: spider5.cloudius-systems.com

nyh · 2023-12-18T13:06:54Z

so in theory there is no test checking if enabling tablets is doing anything in alternator?

Good question. In the later patches, I use the tablet-enabling tag to enable tablets and then (before fixing the bugs) obvious bugs appeared, so I know that this tag had an effect. However, you're right that we should have a regression test that the tablet-enabling tag actually does something and continues to do something (and isn't, say, just outright ignored).

I'll add such a test in a followup PR.

In a followup patch f4b21d9 I rewrite the Alternator LSI test, and as part of it I verify that the test really sees that the Alternator table really uses tablets (because the test would trivially pass before tablets).

nyh · 2023-12-18T13:09:03Z

Hi @scylladb/scylla-maint @tgrabiec there is a lot of discussion here but two positive reviews and no negative reviews so please consider merging this PR - I already have several newer PRs for tablets+secondary-index support waiting to be rebased on this one.

The pull requests adds support for tablets in Alternator, and particularly focuses in getting Alternator's GSI and LSI (i.e., materialized views) to work. After this series support for tablets in Alternator _mostly_ work, but not completely: 1. CDC doesn't yet work with tablets, and Alternator needs to provide CDC (known as "DynamoDB Streams"). 2. Alternator's TTL feature was not tested with tablets, and probably doesn't work because it assumes the replication map belongs to a keyspace. Because of these reasons, Alternator does not yet use tablets by default and it needs to be enabled explicitly be adding an experimental tag to the new table. This will allow us to test Alternator with tablets even before it is ready for the limelight. Fixes #16203 Fixes #16313 Closes #16353 * github.com:scylladb/scylladb: mv, tablets, alternator: test for Alternator LSI with tablets mv: coroutinize wait code for remote view updates mv, test: add injection point to delay remove view update alternator: explicitly request synchronous updates for LSI alternator: fix view creation when using tablets alternator: add experimental method to create a table with tablets

kbr-scylla · 2023-12-19T10:52:07Z

Dequeued, causes next promotion failures
https://jenkins.scylladb.com/job/scylla-master/job/next/6963/artifact/testlog/aarch64/debug/topology_experimental_raft.test_mv_tablets.1.log

=================================== FAILURES ===================================
____________________ test_tablet_alternator_lsi_consistency ____________________

self = <test.pylib.manager_client.ManagerClient object at 0xffff05231910>
servers_num = 10, cmdline = None
config = {'alternator_port': 8000, 'alternator_write_isolation': 'only_rmw_uses_lwt'}
property_file = None, start = True, expected_error = None

    async def servers_add(self, servers_num: int = 1,
                          cmdline: Optional[List[str]] = None,
                          config: Optional[dict[str, Any]] = None,
                          property_file: Optional[dict[str, Any]] = None,
                          start: bool = True,
                          expected_error: Optional[str] = None) -> [ServerInfo]:
        """Add new servers concurrently"""
        assert servers_num > 0, f"servers_add: cannot add {servers_num} servers, servers_num must be positive"
    
        try:
            data = self._create_server_add_data(None, cmdline, config, property_file, start, expected_error)
            data['servers_num'] = servers_num
>           server_infos = await self.client.put_json("/cluster/addservers", data, response_type="json",
                                                      timeout=ScyllaServer.TOPOLOGY_TIMEOUT * servers_num)

test/pylib/manager_client.py:266: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <test.pylib.rest_client.UnixRESTClient object at 0xffff057c1510>
resource_uri = '/cluster/addservers'
data = {'config': {'alternator_port': 8000, 'alternator_write_isolation': 'only_rmw_uses_lwt'}, 'servers_num': 10, 'start': True}
host = None, port = None, params = None, response_type = 'json', timeout = 10000

    async def put_json(self, resource_uri: str, data: Optional[Mapping] = None, host: Optional[str] = None,
                       port: Optional[int] = None, params: Optional[dict[str, str]] = None,
                       response_type: Optional[str] = None, timeout: Optional[float] = None) -> Any:
>       ret = await self._fetch("PUT", resource_uri, response_type = response_type, host = host,
                                port = port, params = params, json = data, timeout = timeout)

test/pylib/rest_client.py:104: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <test.pylib.rest_client.UnixRESTClient object at 0xffff057c1510>
method = 'PUT', resource = '/cluster/addservers', response_type = 'json'
host = None, port = None, params = None
json = {'config': {'alternator_port': 8000, 'alternator_write_isolation': 'only_rmw_uses_lwt'}, 'servers_num': 10, 'start': True}
timeout = 10000

    async def _fetch(self, method: str, resource: str, response_type: Optional[str] = None,
                     host: Optional[str] = None, port: Optional[int] = None,
                     params: Optional[Mapping[str, str]] = None,
                     json: Optional[Mapping] = None, timeout: Optional[float] = None) -> Any:
        # Can raise exception. See https://docs.aiohttp.org/en/latest/web_exceptions.html
        assert method in ["GET", "POST", "PUT", "DELETE"], f"Invalid HTTP request method {method}"
        assert response_type is None or response_type in ["text", "json"], \
                f"Invalid response type requested {response_type} (expected 'text' or 'json')"
        # Build the URI
        port = port if port else self.default_port if hasattr(self, "default_port") else None
        port_str = f":{port}" if port else ""
        assert host is not None or hasattr(self, "default_host"), "_fetch: missing host for " \
                "{method} {resource}"
        host_str = host if host is not None else self.default_host
        uri = self.uri_scheme + "://" + host_str + port_str + resource
        logging.debug(f"RESTClient fetching {method} {uri}")
    
        client_timeout = ClientTimeout(total = timeout if timeout is not None else 300)
        async with request(method, uri,
                           connector = self.connector if hasattr(self, "connector") else None,
                           params = params, json = json, timeout = client_timeout) as resp:
            if resp.status != 200:
                text = await resp.text()
>               raise HTTPError(uri, resp.status, params, json, text)
E               test.pylib.rest_client.HTTPError: HTTP error 500, uri: http+unix://api/cluster/addservers, params: None, json: {'start': True, 'config': {'alternator_port': 8000, 'alternator_write_isolation': 'only_rmw_uses_lwt'}, 'servers_num': 10}, body:
E               failed to start the node, timeout reached, server_id 498, IP 127.58.46.2, workdir scylla-498, host_id 26143419-27e5-4467-aa5a-041969135711, cql [not connected]
E               Check the log files:
E               /scylladir/testlog/aarch64/test.py.debug-release-dev.log
E               /scylladir/testlog/aarch64/debug/scylla-498.log

test/pylib/rest_client.py:69: HTTPError

The above exception was the direct cause of the following exception:

manager = <test.pylib.manager_client.ManagerClient object at 0xffff05231910>

    @pytest.mark.asyncio
    @skip_mode('release', 'error injections are not supported in release mode')
    async def test_tablet_alternator_lsi_consistency(manager: ManagerClient):
        """A reproducer for a bug where Alternator LSI was not using synchronous
           view updates when tablets are enabled, which could cause strongly-
           consistent read of the LSI to miss the data just written to the base.
           We use a fairly large cluster to increase the chance that the tablet
           randomization results in non-local pairing of base and view tablets.
           We could make this test fail more reliably and only need 4 nodes if
           we had an API to control the tablet placement.
           We also use the "delay_before_remote_view_update" injection point
           to add a delay to the view update - without it it's almost impossible
           to reproduce this issue on a fast machine.
           Reproduces #16313.
        """
>       servers = await manager.servers_add(10, config=alternator_config)

test/topology_experimental_raft/test_mv_tablets.py:134: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <test.pylib.manager_client.ManagerClient object at 0xffff05231910>
servers_num = 10, cmdline = None
config = {'alternator_port': 8000, 'alternator_write_isolation': 'only_rmw_uses_lwt'}
property_file = None, start = True, expected_error = None

    async def servers_add(self, servers_num: int = 1,
                          cmdline: Optional[List[str]] = None,
                          config: Optional[dict[str, Any]] = None,
                          property_file: Optional[dict[str, Any]] = None,
                          start: bool = True,
                          expected_error: Optional[str] = None) -> [ServerInfo]:
        """Add new servers concurrently"""
        assert servers_num > 0, f"servers_add: cannot add {servers_num} servers, servers_num must be positive"
    
        try:
            data = self._create_server_add_data(None, cmdline, config, property_file, start, expected_error)
            data['servers_num'] = servers_num
            server_infos = await self.client.put_json("/cluster/addservers", data, response_type="json",
                                                      timeout=ScyllaServer.TOPOLOGY_TIMEOUT * servers_num)
        except Exception as exc:
>           raise Exception("Failed to add servers") from exc
E           Exception: Failed to add servers

It looks like the test simply timed out waiting to bootstrap 10 nodes.
https://jenkins.scylladb.com/job/scylla-master/job/next/6963/artifact/testlog/aarch64/debug/scylla-498.log

WARN  2023-12-19 08:40:45,997 seastar - Seastar compiled with default allocator, --memory option won't take effect
...
INFO  2023-12-19 08:57:19,830 [shard 0: gms] storage_service - raft topology: streaming completed

This node was booting for 17 minutes. This included waiting for other nodes to boot, which took 14 minutes.

INFO  2023-12-19 08:41:48,118 [shard 0:strm] storage_service - raft topology: join: request to join placed, waiting for the response from the topology coordinator
INFO  2023-12-19 08:41:48,184 [shard 0:strm] raft_group0 - Server 26143419-27e5-4467-aa5a-041969135711 is starting group 0 with id 6042e7b0-9e4a-11ee-b539-40f2e8d7f197
INFO  2023-12-19 08:55:46,826 [shard 0:stmt] schema_tables - Creating keyspace system_auth

I'm not sure if we can afford having such long tests in test.py.

Another problem is that this test file already has multiple long test cases. They should be in separate test files or they cannot be run concurrently.

Regarding the failing test itself, the suggestion from comment makes sense to me

           We could make this test fail more reliably and only need 4 nodes if
           we had an API to control the tablet placement.

maybe we could also use an error injection instead of API?

tgrabiec · 2023-12-19T10:59:01Z

We could make this test fail more reliably and only need 4 nodes if we had an API to control the tablet placement.

We have an API to control tablet placement, you can move tablets with manager.api.move_tablet()

nyh · 2023-12-19T13:24:10Z

@kbr-scylla pushed a new version with a last patch that adds the tablets+MV tests (the entire file...) to the skip-on-debug list, so we don't have the timeout on the arm+debug build.

I hate this solution, but I can't fix everything in this PR - better solutions will have to come later. As I already noted, there is already a PR in the pipeline to switch this test to using just two nodes. But a 10 nodes test shouldn't have broken the world.

kbr-scylla · 2023-12-19T13:27:56Z

@nyh why not use the skip_mode marker which works per test case?

BTW. the file should be probably split into multiple files anyway so the test cases can run concurrently.

nyh · 2023-12-19T13:35:21Z

BTW. the file should be probably split into multiple files anyway so the test cases can run concurrently.

No, I think that's a mistake. There's elegance in putting related things in the same file (this isn't Java), and this is NOT a big test file, and not even the biggest test file (test_tablets.py is bigger).

nyh · 2023-12-19T13:36:22Z

@nyh why not use the skip_mode marker which works per test case?

Because I didn't realize it works (I used it for skipping release - copying Tomek's use of it, but didn't realize I could use it for debug as well).

Ok, I'll push a new version like that.

This patch adds a test (in the topology test framework) for issue scylladb#16313 - the bug where Alternator LSI must use synchronous view updates but didn't. This test fails with high probability (around 50%) before the previous patch, which fixed this bug - and passes consistently after the patch (I ran it 100 times and it didn't fail even once). This is the first test in the topology framework that uses the DynamoDB API and not CQL. This required a couple of tiny convenience functions, which are introduced in the only test file that uses them - but if we want we can later move them out to a library file. Unfortunately, the standard AWS SDK for Python - boto3 - is *not* asynchronous, so this test is also not really asynchronous, and will block the event loop while making requests to Alternator. However, for now it doesn't matter (we do NOT run multiple tests in the same event loop), and if it ever matters, I mentioned a couple of options what we can do in a comment. Because this test uses a 10-node cluster, it is skipped in debug-mode runs. In a later patch we will replace it by a more efficent - and more reliable - 2-node test. Refs scylladb#16313 Signed-off-by: Nadav Har'El <[email protected]>

nyh · 2023-12-19T13:43:11Z

Pushed a new version that uses @skip_mode instead of suite.yaml to skip the 10-node test.
As I said time and again, this 10-node test will soon disappear anyway (PR #16440 gets rid of it) so no need to continue discussing how awful a 10-node test is.

scylladb-promoter · 2023-12-19T15:49:12Z

🔴 CI State: FAILURE

✅ - Build
❌ - dtest
✅ - Unit Tests

Failed Tests (1/23482):

test_disablebinary_and_disablegossip 🔍

Build Details:

Duration: 2 hr 6 min
Builder: spider6.cloudius-systems.com

nyh · 2023-12-19T19:16:17Z

Failed Tests (1/23482):

test_disablebinary_and_disablegossip 🔍

A known flaky test: #16219
I'll restart the CI

scylladb-promoter · 2023-12-19T21:48:34Z

🟢 CI State: SUCCESS

✅ - Build
✅ - dtest
✅ - Unit Tests

Build Details:

Duration: 2 hr 30 min
Builder: spider3.cloudius-systems.com

nyh · 2023-12-19T22:08:04Z

Hallelujah.

@kbr-scylla please consider re-merging this PR. It now has a @skip_mode to skip the 10-node test on debug mode. And as I said, this 10-node test will soon disappear anyway (PR #16440 gets rid of it, and I can push these next PRs as soon as this one finally gets in)

nyh · 2023-12-20T08:47:17Z

@scylladb/scylla-maint merge ping.
To summarize yesterday events: This PR was already merged, but dequeued by @kbr-scylla because the 10-node test sometimes took more than the test.py timeout in the debug build. So in the new version of this PR, this test is marked for not running on debug build.

kbr-scylla · 2023-12-20T09:00:43Z

Qd

nyh requested review from eliransin, kostja, nuivall, tgrabiec and wmitros December 10, 2023 13:21

nyh mentioned this pull request Dec 10, 2023

Full support for tablets in Alternator - and make it the default #16355

Closed

avikivity reviewed Dec 10, 2023

View reviewed changes

This was referenced Dec 11, 2023

CQL LSI promises about synchronous view updates should match implementation #16371

Closed

Turn on tablets on keyspace by default when the feature is enabled #16364

Merged

wmitros reviewed Dec 12, 2023

View reviewed changes

eliransin approved these changes Dec 13, 2023

View reviewed changes

nuivall reviewed Dec 13, 2023

View reviewed changes

nuivall approved these changes Dec 13, 2023

View reviewed changes

nyh added 2 commits December 17, 2023 19:55

nyh added 3 commits December 17, 2023 20:14

nyh force-pushed the fix-16313 branch from d1ee9fe to 031420a Compare December 17, 2023 18:15

tgrabiec approved these changes Dec 18, 2023

View reviewed changes

nyh force-pushed the fix-16313 branch from 47d0e66 to 2e031f2 Compare December 19, 2023 13:41

scylladb-promoter merged commit ffb6ae9 into scylladb:master Dec 20, 2023

nyh mentioned this pull request Dec 31, 2023

Alternator with tablets got broken by a recent patch #16587

Closed

Conversation

nyh commented Dec 10, 2023

Uh oh!

scylladb-promoter commented Dec 10, 2023

🟢 CI State: SUCCESS

Build Details:

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nyh commented Dec 11, 2023

Uh oh!

wmitros left a comment

Choose a reason for hiding this comment

Uh oh!

eliransin left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tgrabiec commented Dec 13, 2023

Uh oh!

nyh commented Dec 13, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nyh Dec 17, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nuivall Dec 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nyh Dec 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

nyh Dec 17, 2023 •

edited

Loading

nuivall Dec 18, 2023 •

edited

Loading

nyh Dec 18, 2023 •

edited

Loading

nuivall commented Dec 13, 2023 •

edited

Loading

nuivall left a comment •

edited

Loading