Skip to content

Fix LWT prepared statement routing failure in PRESERVE_REPLICA_ORDER mode#831

Merged
dkropachev merged 1 commit intoscylla-4.xfrom
fix-lwt-prepared-statement-routing-830
Mar 6, 2026
Merged

Fix LWT prepared statement routing failure in PRESERVE_REPLICA_ORDER mode#831
dkropachev merged 1 commit intoscylla-4.xfrom
fix-lwt-prepared-statement-routing-830

Conversation

@dkropachev
Copy link
Copy Markdown

Summary

Fixes issue #830 where LWT prepared statements fail with "No node was available" error when using PRESERVE_REPLICA_ORDER routing method (default in 4.19.0.5).

The root cause was that the new LWT routing logic returned empty query plans when replica information wasn't available (common for prepared statements before parameter binding).

Changes

Core Fix

  • Enhanced newQueryPlanPreserveReplicas to implement proper node prioritization:

    1. Local DC replicas (order preserved)
    2. Remote DC replicas (order preserved)
    3. Local DC non-replicas (rotated by routing key or randomly)
    4. Remote DC non-replicas (rotated by routing key or randomly)
  • Added proper handling for both localDc == null and localDc != null cases

  • Implemented routing-key-based consistent rotation for deterministic behavior

  • Replaced diceRoll1d4() with universal randomNextInt(bound) for better testability

Test Updates

  • Updated existing LWT routing tests to match new behavior expectations
  • Added controlled randomness testing using Mockito spy to mock randomNextInt()
  • Added tests for both consistent (with routing key) and random (without routing key) rotation
  • Updated all test references from diceRoll1d4() to randomNextInt(4)

Behavior Changes

Before: LWT queries with PRESERVE_REPLICA_ORDER could fail with empty query plans
After: LWT queries always include all available nodes with proper DC prioritization

Backward Compatibility

  • Non-breaking: Only adds nodes that weren't previously included
  • When replica information is available, behavior is enhanced but compatible
  • Maintains all existing performance optimizations
  • Configuration remains the same

Test Plan

Resolves #830

@mykaul
Copy link
Copy Markdown

mykaul commented Mar 6, 2026

@copilot - please review.

Copy link
Copy Markdown

Copilot AI commented Mar 6, 2026

@mykaul I've opened a new pull request, #832, to work on those changes. Once the pull request is ready, I'll request review from you.

@dkropachev dkropachev marked this pull request as ready for review March 6, 2026 10:23
@dkropachev dkropachev force-pushed the fix-lwt-prepared-statement-routing-830 branch from 2c340aa to 2188c15 Compare March 6, 2026 10:24
@dkropachev dkropachev force-pushed the fix-lwt-prepared-statement-routing-830 branch from 2188c15 to 1f98936 Compare March 6, 2026 10:42
@dkropachev dkropachev requested a review from Copilot March 6, 2026 10:58
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes LWT routing in PRESERVE_REPLICA_ORDER mode to avoid empty query plans (and resulting “No node was available” errors), especially in prepared-statement scenarios where replica info may be missing.

Changes:

  • Moves LWT routing-method selection into BasicLoadBalancingPolicy and routes newQueryPlan() accordingly.
  • Implements a new preserve-replica-order query plan that always includes all live nodes, with DC prioritization and deterministic/non-deterministic rotation for non-replicas.
  • Refactors and expands unit tests (including controllable randomness via randomNextInt(bound)).

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
core/src/main/java/com/datastax/oss/driver/internal/core/loadbalancing/BasicLoadBalancingPolicy.java Adds LWT routing method parsing/dispatch and new preserve-replica-order query-plan construction with rotation.
core/src/main/java/com/datastax/oss/driver/internal/core/loadbalancing/DefaultLoadBalancingPolicy.java Removes duplicated LWT routing logic and replaces diceRoll1d4() with randomNextInt(4).
core/src/test/java/com/datastax/oss/driver/internal/core/loadbalancing/LwtRoutingTestBase.java New shared test suite validating preserve-replica-order behavior, fallback behavior, and rotation.
core/src/test/java/com/datastax/oss/driver/internal/core/loadbalancing/DefaultLoadBalancingPolicyLwtRoutingTest.java Simplifies policy-specific test to reuse the new shared LWT routing test base.
core/src/test/java/com/datastax/oss/driver/internal/core/loadbalancing/BasicLoadBalancingPolicyLwtRoutingTest.java Adds Basic policy-specific coverage (notably localDc == null).
core/src/test/java/com/datastax/oss/driver/internal/core/loadbalancing/DefaultLoadBalancingPolicyRequestRoutingTest.java Updates routing-method API usage to getRequestRoutingMethod().
core/src/test/java/com/datastax/oss/driver/internal/core/loadbalancing/DefaultLoadBalancingPolicyQueryPlanTest.java Updates tests to expect randomNextInt(4) instead of diceRoll1d4().
core/src/test/java/com/datastax/oss/driver/internal/core/loadbalancing/DcInferringLoadBalancingPolicyQueryPlanTest.java Updates test override hook from diceRoll1d4() to randomNextInt(bound).

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +435 to +443
private void addRotatedNonReplicas(
List<Node> queryPlan, List<Node> nodes, List<Node> replicas, Request request) {
List<Node> nonReplicas =
nodes.stream().filter(n -> !replicas.contains(n)).collect(Collectors.toList());
if (!nonReplicas.isEmpty()) {
rotateNonReplicas(nonReplicas, request);
queryPlan.addAll(nonReplicas);
}
}
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replicas.contains(n) inside a stream makes non-replica selection O(N*M) and can become expensive with larger node/replica sets. Convert replicas to a Set<Node> once (e.g., in newQueryPlanPreserveReplicas) and use that for membership checks when building nonReplicas.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The replicas list is bounded by the replication factor, which is typically 3–5. With such small lists, List.contains() is effectively O(1) in practice and likely faster than the overhead of creating a HashSet (hashing, boxing, allocation). Not worth optimizing.

Comment on lines +378 to +394
private Map<String, List<Node>> getAllNodesByDc() {
Map<String, List<Node>> nodesByDc = new LinkedHashMap<>();
Set<String> allDcs = getLiveNodes().dcs();
// Add preferred remote DCs first (in configured order)
for (String dc : preferredRemoteDcs) {
if (allDcs.contains(dc)) {
nodesByDc.put(dc, dcNodeList(dc));
}
}
// Add remaining DCs
for (String dc : allDcs) {
if (!nodesByDc.containsKey(dc)) {
nodesByDc.put(dc, dcNodeList(dc));
}
}
return nodesByDc;
}
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ordering of allDcs iteration is undefined for a generic Set, so the “remaining DCs” portion of the query-plan order can become non-deterministic across JVM runs (and harder to reason about/debug). Consider using a deterministic order for non-preferred DCs (e.g., sort the remaining DC names, or use an order guaranteed by the underlying dcs() implementation if such a guarantee exists and is documented).

Copilot uses AI. Check for mistakes.
Comment on lines +381 to +386
// Add preferred remote DCs first (in configured order)
for (String dc : preferredRemoteDcs) {
if (allDcs.contains(dc)) {
nodesByDc.put(dc, dcNodeList(dc));
}
}
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new preserve-replica-order logic adds explicit ordering for preferredRemoteDcs, but the updated LWT routing tests don’t appear to assert that preferred remote DCs are actually prioritized ahead of other remote DCs in the resulting plan. Add a unit test that configures multiple remote DCs plus LOAD_BALANCING_DC_FAILOVER_PREFERRED_REMOTE_DCS, then asserts the remote non-replica portion of the plan respects that configured order.

Copilot uses AI. Check for mistakes.
…mode

Addresses issue #830 where LWT prepared statements fail with "No node was available"
error when using PRESERVE_REPLICA_ORDER routing (default in 4.19.0.5).

Changes:
- Enhanced newQueryPlanPreserveReplicas to include non-replica nodes with proper DC prioritization
- Added fallback logic to prevent empty query plans for prepared statements
- Implemented routing-key-based consistent rotation for non-replica nodes
- Replaced diceRoll1d4() with universal randomNextInt(bound) method
- Updated tests to validate new behavior including randomness testing

The fix maintains replica order preservation while ensuring query plans always
contain nodes, resolving the regression introduced in 4.19.0.5.
@dkropachev dkropachev force-pushed the fix-lwt-prepared-statement-routing-830 branch from 1f98936 to 8122539 Compare March 6, 2026 11:47
@dkropachev dkropachev merged commit 50b0695 into scylla-4.x Mar 6, 2026
24 checks passed
@dkropachev dkropachev deleted the fix-lwt-prepared-statement-routing-830 branch March 6, 2026 12:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4.x: Shardawareness for LWT transactions fails with prepared statements. Works with previous patch 4.19.0.4

5 participants