Fix LWT prepared statement routing failure in PRESERVE_REPLICA_ORDER mode#831
Conversation
|
@copilot - please review. |
2c340aa to
2188c15
Compare
2188c15 to
1f98936
Compare
There was a problem hiding this comment.
Pull request overview
Fixes LWT routing in PRESERVE_REPLICA_ORDER mode to avoid empty query plans (and resulting “No node was available” errors), especially in prepared-statement scenarios where replica info may be missing.
Changes:
- Moves LWT routing-method selection into
BasicLoadBalancingPolicyand routesnewQueryPlan()accordingly. - Implements a new preserve-replica-order query plan that always includes all live nodes, with DC prioritization and deterministic/non-deterministic rotation for non-replicas.
- Refactors and expands unit tests (including controllable randomness via
randomNextInt(bound)).
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| core/src/main/java/com/datastax/oss/driver/internal/core/loadbalancing/BasicLoadBalancingPolicy.java | Adds LWT routing method parsing/dispatch and new preserve-replica-order query-plan construction with rotation. |
| core/src/main/java/com/datastax/oss/driver/internal/core/loadbalancing/DefaultLoadBalancingPolicy.java | Removes duplicated LWT routing logic and replaces diceRoll1d4() with randomNextInt(4). |
| core/src/test/java/com/datastax/oss/driver/internal/core/loadbalancing/LwtRoutingTestBase.java | New shared test suite validating preserve-replica-order behavior, fallback behavior, and rotation. |
| core/src/test/java/com/datastax/oss/driver/internal/core/loadbalancing/DefaultLoadBalancingPolicyLwtRoutingTest.java | Simplifies policy-specific test to reuse the new shared LWT routing test base. |
| core/src/test/java/com/datastax/oss/driver/internal/core/loadbalancing/BasicLoadBalancingPolicyLwtRoutingTest.java | Adds Basic policy-specific coverage (notably localDc == null). |
| core/src/test/java/com/datastax/oss/driver/internal/core/loadbalancing/DefaultLoadBalancingPolicyRequestRoutingTest.java | Updates routing-method API usage to getRequestRoutingMethod(). |
| core/src/test/java/com/datastax/oss/driver/internal/core/loadbalancing/DefaultLoadBalancingPolicyQueryPlanTest.java | Updates tests to expect randomNextInt(4) instead of diceRoll1d4(). |
| core/src/test/java/com/datastax/oss/driver/internal/core/loadbalancing/DcInferringLoadBalancingPolicyQueryPlanTest.java | Updates test override hook from diceRoll1d4() to randomNextInt(bound). |
You can also share your feedback on Copilot code review. Take the survey.
| private void addRotatedNonReplicas( | ||
| List<Node> queryPlan, List<Node> nodes, List<Node> replicas, Request request) { | ||
| List<Node> nonReplicas = | ||
| nodes.stream().filter(n -> !replicas.contains(n)).collect(Collectors.toList()); | ||
| if (!nonReplicas.isEmpty()) { | ||
| rotateNonReplicas(nonReplicas, request); | ||
| queryPlan.addAll(nonReplicas); | ||
| } | ||
| } |
There was a problem hiding this comment.
replicas.contains(n) inside a stream makes non-replica selection O(N*M) and can become expensive with larger node/replica sets. Convert replicas to a Set<Node> once (e.g., in newQueryPlanPreserveReplicas) and use that for membership checks when building nonReplicas.
There was a problem hiding this comment.
The replicas list is bounded by the replication factor, which is typically 3–5. With such small lists, List.contains() is effectively O(1) in practice and likely faster than the overhead of creating a HashSet (hashing, boxing, allocation). Not worth optimizing.
| private Map<String, List<Node>> getAllNodesByDc() { | ||
| Map<String, List<Node>> nodesByDc = new LinkedHashMap<>(); | ||
| Set<String> allDcs = getLiveNodes().dcs(); | ||
| // Add preferred remote DCs first (in configured order) | ||
| for (String dc : preferredRemoteDcs) { | ||
| if (allDcs.contains(dc)) { | ||
| nodesByDc.put(dc, dcNodeList(dc)); | ||
| } | ||
| } | ||
| // Add remaining DCs | ||
| for (String dc : allDcs) { | ||
| if (!nodesByDc.containsKey(dc)) { | ||
| nodesByDc.put(dc, dcNodeList(dc)); | ||
| } | ||
| } | ||
| return nodesByDc; | ||
| } |
There was a problem hiding this comment.
The ordering of allDcs iteration is undefined for a generic Set, so the “remaining DCs” portion of the query-plan order can become non-deterministic across JVM runs (and harder to reason about/debug). Consider using a deterministic order for non-preferred DCs (e.g., sort the remaining DC names, or use an order guaranteed by the underlying dcs() implementation if such a guarantee exists and is documented).
| // Add preferred remote DCs first (in configured order) | ||
| for (String dc : preferredRemoteDcs) { | ||
| if (allDcs.contains(dc)) { | ||
| nodesByDc.put(dc, dcNodeList(dc)); | ||
| } | ||
| } |
There was a problem hiding this comment.
The new preserve-replica-order logic adds explicit ordering for preferredRemoteDcs, but the updated LWT routing tests don’t appear to assert that preferred remote DCs are actually prioritized ahead of other remote DCs in the resulting plan. Add a unit test that configures multiple remote DCs plus LOAD_BALANCING_DC_FAILOVER_PREFERRED_REMOTE_DCS, then asserts the remote non-replica portion of the plan respects that configured order.
…mode Addresses issue #830 where LWT prepared statements fail with "No node was available" error when using PRESERVE_REPLICA_ORDER routing (default in 4.19.0.5). Changes: - Enhanced newQueryPlanPreserveReplicas to include non-replica nodes with proper DC prioritization - Added fallback logic to prevent empty query plans for prepared statements - Implemented routing-key-based consistent rotation for non-replica nodes - Replaced diceRoll1d4() with universal randomNextInt(bound) method - Updated tests to validate new behavior including randomness testing The fix maintains replica order preservation while ensuring query plans always contain nodes, resolving the regression introduced in 4.19.0.5.
1f98936 to
8122539
Compare
Summary
Fixes issue #830 where LWT prepared statements fail with "No node was available" error when using PRESERVE_REPLICA_ORDER routing method (default in 4.19.0.5).
The root cause was that the new LWT routing logic returned empty query plans when replica information wasn't available (common for prepared statements before parameter binding).
Changes
Core Fix
Enhanced
newQueryPlanPreserveReplicasto implement proper node prioritization:Added proper handling for both
localDc == nullandlocalDc != nullcasesImplemented routing-key-based consistent rotation for deterministic behavior
Replaced
diceRoll1d4()with universalrandomNextInt(bound)for better testabilityTest Updates
randomNextInt()diceRoll1d4()torandomNextInt(4)Behavior Changes
Before: LWT queries with
PRESERVE_REPLICA_ORDERcould fail with empty query plansAfter: LWT queries always include all available nodes with proper DC prioritization
Backward Compatibility
Test Plan
Resolves #830