Description
In MetadataManager.SingleThreaded.startSchemaRequest(), when schemaQueriesFactory.newInstance() throws a synchronous exception (e.g., control channel unavailable or control node not found in metadata), the currentSchemaRefresh field is never reset to null, and the queued schema refresh is never drained.
This causes all future schema refreshes to be permanently blocked for the lifetime of the session.
Root cause
The cleanup code that resets currentSchemaRefresh, completes firstSchemaRefreshFuture, and drains queuedSchemaRefresh only runs inside the .whenComplete() callback of the thenApplyAsync chain. When newInstance() throws synchronously before that chain is even constructed, the catch (Throwable t) block only completes the refreshFuture exceptionally but skips all cleanup.
The same issue exists when agreementError != null — currentSchemaRefresh is never cleared there either.
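The failure mode can be reproduced in isolation with a simplified model. The sketch below is not the driver's code: BuggyRefresher, its fields, and the Supplier-based factory are hypothetical stand-ins for MetadataManager.SingleThreaded, currentSchemaRefresh / queuedSchemaRefresh, and schemaQueriesFactory.newInstance(). It only illustrates how cleanup that lives exclusively in whenComplete() is skipped by a synchronous throw.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Simplified, hypothetical model of the refresh state; NOT the driver's actual code.
public class BuggyRefresher {
  CompletableFuture<String> current; // stands in for currentSchemaRefresh
  CompletableFuture<String> queued;  // stands in for queuedSchemaRefresh

  CompletableFuture<String> refresh(Supplier<CompletableFuture<String>> queriesFactory) {
    if (current != null) {
      // A refresh is already in flight: coalesce into a single queued refresh.
      if (queued == null) {
        queued = new CompletableFuture<>();
      }
      return queued;
    }
    CompletableFuture<String> refreshFuture = new CompletableFuture<>();
    current = refreshFuture;
    try {
      queriesFactory
          .get() // may throw synchronously, like newInstance()
          .whenComplete(
              (rows, error) -> {
                // The only place where the in-flight marker is cleared.
                current = null;
                if (error != null) {
                  refreshFuture.completeExceptionally(error);
                } else {
                  refreshFuture.complete(rows);
                }
              });
    } catch (Throwable t) {
      // The caller is notified, but `current` is left set.
      refreshFuture.completeExceptionally(t);
    }
    return refreshFuture;
  }

  public static void main(String[] args) {
    BuggyRefresher r = new BuggyRefresher();
    CompletableFuture<String> first =
        r.refresh(() -> { throw new IllegalStateException("control channel closed"); });
    CompletableFuture<String> second =
        r.refresh(() -> CompletableFuture.completedFuture("schema"));

    System.out.println(first.isCompletedExceptionally()); // true
    System.out.println(r.current != null);                // true: still marked in flight
    System.out.println(second.isDone());                  // false: queued, never started
  }
}
```

In this model the second request is parked in the queue even though its factory call would have succeeded, which mirrors the permanent block described above.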
Impact
- currentSchemaRefresh remains non-null forever, so all subsequent schema refresh requests are queued and never executed
- firstSchemaRefreshFuture may never complete
- The queued refresh (queuedSchemaRefresh) is stuck forever
- This is a permanent deadlock of the schema refresh mechanism for the session
How to trigger
DefaultSchemaQueriesFactory.newInstance() can throw IllegalStateException when:
- The control channel is null or already closed
- The control node's endpoint is not found in metadata
These are transient conditions (e.g., a node going down during schema refresh) that should be recoverable, but due to this bug they permanently break schema refresh.
Fix
Add cleanup logic (reset currentSchemaRefresh, complete firstSchemaRefreshFuture, drain queuedSchemaRefresh) to both:
- The catch (Throwable t) block handling newInstance() failures
- The agreementError != null branch
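One possible shape for the fix is to move the cleanup into a helper and call it from every exit path. This is a minimal sketch under assumptions, not the actual patch: FixedRefresher, cleanUp(), and the Supplier-based factory are hypothetical names, and the agreementError != null branch is not modelled here (it would call the same helper).

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Simplified, hypothetical model of the fixed state handling; NOT the driver's actual code.
public class FixedRefresher {
  private final Supplier<CompletableFuture<String>> queriesFactory; // stands in for newInstance()
  CompletableFuture<String> current; // stands in for currentSchemaRefresh
  CompletableFuture<String> queued;  // stands in for queuedSchemaRefresh
  final CompletableFuture<Void> firstRefreshDone = new CompletableFuture<>();

  FixedRefresher(Supplier<CompletableFuture<String>> queriesFactory) {
    this.queriesFactory = queriesFactory;
  }

  CompletableFuture<String> refresh() {
    if (current != null) {
      if (queued == null) {
        queued = new CompletableFuture<>();
      }
      return queued;
    }
    CompletableFuture<String> refreshFuture = new CompletableFuture<>();
    start(refreshFuture);
    return refreshFuture;
  }

  private void start(CompletableFuture<String> refreshFuture) {
    current = refreshFuture;
    try {
      queriesFactory
          .get()
          .whenComplete(
              (rows, error) -> {
                if (error != null) {
                  refreshFuture.completeExceptionally(error);
                } else {
                  refreshFuture.complete(rows);
                }
                cleanUp();
              });
    } catch (Throwable t) {
      refreshFuture.completeExceptionally(t);
      cleanUp(); // the added step: same cleanup on the synchronous failure path
    }
  }

  // Shared by all exit paths: unblock waiters, clear the marker, drain the queue.
  private void cleanUp() {
    firstRefreshDone.complete(null); // no-op if already completed
    current = null;
    if (queued != null) {
      CompletableFuture<String> next = queued;
      queued = null; // cleared before restarting, so repeated failures cannot loop
      start(next);
    }
  }

  public static void main(String[] args) {
    int[] calls = {0};
    FixedRefresher r =
        new FixedRefresher(
            () -> {
              if (calls[0]++ == 0) {
                throw new IllegalStateException("control channel closed");
              }
              return CompletableFuture.completedFuture("schema");
            });

    CompletableFuture<String> first = r.refresh();
    CompletableFuture<String> second = r.refresh();
    System.out.println(first.isCompletedExceptionally()); // true
    System.out.println(r.firstRefreshDone.isDone());      // true
    System.out.println(second.join());                    // schema
  }
}
```

Here the transient failure only affects the request that hit it: the in-flight marker is cleared, so the next refresh runs normally.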