fix(auth): recover from orphaned navigator locks via steal fallback #2106
📝 Walkthrough
Rewrites the navigatorLock flow to introduce timeout-based lock acquisition with automatic recovery via lock stealing. When lock acquisition times out, the system attempts to force-acquire the lock.
Sequence Diagram(s)
sequenceDiagram
participant Client as Client Code
participant NL as navigator.locks
participant AC as AbortController
participant Recovery as Recovery Logic
participant Fn as Provided Function
Client->>NL: request(lock, {signal})
AC->>AC: timeout fires
AC-->>NL: abort signal
NL-->>Client: AbortError
Client->>Recovery: acquireTimeout > 0?
alt Timeout occurred and acquireTimeout > 0
Recovery->>NL: request(lock, {steal: true})
NL-->>Recovery: lock acquired
Recovery->>Fn: execute()
Fn-->>Recovery: result
Recovery->>NL: release lock
else Timeout on immediate acquire (acquireTimeout === 0)
Recovery-->>Client: throw NavigatorLockAcquireTimeoutError
else Non-AbortError exception
Recovery-->>Client: propagate error
end
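The recovery flow in the diagram above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `requestWithStealFallback` and the `LockManagerLike` interface are hypothetical names introduced here, and the sketch assumes a positive `acquireTimeout` in milliseconds (the `acquireTimeout === 0` branch that throws `NavigatorLockAcquireTimeoutError` is left out).

```typescript
// Minimal structural types for the parts of the Web Locks API used below.
interface LockRequestOptions {
  mode?: 'exclusive' | 'shared'
  steal?: boolean
  signal?: AbortSignal
}

interface LockManagerLike {
  request<T>(
    name: string,
    options: LockRequestOptions,
    callback: (lock: unknown) => Promise<T>
  ): Promise<T>
}

// Hypothetical helper for illustration; assumes acquireTimeout > 0 (milliseconds).
async function requestWithStealFallback<T>(
  locks: LockManagerLike,
  name: string,
  acquireTimeout: number,
  fn: () => Promise<T>
): Promise<T> {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), acquireTimeout)
  let acquired = false

  try {
    return await locks.request(
      name,
      { mode: 'exclusive', signal: controller.signal },
      async () => {
        acquired = true
        clearTimeout(timer) // the timeout only guards acquisition, not fn itself
        return await fn()
      }
    )
  } catch (e) {
    const isAbort = (e as { name?: string } | null)?.name === 'AbortError'
    // Errors thrown by fn, and any non-AbortError, propagate unchanged.
    if (acquired || !isAbort) throw e

    // Acquisition timed out: the holder is likely orphaned, so take the lock over.
    return await locks.request(name, { mode: 'exclusive', steal: true }, async () => fn())
  } finally {
    clearTimeout(timer)
  }
}
```

In a browser this might be called with `navigator.locks` as the first argument (a cast may be needed depending on the DOM typings in use). The `acquired` flag keeps an `AbortError` thrown by `fn` itself from triggering a steal and a second run of `fn`.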
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 5 passed
|
Once I'd diagnosed & planned a fix, used Claude Code to help validate & polish. Happy to take any feedback & tweak/resubmit. Passes formatting, tests, build. Thanks to @coppinger for troubleshooting support as well. |
|
+1 that this is causing major headaches |
|
Thank you for this PR @ElliotPadfield , it has sparked an interesting (and educative for me) conversation within the team. As I mentioned here, before considering using the |
Will throw something together & coordinate with @coppinger. Thanks for your attention to the issue. |
|
After lots of internal discussions, we have agreed to approve this PR, until we come up with a solution to the core of the problem, which may bring wider architectural changes. To approve this PR, can you please resolve the conflicts with After this PR is merged, we also agreed to decrease the timeout to 5s instead of 10s, but this is out of scope for this PR, since the change was introduced elsewhere. |
|
@mandarini Any way you could expedite the process by resolving said conflicts? Been tracking this issue for over a week now, would appreciate it greatly to have it resolved. |
When a Navigator Lock is held indefinitely (e.g., due to React Strict
Mode's double-mount/unmount leaving an orphaned lock callback), all
subsequent auth operations hang forever because:
1. The acquireTimeout fires and aborts the pending lock request
2. The AbortError propagates, but the orphaned held lock is unaffected
3. All future callers time out and fail with the same AbortError
This adds a recovery mechanism: when lock acquisition times out with
AbortError (indicating a likely orphaned lock), retry with
{ steal: true } to forcefully acquire the lock per the Web Locks API
spec. The previous holder's callback continues to completion but no
longer blocks other callers.
Closes supabase/supabase#42505
Co-Authored-By: Claude Opus 4.6 <[email protected]>
03f258d to f931f90
The SDK now handles orphaned lock recovery via steal internally (supabase-js#2106). Keep the BroadcastChannel observability wrapper for Sentry signals. The steal-based orphaned lock recovery in `debuggableNavigatorLock` (packages/common/gotrue.ts), introduced in #39868, is now redundant because supabase-js#2106 handles this natively in the SDK. Removes the `navigator.locks.request({ steal: true })` block while keeping the BroadcastChannel wrapper that sends lock-holder stack traces to Sentry. Related: supabase/supabase-js#2106, supabase/supabase-js#2125
Summary
Fixes #2111
When a Navigator Lock is held indefinitely (e.g., due to React Strict Mode's double-mount/unmount leaving an orphaned lock callback in `GoTrueClient._acquireLock`'s `pendingInLock` drain loop), all subsequent auth operations (`getUser()`, `signInWithPassword()`, etc.) hang forever.

Root cause: The existing `acquireTimeout` mechanism uses `AbortController.abort()` to cancel pending lock requests. Per the Web Locks API spec, aborting the signal only removes pending (waiting-to-acquire) requests from the queue; it has no effect on an already-held lock. So when a lock is orphaned: … `AbortError`, and give up

Fix: When lock acquisition times out with `AbortError`, instead of propagating the error, retry with `{ steal: true }`. Per the spec, this releases any currently held lock with the same name and grants the request immediately. The previous holder's callback continues running to completion but no longer blocks other callers.

Changes
- `packages/core/auth-js/src/lib/locks.ts`: Wrap `navigator.locks.request()` in try/catch; on `AbortError` with positive `acquireTimeout`, retry with `{ steal: true }` to recover from orphaned locks
- `packages/core/auth-js/test/lib/locks.test.ts`: Add 3 tests covering: orphaned lock recovery via steal, non-AbortError passthrough, no steal attempt with negative timeout

Test plan
- … `jest --testPathPattern='test/lib/locks'`)
- … `getUser()` in a `useEffect`, trigger the orphaned lock scenario (visible via `navigator.locks.query()`), confirm auth recovers instead of hanging

🤖 Generated with Claude Code
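The root cause described above (aborting only drops waiting requests, while a steal replaces the holder) can be illustrated with a toy model. This is a deliberately simplified sketch: `ToyLock` and the request ids are hypothetical, and the class mimics only the queueing rules discussed here, not the browser's Web Locks implementation.

```typescript
// Toy model of one named, exclusive lock. Illustration only: it mimics the
// queueing rules described above, not the browser's actual implementation.
class ToyLock {
  holder: string | null = null
  pending: string[] = []

  // A normal request is granted if the lock is free, otherwise it waits.
  request(id: string): void {
    if (this.holder === null) this.holder = id
    else this.pending.push(id)
  }

  // Aborting removes a waiting request only; it never touches the holder.
  abort(id: string): void {
    this.pending = this.pending.filter((p) => p !== id)
  }

  // A steal request evicts the current holder and is granted immediately.
  steal(id: string): void {
    this.holder = id
  }
}

const lock = new ToyLock()
lock.request('orphaned-callback') // granted, but never releases
lock.request('getUser') // queued behind the orphaned holder
lock.abort('getUser') // acquireTimeout fires: only the queued request is dropped
const holderAfterAbort = lock.holder // still 'orphaned-callback'
lock.steal('recovery')
const holderAfterSteal = lock.holder // 'recovery'
```

Every later caller in the model would repeat the queue-then-abort step without ever changing `holder`, which is why only the steal step unblocks it.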