@fmeum
Collaborator

@fmeum fmeum commented Sep 6, 2025

The error message observed in #26713 is consistent with `FileChannel#open` failing because the given path doesn't exist. This could happen if one Bazel process observed that `!entryDir.isDirectory()` holds, which is also the case if the path doesn't exist at all, and proceeded to delete the path while another process had just created the directory and was about to open the channel. Since the entry dir is never expected to be an existing non-directory unless the cache has been corrupted, this logic can be removed.
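
For illustration, the suspected interleaving can be replayed in a single thread. This is a minimal sketch using plain `java.nio.file` calls, not Bazel's actual code; the class name, the `entry` directory, and the `lock` file name are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class EntryDirRaceSketch {
  /**
   * Replays the suspected interleaving of two processes in one thread.
   * Returns true if the final open fails because the path no longer exists.
   */
  static boolean staleCheckBreaksOpen() {
    try {
      Path root = Files.createTempDirectory("race-sketch");
      Path entryDir = root.resolve("entry");    // hypothetical cache entry directory
      Path lockFile = entryDir.resolve("lock"); // hypothetical file opened inside it
      try {
        // Process A: the path doesn't exist yet, so the "not a directory" check passes.
        boolean looksLikeNonDirectory = !Files.isDirectory(entryDir);

        // Process B: creates the directory and is about to open a channel in it.
        Files.createDirectories(entryDir);

        // Process A: acts on its stale observation and deletes the path.
        if (looksLikeNonDirectory) {
          Files.deleteIfExists(entryDir);
        }

        // Process B: the open now fails because the parent directory is gone.
        try (FileChannel ignored =
            FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
          return false;
        } catch (NoSuchFileException e) {
          return true;
        }
      } finally {
        // Best-effort cleanup of whatever is left of the temporary tree.
        Files.deleteIfExists(lockFile);
        Files.deleteIfExists(entryDir);
        Files.deleteIfExists(root);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(staleCheckBreaksOpen());
  }
}
```

Without the delete step, process B's open succeeds, which is why dropping the check-and-delete logic closes this particular window.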

Another possible source of `IOException` during normal operation is an interrupt (such as the user hitting Ctrl+C). Follow Skyframe best practices by surfacing this as an `InterruptedException` instead of a `FileLockInterruptionException`.
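
One way such a translation could look is sketched below. The helper `lockInterruptibly` and the class name are hypothetical and not taken from `FileSystemLock`; clearing the interrupt status before throwing is an assumption based on the usual `InterruptedException` convention.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.FileLockInterruptionException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class InterruptibleLockSketch {
  /** Blocks until an exclusive lock is held; reports interrupts as InterruptedException. */
  static FileLock lockInterruptibly(FileChannel channel)
      throws IOException, InterruptedException {
    try {
      return channel.lock();
    } catch (FileLockInterruptionException e) {
      // The JDK sets the thread's interrupt status before throwing. Clear it,
      // as is conventional when throwing InterruptedException.
      Thread.interrupted();
      throw new InterruptedException("interrupted while waiting for file lock");
    }
  }

  /** Returns true if a pending interrupt surfaces as InterruptedException. */
  static boolean pendingInterruptSurfacesAsInterruptedException() {
    try {
      Path file = Files.createTempFile("lock-sketch", ".lock");
      try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
        // Simulate Ctrl+C arriving just before the lock is requested.
        Thread.currentThread().interrupt();
        lockInterruptibly(channel).release();
        return false;
      } catch (InterruptedException e) {
        return true;
      } finally {
        Thread.interrupted(); // make sure no interrupt status leaks to the caller
        Files.deleteIfExists(file);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(pendingInterruptSurfacesAsInterruptedException());
  }
}
```

Callers then see the interrupt as a checked `InterruptedException` rather than as a generic I/O failure.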

Also document that concurrent use on the same path is not supported (it results in an `OverlappingFileLockException` if the lock is already held, regardless of whether that is in shared or exclusive mode) and why the current usages are safe.
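
The overlap behavior can be reproduced with the JDK alone, since file locks are held on behalf of the whole JVM. The following is a small sketch with a hypothetical class name; it opens two channels on the same temporary file and requests a second lock while the first is still held.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class OverlappingLockSketch {
  /**
   * Holds a lock through one channel and requests a second one through another
   * channel on the same file. Returns true if the second request is rejected
   * with OverlappingFileLockException.
   */
  static boolean secondLockOverlaps(boolean firstShared, boolean secondShared) {
    try {
      Path file = Files.createTempFile("overlap-sketch", ".lock");
      try (FileChannel first =
              FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE);
          FileChannel second =
              FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE);
          FileLock held = first.lock(0, Long.MAX_VALUE, firstShared)) {
        try (FileLock unexpected = second.tryLock(0, Long.MAX_VALUE, secondShared)) {
          return false; // the second lock was granted (or reported as unavailable)
        } catch (OverlappingFileLockException e) {
          return true; // this JVM already holds an overlapping lock
        }
      } finally {
        Files.deleteIfExists(file);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(secondLockOverlaps(true, true));   // shared, then shared
    System.out.println(secondLockOverlaps(false, false)); // exclusive, then exclusive
  }
}
```

Note that the exception is thrown even when both requests are shared, which matches the "regardless of mode" wording above.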

Fixes #26713

The error message observed in bazelbuild#26713 is consistent with `FileChannel#open` failing because the given path doesn't exist. This could happen if one Bazel process observed that `!entryDir.isDirectory()`, which is true if the path doesn't exist, and proceeded to delete the directory while another was between creating the directory and opening the channel. Since the entry dir is never expected to be an existing non-directory unless the cache has been corrupted, this logic can be removed.

Also harden `FileSystemLock` against other types of exceptions by not wrapping `InterruptedException` in an `IOException` and document that concurrent use on the same path is not supported.
@fmeum fmeum requested review from Wyverald and tjgq and removed request for Wyverald September 6, 2025 09:21
@github-actions github-actions bot added team-ExternalDeps External dependency handling, remote repositories, WORKSPACE file. awaiting-review PR is awaiting review from an assigned reviewer labels Sep 6, 2025
@Wyverald
Member

Wyverald commented Sep 8, 2025

> Also harden `FileSystemLock` against other types of exceptions by not wrapping `InterruptedException` in an `IOException` and document that concurrent use on the same path is not supported.

Could you elaborate a bit more on this? Why does this improve things?

@fmeum
Collaborator Author

fmeum commented Sep 9, 2025

> Could you elaborate a bit more on this? Why does this improve things?

I updated the description. Skyframe usually wants `InterruptedException` to be forwarded as is, so that it stays responsive to interrupts and doesn't treat them as errors. The stack traces from such an interrupt mention the lock path, though, so this can't be responsible for the original failure that was reported.

@fmeum fmeum requested a review from Wyverald September 9, 2025 21:08
@Wyverald Wyverald added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merged internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Sep 9, 2025
@github-actions github-actions bot removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merged internally label Sep 10, 2025
bazel-io pushed a commit to bazel-io/bazel that referenced this pull request Sep 10, 2025
The error message observed in bazelbuild#26713 is consistent with `FileChannel#open` failing because the given path doesn't exist. This could happen if one Bazel process observed that `!entryDir.isDirectory()`, which is true if the path doesn't exist, and proceeded to delete the path while another process had just created the directory and is now opening the channel. Since the entry dir is never expected to be an existing non-directory unless the cache has been corrupted, this logic can be removed.

Another possible source of `IOException` during normal operation is on an interrupt (such as the user hitting Ctrl+C). Instead, follow Skyframe best practices by surfacing this as an `InterruptedException` instead of a `FileLockInterruptionException`.

Also document that concurrent use on the same path is not supported (it results in an `OverlappingFileLockException` if the lock is already held, regardless of whether that is in shared or exclusive mode) and why the current usages are safe.

Fixes bazelbuild#26713

Closes bazelbuild#26914.

PiperOrigin-RevId: 805338728
Change-Id: Ie808ebe6113b935180b93c21679d5398aa168802
github-merge-queue bot pushed a commit that referenced this pull request Sep 10, 2025
The error message observed in #26713 is consistent with
`FileChannel#open` failing because the given path doesn't exist. This
could happen if one Bazel process observed that
`!entryDir.isDirectory()`, which is true if the path doesn't exist, and
proceeded to delete the path while another process had just created the
directory and is now opening the channel. Since the entry dir is never
expected to be an existing non-directory unless the cache has been
corrupted, this logic can be removed.

Another possible source of `IOException` during normal operation is on
an interrupt (such as the user hitting Ctrl+C). Instead, follow Skyframe
best practices by surfacing this as an `InterruptedException` instead of
a `FileLockInterruptionException`.

Also document that concurrent use on the same path is not supported (it
results in an `OverlappingFileLockException` if the lock is already
held, regardless of whether that is in shared or exclusive mode) and why
the current usages are safe.

Fixes #26713

Closes #26914.

PiperOrigin-RevId: 805338728
Change-Id: Ie808ebe6113b935180b93c21679d5398aa168802

Commit
ca1cbfe

Co-authored-by: Fabian Meumertzheim <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Sep 16, 2025

Commit
ca1cbfe

Co-authored-by: Fabian Meumertzheim <[email protected]>
Labels

team-ExternalDeps External dependency handling, remote repositories, WORKSPACE file.

Development

Successfully merging this pull request may close these issues.

Repository cache failure when running concurrent builds

2 participants