[2.x] FileNotFoundException from corrupted/evicted remote CAS entry fails build instead of falling back to recompute #8889

@idanbenzvi

sbt version: 2.0.0-RC9
Scala version: 3.x

Problem

When using the built-in gRPC remote cache (Global / remoteCache := Some(uri("grpc://...")))
with bazel-remote as the CAS server, builds intermittently fail with:

[error] java.io.FileNotFoundException: .../target/out/value/sha256-/.json (No such file or directory)
22:58:06  [error] java.io.FileNotFoundException: /home/jenkins/agent/workspace/I_cap-commons_feature_sbt2_final/target/out/value/sha256-16b04cae19175fe44daff3c3be06dca3fa7af838a6f85add49fce6ae2afcb80d/48.json (No such file or directory)

failure_log_during_publish.txt

This happens when the remote cache server reports an AC (Action Cache) hit
but the corresponding CAS blob is missing, either evicted or possibly corrupted.
Rather than treating this as a cache miss and recomputing the task, sbt propagates the
exception and fails the build.

The failure is intermittent — it disappears on retry without any code changes,
confirming it is a transient infrastructure issue rather than a build logic error.

Steps to reproduce

Background:
Java Azul JVM 17.0.18
SBT version 2.13.16
Running on Jenkins CI environment

The project is a large monorepo made up of several projects, some of which are libraries while others are microservices, with deep interdependencies between them.

  1. Configure a remote cache backed by bazel-remote (running on a pod - the specific settings can be provided if needed)
  2. Evict or corrupt a CAS entry while leaving the AC entry intact
  3. Run a build — java.io.FileNotFoundException is thrown from ActionCache
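Steps 2 and 3 can be imitated on a local filesystem without sbt or bazel-remote. The sketch below is only an illustration under that assumption: `MissingBlobSketch` and `failsAfterEviction` are hypothetical names, and the temp directory stands in for the value directory under target/out. It writes a value file, deletes it to mimic eviction, and shows that opening it raises java.io.FileNotFoundException.

```scala
import java.io.{FileInputStream, FileNotFoundException}
import java.nio.file.Files

// Hypothetical stand-in for the local value directory; not sbt's real code.
object MissingBlobSketch {
  // Returns true when opening the evicted file raises FileNotFoundException.
  def failsAfterEviction(): Boolean = {
    val dir = Files.createTempDirectory("cas-sketch")
    val blob = dir.resolve("48.json")
    Files.write(blob, "{}".getBytes("UTF-8"))
    Files.delete(blob) // mimic eviction: the reference survives, the file does not
    try {
      new FileInputStream(blob.toFile).close()
      false
    } catch {
      case _: FileNotFoundException => true
    } finally {
      Files.deleteIfExists(dir) // clean up the now-empty temp directory
    }
  }
}
```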

Expected behavior

sbt should treat a missing/unreadable CAS blob as a cache miss and recompute
the task locally, just as it does for other cache miss scenarios.
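A minimal sketch of that behavior, assuming the CAS can be modelled as an in-memory map. `CacheFallbackSketch`, `readBlob`, `lookup` and `cachedOrCompute` are hypothetical names for illustration and are not part of sbt's API. Any non-fatal failure while reading the blob is turned into `None`, and the caller recomputes.

```scala
import java.io.FileNotFoundException
import scala.util.control.NonFatal

// Hypothetical stand-ins for illustration; these are not sbt's real types.
object CacheFallbackSketch {
  // Simulates a CAS read: returns the blob or throws when it is gone.
  def readBlob(cas: Map[String, String], digest: String): String =
    cas.getOrElse(
      digest,
      throw new FileNotFoundException(s"$digest (No such file or directory)")
    )

  // Treats any non-fatal failure while reading the blob as a cache miss.
  def lookup(cas: Map[String, String], digest: String): Option[String] =
    try Some(readBlob(cas, digest))
    catch { case NonFatal(_) => None }

  // Uses the cached value when available, otherwise recomputes locally.
  def cachedOrCompute(cas: Map[String, String], digest: String)(compute: => String): String =
    lookup(cas, digest).getOrElse(compute)
}
```

With an empty map, `cachedOrCompute(Map.empty, "d1")("fresh")` falls back to the computed value instead of throwing.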

Notes

I've identified three unguarded syncBlobs call sites in ActionCache.scala
and prepared a fix with regression tests. Will open a PR alongside this issue.

Issue flow

flowchart TD
    A([SBT task evaluation]) --> B{Check local\nsymlink\nfast-path}

    B -- symlink exists --> C[readFromSymlink\nRead value JSON]
    B -- no symlink --> D[findActionResult\nQuery Action Cache]

    C --> E{AC hit?}
    E -- yes --> F[syncBlobs\nfast-path]
    E -- no --> D

    D --> G{AC hit?}
    G -- no --> K([organicTask\nCompute locally])
    G -- yes --> H{Value inline\nor via blob?}

    H -- inline --> I[syncBlobs\noutput files only]
    H -- blob --> J[syncBlobs\nread value from path]

    subgraph BEFORE ["❌ Before fix — unguarded"]
        F  -- FileNotFoundException --> ERR1([💥 Build FAILS])
        I  -- FileNotFoundException --> ERR2([💥 Build FAILS])
        J  -- FileNotFoundException --> ERR3([💥 Build FAILS])
    end

    subgraph AFTER ["✅ After fix — NonFatal catch"]
        F  -- NonFatal catch --> MISS1[Returns None\ncache miss]
        I  -- NonFatal catch --> MISS2[Returns Left-None\ncache miss]
        J  -- NonFatal catch --> MISS2
        MISS1 --> D
        MISS2 --> K
    end

    K --> L[Run action\ncalled++]
    L --> M[store.put\nWrite AC entry]
    M --> N[syncBlobs\nWrite output files]

    subgraph ORGANIC ["organicTask — also guarded"]
        N -- NonFatal catch --> LOG[Debug log\nSkipping cache storage]
        LOG --> OK
    end

    N -- success --> OK([✅ Return result])

    style ERR1 fill:#ff4d4d,color:#fff
    style ERR2 fill:#ff4d4d,color:#fff
    style ERR3 fill:#ff4d4d,color:#fff
    style BEFORE fill:#fff0f0,stroke:#ff4d4d
    style AFTER fill:#f0fff0,stroke:#2ecc71
    style ORGANIC fill:#f0f8ff,stroke:#3498db
    style OK fill:#2ecc71,color:#fff
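
The flow in the diagram can be approximated in a few lines. This is a simplified sketch rather than the code in ActionCache.scala: `Stores`, `guarded` and `evaluate` are hypothetical, `sync` stands in for syncBlobs, and the inline/blob distinction is collapsed into a single guarded call. A failed fast path falls through to the Action Cache lookup, a failed lookup falls through to the organic task, and a failed cache write after computing is ignored.

```scala
import scala.util.control.NonFatal

// Simplified, hypothetical model of the lookup flow; not sbt's real code.
object LookupFlowSketch {
  final case class Stores(
    symlinkValue: Option[String], // value found via the local symlink fast path
    actionCache: Option[String],  // value reported by an Action Cache hit
    sync: String => String,       // stands in for syncBlobs; may throw
    store: String => Unit         // writes the result back; may throw
  )

  // Converts a non-fatal sync failure into a cache miss.
  private def guarded(value: String, sync: String => String): Option[String] =
    try Some(sync(value))
    catch { case NonFatal(_) => None }

  // Computes locally; a failed cache write is skipped instead of failing the build.
  def organicTask(compute: => String, store: String => Unit): String = {
    val result = compute
    try store(result)
    catch { case NonFatal(_) => () }
    result
  }

  // Returns the value together with the path that produced it.
  def evaluate(s: Stores)(compute: => String): (String, String) = {
    val fastPath =
      s.symlinkValue.flatMap(v => guarded(v, s.sync)).map(v => (v, "fast-path"))
    // Only evaluated when the fast path misses.
    lazy val acHit =
      s.actionCache.flatMap(v => guarded(v, s.sync)).map(v => (v, "action-cache"))
    fastPath.orElse(acHit).getOrElse((organicTask(compute, s.store), "organic"))
  }
}
```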