Bug: Excessive disk usage in local_provider_runs due to redundant .cache directories #1604

@aliasaria

Description

Summary
Each execution within local_provider_runs is currently creating an isolated .cache directory. Because these directories contain massive Hugging Face models and uv package caches, disk space is being exhausted rapidly when multiple runs are initiated.

Root Cause
The issue stems from the run environment remapping $HOME. When $HOME is redirected for a run, tools such as uv and the Hugging Face libraries default to creating a new cache structure under the new home instead of using the host user's existing cache. uv normally links packages from its cache into environments rather than copying them, but because each run gets its own isolated cache, there is no shared global store to link from, so the same data is downloaded and stored again for every run.
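
To illustrate the mechanism, here is a minimal sketch of how the default cache locations follow $HOME. It assumes Linux-style defaults (`~/.cache/huggingface` and `~/.cache/uv`) and no overriding environment variables. `default_cache_dirs` and the example paths are hypothetical names used only for this illustration, not part of the provider code.

```python
import os


def default_cache_dirs(home: str) -> dict:
    """Cache paths derived from a given HOME.

    Assumption: Linux-style defaults with no override variables
    (HF_HOME, UV_CACHE_DIR, XDG_CACHE_HOME) set.
    """
    cache_root = os.path.join(home, ".cache")
    return {
        "huggingface": os.path.join(cache_root, "huggingface"),
        "uv": os.path.join(cache_root, "uv"),
    }


# Hypothetical paths: the host user's home versus a remapped per-run home.
host = default_cache_dirs("/home/user")
run = default_cache_dirs("/home/user/local_provider_runs/run_1")

# The two sets of paths never overlap, so nothing cached on the host is reused.
print(host["uv"])
print(run["uv"])
```

Every distinct run home yields a distinct cache root, which is why each run ends up with its own copy of the data.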

Impact
  • Rapid Disk Exhaustion: Each run duplicates several gigabytes of data.

  • Performance Hit: Increased latency for runs as they must re-download or re-index assets that should already be cached locally.

Proposed Solution
Modify the provider configuration to map the user's global .cache and uv directories back into the run environment.

  • Action: Explicitly mount/link ~/.cache and the uv cache directory to the remapped environment.

  • Reference: This follows the pattern previously implemented for the SkyPilot Kubernetes cluster to ensure cache persistence across isolated tasks.
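
One possible shape for the linking action above is sketched below. It is not the provider's actual implementation: `link_shared_cache` is a hypothetical helper, and it assumes the run's home directory is prepared before any tool writes to it and that symlinks are permitted on the host filesystem.

```python
from pathlib import Path


def link_shared_cache(run_home, host_cache):
    """Point <run_home>/.cache at the host user's cache via a symlink.

    Hypothetical helper. If <run_home>/.cache already exists, whether as a
    real directory or as a link, it is left untouched.
    """
    run_home = Path(run_home)
    host_cache = Path(host_cache)

    host_cache.mkdir(parents=True, exist_ok=True)
    run_home.mkdir(parents=True, exist_ok=True)

    link = run_home / ".cache"
    if link.is_symlink() or link.exists():
        return link  # never clobber an existing cache directory or link

    link.symlink_to(host_cache, target_is_directory=True)
    return link
```

A call such as `link_shared_cache(run_dir, Path.home() / ".cache")` would need to happen before $HOME is remapped, so that `Path.home()` still resolves to the host user. An alternative that avoids touching the filesystem is to export `HF_HOME` and `UV_CACHE_DIR` in the run's environment, pointing them at the host locations. Either way, concurrent runs would then share one cache, so the tools' own locking behaviour applies.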

Steps to Reproduce

  • Trigger multiple runs using the local_provider.
  • Inspect the filesystem within the local_provider_runs directory.
  • Observe that each run ID has a unique, high-volume .cache folder.
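
The inspection step can be scripted. The sketch below assumes a layout of `<runs_root>/<run_id>/.cache`, which is inferred from the description above rather than confirmed. `dir_size_bytes` and `cache_usage_per_run` are hypothetical helper names.

```python
import os
from pathlib import Path


def dir_size_bytes(path):
    """Sum the sizes of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):  # os.walk does not follow symlinked directories
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.lstat(file_path).st_size
    return total


def cache_usage_per_run(runs_root):
    """Map each run ID to the size in bytes of its own .cache directory."""
    usage = {}
    for run_dir in sorted(Path(runs_root).iterdir()):
        cache = run_dir / ".cache"
        # A symlinked .cache points at a shared store, so it is not counted per run.
        if cache.is_dir() and not cache.is_symlink():
            usage[run_dir.name] = dir_size_bytes(cache)
    return usage
```

Calling `cache_usage_per_run` on the local_provider_runs directory returns one entry per run that has a real `.cache` folder. The figures are apparent sizes: files that are hardlinked to each other are counted once per path, so the total may overstate the blocks actually used on disk.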
