Prefetch miss with GPU containers due to missing pre-container file access tracking #2128

@wswsmao

Problem Description

When a container is run with the --gpus all option, images produced by the optimize command prefetch poorly: many files that should have been prefetched are instead loaded on demand.

Root Cause Analysis

The issue stems from a timing gap in file access tracking:

  1. Pre-container OCI Hook Execution: The NVIDIA runtime injects dynamic libraries and runs ldconfig via OCI hooks before the container process starts
  2. Early File Access: ldconfig reads various dynamic libraries in the image during this pre-container phase
  3. Missing Tracking: The current optimize process starts fanotify monitoring only after the container is running, so the file accesses from steps 1-2 are never recorded
  4. Runtime Prefetch Misses: When the container runs, the same OCI hook behavior repeats, causing FUSE to trigger on-demand loading for files that should have been prefetched

Proposed Solution

Implement a pre-container monitor that:

  • Starts tracking file access before container creation
  • Uses FAN_MARK_FILESYSTEM to run fanotify on the host (testing has shown that FAN_MARK_MOUNT is unable to track a container's rootfs)
  • Captures access patterns from OCI hooks and pre-container operations
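
To make the proposal concrete, here is a minimal sketch of a host-side monitor that places a filesystem-wide fanotify mark before the container is created. It is an illustration under assumptions, not the optimizer's actual implementation: the function names (`start_monitor`, `parse_events`, `record_accesses`) are hypothetical, the path to mark is assumed to be supplied by the caller, and the process is assumed to have CAP_SYS_ADMIN on a kernel that supports FAN_MARK_FILESYSTEM (Linux 4.20 or later). The constants and struct layout follow `<linux/fanotify.h>`.

```python
import ctypes
import os
import struct

# Constants from <linux/fanotify.h>.
FAN_CLASS_NOTIF = 0x0
FAN_CLOEXEC = 0x1
FAN_MARK_ADD = 0x1
FAN_MARK_FILESYSTEM = 0x100  # mark the whole filesystem (Linux 4.20+)
FAN_ACCESS = 0x1
FAN_OPEN = 0x20
AT_FDCWD = -100

# struct fanotify_event_metadata:
#   u32 event_len; u8 vers; u8 reserved; u16 metadata_len;
#   u64 mask; s32 fd; s32 pid;
EVENT_FMT = "=IBBHQii"
EVENT_SIZE = struct.calcsize(EVENT_FMT)


def parse_events(buf):
    """Yield (mask, fd, pid) for each event packed in a fanotify read buffer."""
    offset = 0
    while offset + EVENT_SIZE <= len(buf):
        event_len, _vers, _rsvd, _meta_len, mask, fd, pid = struct.unpack_from(
            EVENT_FMT, buf, offset
        )
        if event_len < EVENT_SIZE:
            break  # malformed record; stop rather than loop forever
        yield mask, fd, pid
        offset += event_len


def start_monitor(path):
    """Hypothetical helper: mark the filesystem containing `path`.

    Intended to be called on the host BEFORE the container is created,
    so that accesses made by OCI hooks (e.g. ldconfig) are observed.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.fanotify_init.argtypes = [ctypes.c_uint, ctypes.c_uint]
    libc.fanotify_init.restype = ctypes.c_int
    libc.fanotify_mark.argtypes = [
        ctypes.c_int, ctypes.c_uint, ctypes.c_uint64,
        ctypes.c_int, ctypes.c_char_p,
    ]
    libc.fanotify_mark.restype = ctypes.c_int

    fan_fd = libc.fanotify_init(FAN_CLASS_NOTIF | FAN_CLOEXEC, os.O_RDONLY)
    if fan_fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, "fanotify_init: " + os.strerror(err))

    rc = libc.fanotify_mark(
        fan_fd,
        FAN_MARK_ADD | FAN_MARK_FILESYSTEM,
        FAN_ACCESS | FAN_OPEN,
        AT_FDCWD,
        os.fsencode(path),
    )
    if rc < 0:
        err = ctypes.get_errno()
        os.close(fan_fd)
        raise OSError(err, "fanotify_mark: " + os.strerror(err))
    return fan_fd


def record_accesses(fan_fd, seen):
    """Read one batch of events and add paths to `seen` in first-access order."""
    buf = os.read(fan_fd, 4096)
    for _mask, fd, _pid in parse_events(buf):
        if fd < 0:
            continue  # e.g. a queue-overflow event carries no file descriptor
        try:
            # The path is as seen from the monitoring process; mapping it
            # back to an image-relative path is not covered here.
            seen.setdefault(os.readlink(f"/proc/self/fd/{fd}"), None)
        finally:
            os.close(fd)
```

A caller would invoke `start_monitor` on a path inside the filesystem that backs the container's rootfs, create the container, and then call `record_accesses` in a loop with a shared `dict`. Because a filesystem mark reports every access on that filesystem, the recorded paths would still need filtering down to those that belong to the image.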
