Closed
Description
Problem Description
When containers are run with the --gpus all option, images optimized by the optimize command show poor prefetch performance, with a significant number of prefetch misses.
Root Cause Analysis
The issue stems from a timing gap in file access tracking:
1. Pre-container OCI Hook Execution: The NVIDIA runtime injects dynamic libraries and executes ldconfig via OCI hooks before the container actually starts.
2. Early File Access: The ldconfig operation accesses various dynamic libraries in the image during this pre-container phase.
3. Missing Tracking: The current optimize process only starts fanotify monitoring after the container is running, so it misses the file access patterns from steps 1-2.
4. Runtime Prefetch Misses: When the container runs, the same OCI hook behavior repeats, causing FUSE to trigger on-demand loading for files that should have been prefetched.
Proposed Solution
Implement a pre-container monitor that:
- Starts tracking file access before container creation
- Uses FAN_MARK_FILESYSTEM to run fanotify on the host (testing has shown that FAN_MARK_MOUNT is unable to track a container's rootfs).
- Captures access patterns from OCI hooks and pre-container operations
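A host-side monitor of this kind might look roughly like the following C sketch. It is written under several assumptions. It uses the real fanotify_init/fanotify_mark API with FAN_MARK_FILESYSTEM, which needs Linux 4.20 or later and CAP_SYS_ADMIN. The root argument is a hypothetical host-side path of the container rootfs. The helpers start_fs_monitor, drain_events and path_under_root are illustrative names, not part of the project. Because a filesystem mark reports accesses anywhere on that filesystem, events are filtered by path prefix.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/fanotify.h>
#include <unistd.h>

#ifndef FAN_MARK_FILESYSTEM
#define FAN_MARK_FILESYSTEM 0x00000100 /* value from Linux 4.20+ uapi headers */
#endif

/* Returns 1 if `path` is `root` itself or lies beneath it, else 0. */
int path_under_root(const char *path, const char *root)
{
    assert(path != NULL && root != NULL);
    size_t n = strlen(root);
    while (n > 1 && root[n - 1] == '/')
        n--; /* ignore trailing slashes in root */
    if (n == 1 && root[0] == '/')
        return path[0] == '/';
    if (strncmp(path, root, n) != 0)
        return 0;
    return path[n] == '\0' || path[n] == '/';
}

/* Creates a fanotify group and marks the whole filesystem containing `root`,
 * so opens and reads are seen even before the container process exists.
 * Returns the fanotify fd, or -1 on error (errno is set). */
int start_fs_monitor(const char *root)
{
    int fd = fanotify_init(FAN_CLASS_NOTIF | FAN_CLOEXEC, O_RDONLY | O_LARGEFILE);
    if (fd < 0)
        return -1;
    if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_FILESYSTEM,
                      FAN_OPEN | FAN_ACCESS, AT_FDCWD, root) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Reads one batch of events and writes each accessed path under `root` to
 * `out` (duplicates included). Paths are resolved as seen from this process;
 * files opened through mounts visible only in another mount namespace may
 * not resolve under `root`. Returns paths written, or -1 on error. */
int drain_events(int fan_fd, const char *root, FILE *out)
{
    struct fanotify_event_metadata buf[256];
    ssize_t len = read(fan_fd, buf, sizeof buf);
    if (len < 0)
        return -1;
    int written = 0;
    struct fanotify_event_metadata *ev = buf;
    for (; FAN_EVENT_OK(ev, len); ev = FAN_EVENT_NEXT(ev, len)) {
        if (ev->vers != FANOTIFY_METADATA_VERSION)
            return -1; /* kernel/header mismatch: give up */
        if (ev->fd < 0)
            continue; /* no fd attached, e.g. queue overflow */
        char link[64], path[PATH_MAX];
        snprintf(link, sizeof link, "/proc/self/fd/%d", ev->fd);
        ssize_t n = readlink(link, path, sizeof path - 1);
        close(ev->fd); /* each event carries an open fd that must be closed */
        if (n < 0)
            continue;
        path[n] = '\0';
        if (path_under_root(path, root)) {
            fprintf(out, "%s\n", path);
            written++;
        }
    }
    return written;
}
```

In this sketch, start_fs_monitor would be called before the runtime creates the container, and drain_events would be called in a loop until the container exits. That ordering records the ldconfig accesses made by the OCI hooks along with the later runtime accesses. Deduplicating the output into a prefetch list is left out.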