fix(images): handle tar extraction edge cases #199

Merged
DorianZheng merged 1 commit into boxlite-ai:main from uran0sH:fix-tar
Feb 6, 2026
Conversation

@uran0sH
Contributor

@uran0sH uran0sH commented Feb 3, 2026

Fixes multiple OCI layer tar extraction issues:

Path handling:

  • Use entry.path() instead of entry.header().path() to support PAX extensions
  • Properly handle filenames >100 bytes (tar header limit)
  • PAX/GNU longname extensions now correctly applied to restore full paths
  • Fixes file truncation during extraction (e.g., discord-api-types package)
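
For context on why the header alone is not enough: the name field of a tar header holds at most 100 bytes, so longer paths are carried in a PAX extended header as `<length> <key>=<value>\n` records, and the `tar` crate's `entry.path()` applies them. The following is a minimal stdlib-only sketch of that record format, not the code in this PR; `parse_pax_records` and `effective_path` are hypothetical names used for illustration.

```rust
use std::collections::HashMap;

/// Parse PAX extended-header records of the form "<len> <key>=<value>\n".
/// `len` is the decimal byte length of the whole record, including the
/// length digits, the separating space, and the trailing newline.
/// Returns None on malformed input.
fn parse_pax_records(mut data: &[u8]) -> Option<HashMap<String, Vec<u8>>> {
    let mut records = HashMap::new();
    while !data.is_empty() {
        let space = data.iter().position(|&b| b == b' ')?;
        let len: usize = std::str::from_utf8(&data[..space]).ok()?.parse().ok()?;
        // The record must fit in the buffer and end with a newline.
        if len <= space + 1 || len > data.len() || data[len - 1] != b'\n' {
            return None;
        }
        let body = &data[space + 1..len - 1];
        let eq = body.iter().position(|&b| b == b'=')?;
        let key = String::from_utf8(body[..eq].to_vec()).ok()?;
        records.insert(key, body[eq + 1..].to_vec());
        data = &data[len..];
    }
    Some(records)
}

/// Prefer the PAX "path" record over the (possibly truncated) header name.
/// Hypothetical helper: it mirrors what reading the full entry path achieves.
fn effective_path(header_name: &[u8], pax: &HashMap<String, Vec<u8>>) -> Vec<u8> {
    match pax.get("path") {
        Some(full) => full.clone(),
        None => header_name.to_vec(),
    }
}
```

Reading only the header name skips the `path` record entirely, which is how long file names end up truncated.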

Hardlinks:

  • Deferred hardlink mechanism for targets appearing later in tar
  • Fixes pnpm compatibility where hardlinks precede target files
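
The deferral idea can be sketched as follows, assuming a simple queue that is flushed once all regular entries are on disk. This is an illustration under that assumption, not the PR's implementation; `DeferredLink`, `link_or_defer`, and `flush_deferred` are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// A hardlink whose target had not been extracted yet when it was seen.
struct DeferredLink {
    link: PathBuf,
    target: PathBuf,
}

/// Try to create the hardlink immediately; if the target does not exist
/// yet, queue the link instead of failing. Both paths are relative to `root`.
/// Note: NotFound is also reported when the link's parent directory is
/// missing, so this sketch assumes parent directories were created first.
fn link_or_defer(
    root: &Path,
    link: &Path,
    target: &Path,
    deferred: &mut Vec<DeferredLink>,
) -> io::Result<()> {
    let (link, target) = (root.join(link), root.join(target));
    match fs::hard_link(&target, &link) {
        Ok(()) => Ok(()),
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            deferred.push(DeferredLink { link, target });
            Ok(())
        }
        Err(e) => Err(e),
    }
}

/// After every regular entry has been written, create the queued links.
/// A target that is still missing at this point is a real error.
fn flush_deferred(deferred: Vec<DeferredLink>) -> io::Result<()> {
    for d in deferred {
        fs::hard_link(&d.target, &d.link)?;
    }
    Ok(())
}
```

Calling `flush_deferred` once at the end of a layer keeps extraction single-pass for the common case where targets precede their links.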

Directory handling:

  • Replace a file with a directory when it blocks parent directory creation
  • Recreate parent directories that whiteout processing removed
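
A rough sketch of the parent-directory case, assuming the extractor walks the path from the root downward (the PR's own loop may be structured differently). `ensure_parent_dirs` is a hypothetical helper, and keeping a symlink that resolves to a directory is an assumption based on the review discussion about pnpm layouts.

```rust
use std::fs;
use std::io;
use std::path::{Component, Path};

/// Ensure `rel_dir` exists as a directory under `root`, replacing any
/// non-directory entry that occupies one of its components.
fn ensure_parent_dirs(root: &Path, rel_dir: &Path) -> io::Result<()> {
    let mut current = root.to_path_buf();
    for component in rel_dir.components() {
        let name = match component {
            Component::Normal(name) => name,
            // A leading "./" is common in tar archives and harmless.
            Component::CurDir => continue,
            // Reject "..", absolute roots, and prefixes.
            _ => {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidInput,
                    "unexpected path component",
                ))
            }
        };
        current.push(name);
        match fs::symlink_metadata(&current) {
            // Already a real directory: nothing to do.
            Ok(m) if m.is_dir() => {}
            // Assumption: a symlink that resolves to a directory is kept.
            Ok(m) if m.file_type().is_symlink() && current.is_dir() => {}
            // A file or dangling symlink blocks the path: replace it.
            Ok(_) => {
                fs::remove_file(&current)?;
                fs::create_dir(&current)?;
            }
            // Missing (for example, removed by a whiteout): create it.
            Err(e) if e.kind() == io::ErrorKind::NotFound => fs::create_dir(&current)?,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```

This sketch does not check where a kept symlink points, so a real extractor would still need to confirm the target stays inside the root.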

Refactoring:

  • Introduces EntryMetadata with ownership/timestamps composition
  • Reduces apply_ownership parameters from 9 to 3
  • Adds builder pattern to avoid clippy warnings
  • Improves memory efficiency (DirMeta only stores needed fields)
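
To illustrate the shape of this refactoring, here is a minimal sketch of grouping ownership and timestamps into one metadata value with a builder. The field names, the builder methods, and `describe_ownership` are all hypothetical; only the names `EntryMetadata` and `apply_ownership` come from this PR, and the actual definitions may differ.

```rust
/// Hypothetical field layout, for illustration only.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct Ownership {
    uid: u64,
    gid: u64,
}

#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct Timestamps {
    mtime: u64,
    atime: u64,
}

/// Composes the smaller structs instead of passing each scalar separately.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct EntryMetadata {
    mode: u32,
    ownership: Ownership,
    timestamps: Timestamps,
}

#[derive(Default)]
struct EntryMetadataBuilder {
    meta: EntryMetadata,
}

impl EntryMetadataBuilder {
    fn new() -> Self {
        Self::default()
    }
    fn mode(mut self, mode: u32) -> Self {
        self.meta.mode = mode;
        self
    }
    fn ownership(mut self, uid: u64, gid: u64) -> Self {
        self.meta.ownership = Ownership { uid, gid };
        self
    }
    fn timestamps(mut self, mtime: u64, atime: u64) -> Self {
        self.meta.timestamps = Timestamps { mtime, atime };
        self
    }
    fn build(self) -> EntryMetadata {
        self.meta
    }
}

/// Stand-in for a three-parameter ownership function: it only describes
/// the call it would make, so the sketch runs without privileges.
fn describe_ownership(path: &str, meta: &EntryMetadata, is_symlink: bool) -> String {
    let call = if is_symlink { "lchown" } else { "chown" };
    format!("{}({}, {}, {})", call, path, meta.ownership.uid, meta.ownership.gid)
}
```

Passing one struct keeps function signatures short, which is presumably what avoids a lint such as `clippy::too_many_arguments`.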

Test coverage:

  • Comprehensive tests for deferred hardlinks
  • Parent directory replacement scenarios
  • pnpm structure validation

Fixes #187

@uran0sH
Contributor Author

uran0sH commented Feb 3, 2026

Please review @DorianZheng @shayne-snap


match parent.symlink_metadata() {
    Ok(m) => {
        if !m.is_dir() {
Member

What if parent is a symlink to another existing directory?

Contributor Author

    // Directory exists, no need to check further up
    break;
}
// Remove non-directory (file, symlink, etc.) blocking directory creation
Member

What if it's a symlink to another directory within the root path?

Contributor Author

@uran0sH uran0sH Feb 4, 2026

The documentation (https://specs.opencontainers.org/image-spec/layer/#changeset-over-existing-files) is not clear about this case. There is a separate discussion in opencontainers/image-spec#857, but it does not seem to have reached a final conclusion yet.
A symlink to a directory is common in pnpm's structure (https://pnpm.io/symlinked-node-modules-structure), so I think we need to support it.
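
One way to frame the question raised in this thread is a check that a symlinked parent resolves to a directory inside the extraction root before it is kept. The sketch below is an illustration under that assumption, not the behavior this PR settled on; `is_dir_symlink_within_root` is a hypothetical name. It resolves links against the host filesystem, so an absolute link target, which a container runtime would interpret relative to the rootfs, is treated as escaping.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns Ok(true) when `path` is a symlink whose fully resolved target
/// is a directory located under `root`.
fn is_dir_symlink_within_root(root: &Path, path: &Path) -> io::Result<bool> {
    // Inspect the entry itself, without following the link.
    if !fs::symlink_metadata(path)?.file_type().is_symlink() {
        return Ok(false);
    }
    // Canonicalize both sides so the prefix comparison is meaningful.
    let root = fs::canonicalize(root)?;
    match fs::canonicalize(path) {
        Ok(resolved) => Ok(resolved.is_dir() && resolved.starts_with(&root)),
        // Dangling symlink: there is no directory to keep.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}
```

With such a check, a pnpm-style link such as `node_modules/foo -> .pnpm/foo/node_modules/foo` would be preserved, while a link pointing outside the root would be treated like any other blocking non-directory.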

@uran0sH uran0sH force-pushed the fix-tar branch 4 times, most recently from 1036905 to 3aba1a1 Compare February 5, 2026 03:08
@uran0sH uran0sH requested a review from DorianZheng February 5, 2026 03:49
@uran0sH uran0sH force-pushed the fix-tar branch 2 times, most recently from 1bd239a to 5d035d8 Compare February 5, 2026 08:54
Signed-off-by: Wenyu Huang <[email protected]>
@uran0sH
Contributor Author

uran0sH commented Feb 6, 2026

Currently, running pnpm --list and node node /app/openclaw.mjs --version in Boxlite both work. @shayne-snap Would you like to provide more test cases?

@shayne-snap
Contributor

LGTM

Member

@DorianZheng DorianZheng left a comment

awesome! LGTM

@DorianZheng DorianZheng merged commit edfb9ca into boxlite-ai:main Feb 6, 2026
13 checks passed
@uran0sH uran0sH deleted the fix-tar branch February 9, 2026 10:17

Development

Successfully merging this pull request may close these issues.

Layer extraction fails on images with symlink parent paths (e.g. pnpm-style layers)

3 participants