Skip to content

plumbing: object, tree entry validation#2105

Merged
pjbgf merged 3 commits into
go-git:mainfrom
hiddeco:tree-entry-validation
May 12, 2026
Merged

plumbing: object, tree entry validation#2105
pjbgf merged 3 commits into
go-git:mainfrom
hiddeco:tree-entry-validation

Conversation

@hiddeco

@hiddeco hiddeco commented May 11, 2026

Copy link
Copy Markdown
Member

Closes the encode/decode boundary that #2081 and #2097 left tolerant. Tree.Encode previously accepted entry names like .git, .., control characters, or HFS+/NTFS .git disguises, and never checked for duplicates; Tree.Decode did not validate names at all.

Tree.Encode now runs each name through pathutil.ValidTreePath and rejects duplicate names — including the file/dir collision where foo and foo/ share a bare name. Callers that need to emit non-conformant bytes write them via plumbing.EncodedObject directly.

Tree.Decode stays permissive — upstream Git makes the same split between parse_tree_buffer and fsck_tree1, and the permissive parse keeps inspection and recovery use cases working. The new Tree.Validate is the opt-in read-side counterpart, mirroring fsck_tree's structural rules: null hashes, slashes, empty/./../.git names, bad file modes, duplicates, mis-sort, oversized names, and the four *_SYMLINK rules for .gitmodules/.gitattributes/.gitignore/.mailmap. Violations come back as errors.Join of rule errors wrapping ErrInvalidTree.

Two fsck_tree warnings need the raw octal mode string — zero-padded and non-canonical 0100664 modes — which canonicalTreeMode drops at parse. Those are wire-only and best handled by a future buffer-level validator.

@hiddeco hiddeco changed the title plumbing/object: tree entry validation plumbing: object, tree entry validation May 11, 2026
@hiddeco hiddeco force-pushed the tree-entry-validation branch 2 times, most recently from a04f693 to 39421d8 Compare May 11, 2026 18:17
hiddeco added 3 commits May 11, 2026 20:25
Tree.Encode previously rejected only embedded NUL bytes and
mis-sorted entries, so a caller assembling a *Tree programmatically
could produce a tree object containing components like ".git", "..",
control characters, or HFS+/NTFS variants of ".git". Run each entry
name through pathutil.ValidTreePath at the encoder boundary so the
library cannot emit such bytes through its API.

The same loop also rejects duplicate names, including the file and
directory collision where `foo` and `foo/` resolve to the same sort
key. With entries already in memory a single pass over `entry.Name`
catches the non-consecutive case that upstream Git's `verify_ordered`
in `fsck.c`[1] tracks with a name stack; the simpler form suffices
because the file/dir variant shares the bare name.

A test helper in worktree_commit_test.go that planted a tree
containing ".git/config" for cherry-pick coverage can no longer
do that through Tree.Encode and now writes the raw object bytes
directly, matching the escape hatch documented in Tree.Encode's
godoc.

[1]: https://github.com/git/git/blob/v2.54.0/fsck.c#L544-L614

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
go-git's tree decoder is permissive on purpose: tools that inspect
or recover repositories need to read tree objects whose entries do
not pass the index-side path-validation gate. Upstream Git keeps
the same split — `parse_tree_buffer`[1] is a no-op on names while
`fsck_tree`[2] is the validator — and exposes the latter through
`git fsck`. Add the read-side counterpart to Tree.Encode's
producer gate: a Tree.Validate method that mirrors `fsck_tree`'s
structural ruleset. Decode stays as-is; callers that want
fsck-shaped reporting call Validate.

The ported rules cover null entry hashes, slashes in names, empty
names, ".", "..", ".git" with HFS+/NTFS disguises (already handled
by pathutil.ValidTreePath), bad file modes, duplicate entries,
mis-sort, names longer than 4096 bytes, and the four
FSCK_MSG_*_SYMLINK rules for ".gitmodules", ".gitattributes",
".gitignore", and ".mailmap". Two fsck_tree warnings —
zero-padded modes and the non-canonical 0100664 bits — are not
surfaced: both require retaining the raw octal mode string from
the wire, which canonicalTreeMode drops. The structural rules are
the load-bearing ones for refusing malformed trees.

The new pathutil helpers IsHFSDotGitattributes /
IsHFSDotGitignore / IsHFSDotMailmap and their NTFS counterparts
extend the existing IsHFSDotGitmodules / IsNTFSDotGitmodules pair
with the short-name prefixes upstream uses for the same names
(`gi7d29`, `gi250a`, `maba30`).

Violations are returned as an errors.Join of individual rule
errors, each wrapping ErrInvalidTree, so callers can match either
the umbrella or specific rules (ErrDuplicateEntry,
ErrEntriesNotSorted, pathutil.ErrInvalidPath) with errors.Is.

[1]: https://github.com/git/git/blob/v2.54.0/tree.c#L175-L184
[2]: https://github.com/git/git/blob/v2.54.0/fsck.c#L616-L800

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
Encode previously enforced a subset of Tree.Validate's rules inline
(ValidTreePath, duplicate detection, sort order). With Validate
available as the structural ruleset, Encode can run it once at the
top and skip the duplicated checks — both shrinking the encoder and
widening the producer-side gate to refuse the rules Validate already
catches but Encode did not: null entry hashes, oversize names, and
symlinks disguised as `.gitmodules` / `.gitattributes` /
`.gitignore` / `.mailmap`.

The widened gate exposes one pre-existing path that built trees
containing entries with the zero hash: buildTreeHelper turned every
index entry into a tree entry, including those carried through go-git#1773
to avoid a panic during recursive tree construction. Skip such
entries at the top of commitIndexEntry so commit produces a
well-formed tree without regressing the panic-safe behaviour the
go-git#1773 fix introduced.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
@hiddeco hiddeco force-pushed the tree-entry-validation branch from 39421d8 to 5dc6b02 Compare May 11, 2026 18:25

@pjbgf pjbgf left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hiddeco thanks for working on this. 🙇

@pjbgf pjbgf merged commit f5d2b8e into go-git:main May 12, 2026
17 checks passed
@hiddeco hiddeco deleted the tree-entry-validation branch May 12, 2026 10:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants