plumbing: format/packfile, fix delta-base parsing bugs#2163
Merged
Conversation
76aac6e to
b8e766d
Compare
pjbgf
reviewed
May 29, 2026
The OFS-delta validation rejected only negative offsets larger than the entry's own pack offset, missing the boundary case where the encoded negative offset equals the entry's own offset. The resolved base offset is then zero, which points at the pack signature byte instead of a real object header. Canonical Git rejects this in packfile.c[1] with `base_offset <= 0 || base_offset >= delta_obj_offset`; go-git was using a strict `>` instead of `>=`. The same off-by-one appeared in the two OFS-delta varint readers under storage/filesystem/mmap (`(*ondemandObject).computeTypeAndSize` and `(*ondemandObject).resolveDelta`), so this commit sweeps both sites alongside the scanner. Add a scanner test covering an OFS-delta whose encoded negative offset equals its own pack offset, and a fuzz corpus seed of the same probe. [1]: https://github.com/git/git/blob/94f057755b/packfile.c#L1289-L1290 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]>
The OFS-delta base-offset predicate landed in three sites in the fix for the off-by-one: the scanner header decode and the on-demand mmap object's two delta-resolution paths. Each carried a duplicated copy of the canonical-Git citation and the same boundary condition. Consolidate them behind one exported helper, ValidateOFSDeltaBase, so a future audit reads the canonical condition once. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]>
Parse() resolved all REF-deltas in one loop and OFS-deltas in another, processing the two delta types as independent sets. A REF-delta whose base is an in-pack OFS-delta therefore failed to resolve: the REF pass looked up its base hash before the OFS-delta had been applied, since the OFS-delta's hash is unknown at scan time. The lookup misclassified the in-pack base as a thin-pack external reference and the chain never produced a real parent. Canonical Git's `threaded_second_pass` in builtin/index-pack.c[1] walks the delta DAG from non-delta bases, advancing the REF and OFS children of every parent together rather than in two sweeps. Port that shape: index the pending delta lists by parent offset and parent hash, then visit every reachable delta depth-first from each non-delta base. REF-deltas not reached by the walk fall through to the existing thin-pack placeholder path; an OFS-delta whose recorded negative offset matches no in-pack object header is reported as malformed input. A child reachable through more than one parent (two non-deltas with identical content, or an OFS-delta resolving to the same hash as a non-delta in the pack) is processed only on the first visit. Cover the case with a parser test that constructs a pack with the shape `[base blob, OFS-delta(base=blob), REF-delta(base= OFS-delta-hash)]` and asserts all three objects resolve, plus a fuzz corpus seed of the same probe. [1]: https://github.com/git/git/blob/94f057755b/builtin/index-pack.c#L1103 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]>
Parser.Parse mutates state owned by the Parser: the per-delta parent pointers it links during depth-first delta resolution, and the cache maps that index resolved objects by hash and offset. Neither is reset on entry. A second call against the same Parser would see the prior call's state — whether that call succeeded or failed mid-walk — and produce undefined results. The constraint is not new; the new `c.parent != nil` skip guard in resolveDeltas makes its consequences slightly more visible. Spell it out in the type's godoc so callers don't reach for Parse a second time and silently hit the leftover state. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]>
The encoder emits every delta in a single pack with the same kind: all OFS_DELTA by default, all REF_DELTA when useRefDeltas is set. The parser now resolves packs that mix the two kinds in one chain, because mixed-kind packs occur in the wild — repacks across servers with differing --delta-base-offset settings, thin-pack splices, and third-party tooling all produce them. Add a short comment at the write site so the next reader of writeDeltaHeader doesn't file the asymmetry between read- and write-side delta-kind support as a defect. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]>
b8e766d to
77f41a2
Compare
`deltaEncodeSize` in `diff_delta.go` and `writeTestObjectHeader` in `internal_test.go` each hand-rolled the same 7-bit-chunk LEB128 loop as the writer counterpart to `util.DecodeLEB128`. Route both through `util.EncodeLEB128` / `EncodeLEB128ToWriter` so a single overflow-bounded implementation backs every site that emits an LEB128 length, mirroring the decoder pair already exported from the same package. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]>
77f41a2 to
077d56f
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.