Skip to content

Align object encoding with upstream#2066

Merged
pjbgf merged 14 commits into
mainfrom
commit-v6
May 6, 2026
Merged

Align object encoding with upstream#2066
pjbgf merged 14 commits into
mainfrom
commit-v6

Conversation

@pjbgf
Copy link
Copy Markdown
Member

@pjbgf pjbgf commented May 6, 2026

  • Ensures that unmodified Commit and Tag objects return their raw bytes (minus gpgsign and gpgsign-sha256 headers).
  • Aligns Commit and Tag interpretation with upstream: first instance of standard headers wins.
  • Drop MergeTag in favour of ExtraHeaders.
  • Add conformance tests to ensure verification aligns with git. Note that duplicate signatures were valid back on v2.11, which is why that test is skipped during capability mode.
  • Aligns Tree interpretation with upstream: first wins and unsorted tree is short-circuited.

pjbgf added 14 commits May 6, 2026 15:33
Replace the per-iteration flag soup in (*Commit).Decode with a
func-based stateFn loop that mirrors upstream Git's parsing posture.
The new state machine encodes "where we are in the buffer" as the
function being called rather than as a bag of mutable booleans.

Conformance changes vs upstream commit.c:
- tree must be the first header, missing tree errors with
  ErrMalformedCommit (matches the "bogus commit object" check at
  commit.c:508-510).
- Parents are accepted only contiguously after tree; out-of-position
  parent lines are silently dropped (matches parse_commit_buffer's
  parent loop exiting at the first non-parent line and
  read_commit_extra_header_lines filtering parents from extras).
- Author and committer are accepted only at their canonical positions
  (immediately after parents, immediately after author). Out-of-place
  occurrences are dropped, mirroring parse_commit_date silently
  returning 0 plus the standard_header_field filter.
- Encoding/mergetag/gpgsig/gpgsig-sha256 follow first-wins; duplicate
  sig/mergetag continuation lines are discarded by a dedicated
  scanSkipCont state.
- Continuation strip uses line[1:] (single space) instead of
  bytes.TrimLeft, mirroring upstream's `line + 1` at commit.c:1509.
- Header/body boundary is the literal "\n" only, matching
  *line == '\n' at commit.c:1502.

Tests:
- TestDecodeRequiresTreeFirst covers the four hard-error inputs
  (missing tree, parent before tree, extra before tree, empty buffer).
- TestDecodeFirstOccurrenceWins covers the lenient drop semantics for
  duplicate tree/author/committer/gpgsig/gpgsig-sha256, parents after
  author or interleaved with extras, missing committer, encoding
  between author and committer (drops the misplaced committer), and
  author at a non-canonical position (drops author and committer).
- TestMalformedHeader is back to expecting NoError (lenient parsing of
  garbage author/committer values).

The state machine lives in plumbing/object/commit_scanner.go.

Assisted-by: Claude Opus 4.7 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
The previous MergeTag field had two issues:
1. It could only represent a single mergetag, whilst upstream support multiple within a single Commit.
2. Did not allow for a byte-to-byte alignment with upstream, due to the above, and its position
set by go-git during encoding time.

In order to mitigate both issues, it will now be handled as part of ExtraHeaders, which
better aligns with upstream.

Signed-off-by: Paulo Gomes <[email protected]>
Tag verification re-encoded from struct fields, so any non-canonical
bytes the signature was computed over (duplicate headers, mutation-only
fields like Signature/SignatureSHA256, etc.) were lost from the payload
and signatures that were valid in upstream Git failed in go-git.

Mirror the approach already used by Commit: retain a reference to the
source plumbing.EncodedObject on Decode, add matchesSource() so
EncodeWithoutSignature can stream the raw bytes (with the inline
trailing PGP block truncated and gpgsig/gpgsig-sha256 headers stripped)
when the struct still matches the source. Mutating struct fields still
falls back to the struct-encoded payload.

Assisted-by: Claude Opus 4.7 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Align Commit.Verify with upstream's parse_buffer_signed_by_header
(commit.c:1186) and parse_gpg_output rejection (gpg-interface.c:257-269).

The commit scanner now concatenates every gpgsig/gpgsig-sha256 header
into a single signature buffer instead of first-wins, and Commit.Verify
returns the new ErrMultipleSignatures when that buffer carries more
than one armored block.

Assisted-by: Claude Opus 4.7 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Copilot AI review requested due to automatic review settings May 6, 2026 14:47
@pjbgf
Copy link
Copy Markdown
Member Author

pjbgf commented May 6, 2026

Git Compatibility failure is orthogonal:

--- FAIL: TestGitServer_Timeout (0.26s)
    git_test.go:187: 
        	Error Trace:	/home/runner/work/go-git/go-git/internal/server/git/git_test.go:187
        	Error:      	"199.310309ms" is not greater than or equal to "200ms"
        	Test:       	TestGitServer_Timeout
        	Messages:   	server should not close before Timeout elapses
FAIL

Will be fixed by #2013.

@pjbgf pjbgf merged commit c4e48f8 into main May 6, 2026
26 of 29 checks passed
@pjbgf pjbgf deleted the commit-v6 branch May 6, 2026 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant