Skip to content

Conversation

@pjbgf
Copy link
Member

@pjbgf pjbgf commented Apr 23, 2025

Introduces a new plumbing.ObjectID which supports either SHA1 and SHA256 object formats, removing the need of using build tags for the sha256 support.

Follow-up PRs will expand support for other git features. Please note that the sha256 is currently broken, as the creation of sha256 is yet to be implemented.

Relates to #899 #706.

@pjbgf pjbgf added this to the v6.0.0 milestone Apr 23, 2025
Copy link
Member

@aymanbagabas aymanbagabas Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plumbing/hash package feels redundant and can be replaced with Go's crypto and hash packages.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason why plumbing.hash needs to exist currently is to support hash.RegisterHash within the explicit context of go-git. If we were to remove that, and lean on upstream's instead, then we would need to rethink on the UX.

I'm not sure we should force users to replace their sha1 implementation across their app, if they only want to use it in the context of go-git.
IIRC, upstream had a recent performance optimisations whereby they use both hashes: sha1cd when interacting with remote repositories, and sha1 for all local operations.

This feels like an orthogonal topic which could be dealt with as a separate issue/PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, don't we already do that when sha1cd gets imported? It replaces upstream's sha1 in the registry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are absolutely right. That will need to change upstream depending on the approach we take.

I'll make a few tests around this, which may result in either the removal of plumbing/hash or the landing of the hybrid approach (if the performance gains justifies it).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removal of the upstream behaviour will be handled separately, the same applies to the changes in go-git.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can remove plumbing/hash and combine the implementations of sha1 and sha256 hasher and object id. Also, instead of using interfaces for ObjectHasher, we can use a concrete type *ObjectHasher.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened pjbgf#27 against your branch 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removal of the upstream behaviour will be handled separately, the same applies to the changes in go-git.

I think that's fine and is expected. Ideally, go-git shouldn't directly import crypto hashers to avoid overriding the user's registered hashers. Users should be able to use custom hashers using the crypto register and interface, while go-git could provide a default import for sha1cd and crypto/sha256.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added your commits into the PR, PTAL.

There are some additional changes required around hashing and potentially the removal of our plumbing/hash, but I'd rather do that in a follow-up PR.

@pjbgf pjbgf marked this pull request as ready for review April 26, 2025 12:59
@pjbgf pjbgf requested a review from aymanbagabas April 27, 2025 07:06

// TODO: Compare and CompareBytes
// Compare compares the hash's sum with a slice of bytes.
func (s StringHash) Compare(b []byte) int {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this expect the hash bytes or the hash hex string?

Comment on lines 15 to 20
const (
SHA1Size = 20
SHA1HexSize = SHA1Size * 2
SHA256Size = 32
SHA256HexSize = SHA256Size * 2
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are redefined here. We could use format.SHA1Size etc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was part of the sha256 conditional build, which is no longer needed. All this references have now been removed.

pjbgf and others added 19 commits June 3, 2025 23:56
The support for SHA256 will require the mapping between ObjectFormat and the
correct hash.Hash to be used.

Signed-off-by: Paulo Gomes <[email protected]>
The introduction of both new types are a stepping stone to enable
SHA256 support concurrently - without the need for a build tag.

ImmutableHash provides a way to represent varied sized hashes (for SHA1
or SHA256) with a single type, while keeping similar performance of the
existing plumbing.Hash.

The names were picked in order to provide a clearer distinction on what
they do. ImmutableHash is the result of a hash operation and can't be
changed once calculated. ObjectHasher, adds the Git object header before
a normal hash operation that hash.Hash or plumbing/hash.Hash do.

The performance results shows that SHA1 operations could be slower,
however for SHA256 it can be over 3 times faster:

~Hasher/hasher-sha1-100B-16         	            2202226	     574.7 ns/op	 174.00 MB/s     272 B/op     7 allocs/op
~Hasher/objecthash-sha1-100B-16     	            1511851	     772.6 ns/op	 129.43 MB/s     272 B/op     9 allocs/op
~Hasher/objecthash-sha256-100B-16   	            5057584	     247.4 ns/op	 404.21 MB/s      96 B/op     7 allocs/op
~Hasher/hasher-sha1-5000B-16        	              96476	     12523 ns/op	 399.25 MB/s    5536 B/op     7 allocs/op
~Hasher/objecthash-sha1-5000B-16    	             105376	     10983 ns/op	 455.27 MB/s     272 B/op     9 allocs/op
~Hasher/objecthash-sha256-5000B-16  	             420741	      2828 ns/op	1767.77 MB/s      96 B/op     7 allocs/op
~HashFromHex/hasher-parse-sha1-valid-16         	23243548     64.65 ns/op	 618.69 MB/s      48 B/op     1 allocs/op
~HashFromHex/objecthash-fromhex-sha1-valid-16   	18120699     79.62 ns/op	 502.36 MB/s      72 B/op     2 allocs/op
~HashFromHex/objecthash-fromhex-sha256-valid-16 	 8969871    124.2 ns/op	     515.22 MB/s      96 B/op     2 allocs/op

Signed-off-by: Paulo Gomes <[email protected]>
The binary.ReadHash is now replaced with ReadFrom on the
ObjectID struct.

Signed-off-by: Paulo Gomes <[email protected]>
With the addition of multi-hash support, there is no longer
need to have a sha256 specific test that relies on build tags.

Signed-off-by: Paulo Gomes <[email protected]>
The new ObjectID replaces plumbing.Hash and supports both
SHA1 and SHA256. The backing array is sized after SHA256,
resulting in a larger memory footprint on a per hash basis.

Signed-off-by: Paulo Gomes <[email protected]>
The previous example of Blame, started from a clone, instead of
opening an existing repository.

Signed-off-by: Paulo Gomes <[email protected]>
pjbgf and others added 14 commits June 3, 2025 23:56
Signed-off-by: Paulo Gomes <[email protected]>
ObjectID differs from the previous Hash as it has a new
format field. For the purposes for equality checking, both
values 'sha1' and '' should be perceived as the same, so
to avoid cases where the equality check is not being done
internally, the value 'sha1' is no longer used. The field
format should only be populated when its value is 'sha256'.

Signed-off-by: Paulo Gomes <[email protected]>
This commit changes the ObjectFormat type from a string to an int. This
would make a default ObjectFormat declaration default to SHA1. A string
function is defined to return the string representation of the ObjectFormat.
This combines the ObjectHasher implementation to be hash function
agnostic. The SHA1 and SHA256 implementations are now combined into a
single implementation that uses the given format to determine the
hash function to use.
Users can opt-out of sha1cd by adding an empty import
for the upstream implementation:
_ "crypto/sha1"

Signed-off-by: Paulo Gomes <[email protected]>
// NewEncoder returns a new stream encoder that writes to w.
func NewEncoder(w io.Writer) *Encoder {
h := hash.New(hash.CryptoType)
h := hash.New(crypto.SHA1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put a todo here to add support for format object.

Suggested change
h := hash.New(crypto.SHA1)
// TODO: Support passing an ObjectFormat instead of hardcoding SHA1
h := hash.New(crypto.SHA1)

if _, err = e.Write(commitFileSignature); err == nil {
version := byte(1)
if hash.CryptoType == crypto.SHA256 {
if crypto.Hash(e.hash.Size()) == crypto.Hash(crypto.SHA256.Size()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if crypto.Hash(e.hash.Size()) == crypto.Hash(crypto.SHA256.Size()) {
if e.hash.Size() == crypto.SHA256.Size() {

// NewEncoder returns a new stream encoder that writes to w.
func NewEncoder(w io.Writer) *Encoder {
h := hash.New(hash.CryptoType)
h := hash.New(crypto.SHA1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
h := hash.New(crypto.SHA1)
// TODO: Support passing ObjectFormat instead of hardcoding SHA1
h := hash.New(crypto.SHA1)

// NewDecoder returns a new decoder that reads from r.
func NewDecoder(r io.Reader) *Decoder {
h := hash.New(hash.CryptoType)
h := hash.New(crypto.SHA1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
h := hash.New(crypto.SHA1)
// TODO: Support passing ObjectFormat instead of hardcoding SHA1
h := hash.New(crypto.SHA1)

// NewEncoder returns a new encoder that writes to w.
func NewEncoder(w io.Writer) *Encoder {
h := hash.New(hash.CryptoType)
h := hash.New(crypto.SHA1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
h := hash.New(crypto.SHA1)
// TODO: Support passing ObjectFormat instead of hardcoding SHA1
h := hash.New(crypto.SHA1)

func NewEncoder(w io.Writer, s storer.EncodedObjectStorer, useRefDeltas bool) *Encoder {
h := plumbing.Hasher{
Hash: hash.New(hash.CryptoType),
Hash: hash.New(crypto.SHA1),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Hash: hash.New(crypto.SHA1),
// TODO: Support SHA256
Hash: hash.New(crypto.SHA1),

dict := make([]byte, 16*1024)
crc := crc32.NewIEEE()
packhash := gogithash.New(gogithash.CryptoType)
packhash := gogithash.New(crypto.SHA1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
packhash := gogithash.New(crypto.SHA1)
// TODO: Support SHA256
packhash := gogithash.New(crypto.SHA1)

@pjbgf pjbgf merged commit 856aa92 into go-git:v6-transport Jun 4, 2025
14 checks passed
@pjbgf pjbgf deleted the sha256-open branch June 4, 2025 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants