-
Notifications
You must be signed in to change notification settings - Fork 869
*: Initial support for opening sha256 repositories using ObjectID
#1527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plumbing/hash package feels redundant and can be replaced with Go's crypto and hash packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reason why plumbing.hash needs to exist currently is to support hash.RegisterHash within the explicit context of go-git. If we were to remove that, and lean on upstream's instead, then we would need to rethink on the UX.
I'm not sure we should force users to replace their sha1 implementation across their app, if they only want to use it in the context of go-git.
IIRC, upstream had a recent performance optimisations whereby they use both hashes: sha1cd when interacting with remote repositories, and sha1 for all local operations.
This feels like an orthogonal topic which could be dealt with as a separate issue/PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, don't we already do that when sha1cd gets imported? It replaces upstream's sha1 in the registry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are absolutely right. That will need to change upstream depending on the approach we take.
I'll make a few tests around this, which may result in either the removal of plumbing/hash or the landing of the hybrid approach (if the performance gains justifies it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removal of the upstream behaviour will be handled separately, the same applies to the changes in go-git.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can remove plumbing/hash and combine the implementations of sha1 and sha256 hasher and object id. Also, instead of using interfaces for ObjectHasher, we can use a concrete type *ObjectHasher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened pjbgf#27 against your branch 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removal of the upstream behaviour will be handled separately, the same applies to the changes in
go-git.
I think that's fine and is expected. Ideally, go-git shouldn't directly import crypto hashers to avoid overriding the user's registered hashers. Users should be able to use custom hashers using the crypto register and interface, while go-git could provide a default import for sha1cd and crypto/sha256.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added your commits into the PR, PTAL.
There are some additional changes required around hashing and potentially the removal of our plumbing/hash, but I'd rather do that in a follow-up PR.
plumbing/hash_string.go
Outdated
|
|
||
| // TODO: Compare and CompareBytes | ||
| // Compare compares the hash's sum with a slice of bytes. | ||
| func (s StringHash) Compare(b []byte) int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this expect the hash bytes or the hash hex string?
plumbing/hash/hash.go
Outdated
| const ( | ||
| SHA1Size = 20 | ||
| SHA1HexSize = SHA1Size * 2 | ||
| SHA256Size = 32 | ||
| SHA256HexSize = SHA256Size * 2 | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are redefined here. We could use format.SHA1Size etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was part of the sha256 conditional build, which is no longer needed. All this references have now been removed.
Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
The support for SHA256 will require the mapping between ObjectFormat and the correct hash.Hash to be used. Signed-off-by: Paulo Gomes <[email protected]>
The introduction of both new types are a stepping stone to enable SHA256 support concurrently - without the need for a build tag. ImmutableHash provides a way to represent varied sized hashes (for SHA1 or SHA256) with a single type, while keeping similar performance of the existing plumbing.Hash. The names were picked in order to provide a clearer distinction on what they do. ImmutableHash is the result of a hash operation and can't be changed once calculated. ObjectHasher, adds the Git object header before a normal hash operation that hash.Hash or plumbing/hash.Hash do. The performance results shows that SHA1 operations could be slower, however for SHA256 it can be over 3 times faster: ~Hasher/hasher-sha1-100B-16 2202226 574.7 ns/op 174.00 MB/s 272 B/op 7 allocs/op ~Hasher/objecthash-sha1-100B-16 1511851 772.6 ns/op 129.43 MB/s 272 B/op 9 allocs/op ~Hasher/objecthash-sha256-100B-16 5057584 247.4 ns/op 404.21 MB/s 96 B/op 7 allocs/op ~Hasher/hasher-sha1-5000B-16 96476 12523 ns/op 399.25 MB/s 5536 B/op 7 allocs/op ~Hasher/objecthash-sha1-5000B-16 105376 10983 ns/op 455.27 MB/s 272 B/op 9 allocs/op ~Hasher/objecthash-sha256-5000B-16 420741 2828 ns/op 1767.77 MB/s 96 B/op 7 allocs/op ~HashFromHex/hasher-parse-sha1-valid-16 23243548 64.65 ns/op 618.69 MB/s 48 B/op 1 allocs/op ~HashFromHex/objecthash-fromhex-sha1-valid-16 18120699 79.62 ns/op 502.36 MB/s 72 B/op 2 allocs/op ~HashFromHex/objecthash-fromhex-sha256-valid-16 8969871 124.2 ns/op 515.22 MB/s 96 B/op 2 allocs/op Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
The binary.ReadHash is now replaced with ReadFrom on the ObjectID struct. Signed-off-by: Paulo Gomes <[email protected]>
With the addition of multi-hash support, there is no longer need to have a sha256 specific test that relies on build tags. Signed-off-by: Paulo Gomes <[email protected]>
The new ObjectID replaces plumbing.Hash and supports both SHA1 and SHA256. The backing array is sized after SHA256, resulting in a larger memory footprint on a per hash basis. Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
The previous example of Blame, started from a clone, instead of opening an existing repository. Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
ObjectID differs from the previous Hash as it has a new format field. For the purposes for equality checking, both values 'sha1' and '' should be perceived as the same, so to avoid cases where the equality check is not being done internally, the value 'sha1' is no longer used. The field format should only be populated when its value is 'sha256'. Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
This commit changes the ObjectFormat type from a string to an int. This would make a default ObjectFormat declaration default to SHA1. A string function is defined to return the string representation of the ObjectFormat.
This combines the ObjectHasher implementation to be hash function agnostic. The SHA1 and SHA256 implementations are now combined into a single implementation that uses the given format to determine the hash function to use.
Signed-off-by: Paulo Gomes <[email protected]>
Users can opt-out of sha1cd by adding an empty import for the upstream implementation: _ "crypto/sha1" Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
| // NewEncoder returns a new stream encoder that writes to w. | ||
| func NewEncoder(w io.Writer) *Encoder { | ||
| h := hash.New(hash.CryptoType) | ||
| h := hash.New(crypto.SHA1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's put a todo here to add support for format object.
| h := hash.New(crypto.SHA1) | |
| // TODO: Support passing an ObjectFormat instead of hardcoding SHA1 | |
| h := hash.New(crypto.SHA1) |
| if _, err = e.Write(commitFileSignature); err == nil { | ||
| version := byte(1) | ||
| if hash.CryptoType == crypto.SHA256 { | ||
| if crypto.Hash(e.hash.Size()) == crypto.Hash(crypto.SHA256.Size()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if crypto.Hash(e.hash.Size()) == crypto.Hash(crypto.SHA256.Size()) { | |
| if e.hash.Size() == crypto.SHA256.Size() { |
| // NewEncoder returns a new stream encoder that writes to w. | ||
| func NewEncoder(w io.Writer) *Encoder { | ||
| h := hash.New(hash.CryptoType) | ||
| h := hash.New(crypto.SHA1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| h := hash.New(crypto.SHA1) | |
| // TODO: Support passing ObjectFormat instead of hardcoding SHA1 | |
| h := hash.New(crypto.SHA1) |
| // NewDecoder returns a new decoder that reads from r. | ||
| func NewDecoder(r io.Reader) *Decoder { | ||
| h := hash.New(hash.CryptoType) | ||
| h := hash.New(crypto.SHA1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| h := hash.New(crypto.SHA1) | |
| // TODO: Support passing ObjectFormat instead of hardcoding SHA1 | |
| h := hash.New(crypto.SHA1) |
| // NewEncoder returns a new encoder that writes to w. | ||
| func NewEncoder(w io.Writer) *Encoder { | ||
| h := hash.New(hash.CryptoType) | ||
| h := hash.New(crypto.SHA1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| h := hash.New(crypto.SHA1) | |
| // TODO: Support passing ObjectFormat instead of hardcoding SHA1 | |
| h := hash.New(crypto.SHA1) |
| func NewEncoder(w io.Writer, s storer.EncodedObjectStorer, useRefDeltas bool) *Encoder { | ||
| h := plumbing.Hasher{ | ||
| Hash: hash.New(hash.CryptoType), | ||
| Hash: hash.New(crypto.SHA1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Hash: hash.New(crypto.SHA1), | |
| // TODO: Support SHA256 | |
| Hash: hash.New(crypto.SHA1), |
| dict := make([]byte, 16*1024) | ||
| crc := crc32.NewIEEE() | ||
| packhash := gogithash.New(gogithash.CryptoType) | ||
| packhash := gogithash.New(crypto.SHA1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| packhash := gogithash.New(crypto.SHA1) | |
| // TODO: Support SHA256 | |
| packhash := gogithash.New(crypto.SHA1) |
Signed-off-by: Paulo Gomes <[email protected]>
Introduces a new
plumbing.ObjectIDwhich supports either SHA1 and SHA256 object formats, removing the need of using build tags for the sha256 support.Follow-up PRs will expand support for other git features. Please note that the
sha256is currently broken, as the creation ofsha256is yet to be implemented.Relates to #899 #706.