Skip to content

Removed leaveDotGit from fetchgit, since it's broken by design.#4752

Closed
madjar wants to merge 1 commit intoNixOS:masterfrom
madjar:master
Closed

Removed leaveDotGit from fetchgit, since it's broken by design.#4752
madjar wants to merge 1 commit intoNixOS:masterfrom
madjar:master

Conversation

@madjar
Copy link
Member

@madjar madjar commented Oct 31, 2014

fetchgit with leaveDotGit can never be deterministic, since the
resulting .git can change whenever the upstream repository
changes.

Since it is impossible for git to download a single commit, fetchgit
downloads the whole repository and searches for the specific
commit. This means that the resulting .git contains the whole
repository, including any commit that was made after the one we
want. This also mean that adding any commit to the repository will add
it to prefetch's .git, thus changing its hash, and making it impossible
to make it deterministic.

fetchgit with leaveDotGit can never be deterministic, since the
resulting .git can change whenever the upstream repository
changes.

Since it is impossible for git to download a single commit, fetchgit
downloads the whole repository and searches for the specific
commit. This means that the resulting .git contains the whole
repository, including any commit that was made after the one we
want. This also mean that adding any commit to the repository will add
it to prefetch's .git, thus changing its hash, and making it impossible
to make it deterministic.
@bjornfor
Copy link
Contributor

(Thanks for bringing this up. I was just playing with leaveDotGit.)

What if we discard all refs not reachable from the given rev and run git gc etc. to re-pack the repo? Wouldn't that make it deterministic?

@edolstra
Copy link
Member

+1 on removing this.

@bjornfor Even if we exclude all non-reachable refs, we would still depend on the exact file format used in .git, which I imagine could change from time to time.

@domenkozar
Copy link
Member

+1, maybe add a comment why we remove .git and why it can't never be deterministic.

@bjornfor
Copy link
Contributor

@edolstra: I understand that. However, I think (hope) that such changes happen relatively seldom, and the benefit of having working "git describe" in the build would outweigh that issue.

Anyway, I'm not against removing leaveDotGit now, as it is broken, but I hope you (and the community) are not against re-adding it if the only remaining determinism issue would be git changing its data format. (In that case, determinism could also be ensured by fixing the git version used in fetchgit. If you really want it...)

@madjar
Copy link
Member Author

madjar commented Oct 31, 2014

For future references, the sources of non-deterministicness I've gathered are the following:

  • .git/log contains dates
  • .git/index contains dates
  • .git/hooks contain the nix store path of git
  • .git/objects contains any object that exist in the remote repository
  • .git/FETCH_HEAD and .git/refs/remotes/origin/master contain the sha of the last commit

That being said, it's completely possible to tweak all this to make it deterministic. Still, I'm kind of tired of that problem and I decided to work around it for nox. I'll still have to resolve it, or find another workaround, for cargo, so I might come back to it a some point.

In the meantime, for simple cases like "git describe", it would probably be simpler to allow the execution of some hook before the .git is removed to run git describe and store the result in the output path ($NIX_PREFETCH_GIT_CHECKOUT_HOOK seems to be just that, but I haven't tired it).

@bjornfor
Copy link
Contributor

bjornfor commented Nov 1, 2014

I've made a minimal deterministic git "clone" mechanism. At least according to my test setup. The procedure is simply to remove all branches and tags not reachable from the given commit id and then run "git gc". This is deterministic even when the upstream repo gets new commits/branches/tags. I'd also like to test whether the "packedness" state of the upstream repo affects the clone. There is also "git repack"...

Trying to integrate this into fetchgit I noticed that instead of resetting HEAD back to 'rev', like I did in my test, fetchgit creates a new branch called "fetchgit". I have to read more, but it occurred to me that a build system may want to run "git branch" to see from which branch this is being built. Creating a 'fetchgit' may be a bad idea. Need to dig into the fetchgit/nix-prefetch-git implementation.

@bjornfor
Copy link
Contributor

bjornfor commented Nov 1, 2014

I've made a pull: #4767 ("nix-prefetch-git: fix determinism with leaveDotGit").

@bjornfor
Copy link
Contributor

bjornfor commented Nov 1, 2014

Oh, and about the "fetchgit" branch that nix-prefetch-git creates, I realize that it can be fixed by using $NIX_PREFETCH_GIT_CHECKOUT_HOOK.

@madjar
Copy link
Member Author

madjar commented Nov 2, 2014

Closing, since @bjornfor made it work.

@madjar madjar closed this Nov 2, 2014
@nbp
Copy link
Member

nbp commented Nov 9, 2014

Wasn't this feature needed for Hydra?

@edolstra
Copy link
Member

No, Hydra does its own fetching.

@edolstra
Copy link
Member

BTW, I'm still in favour of removing this misfeature rather than having complicated hacks like #4767 (which IMHO cannot solve the problem).

@bjornfor
Copy link
Contributor

@edolstra: I need leaveDotGit to build a certain project at work. Without a reasonable .git/ directory the build fails really hard... (And I don't have control over the build system.)

I understand that leaveDotGit is "fragile". But as long as it's deterministic for a certain git derivation, can't we let it live?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants