Use a bare clone of crates.io-index #4015

Closed
cuviper opened this issue May 9, 2017 · 5 comments · Fixed by #4026

Comments

@cuviper
Member

cuviper commented May 9, 2017

Would it be possible to use a bare git clone for the index? The checked-out files currently take about 75MB on disk, but it's all redundant with .git/objects/. If the files can be read in-memory via whatever libgit2 has equivalent to git cat-file, then they shouldn't need to be checked out at all.

@alexcrichton
Member

This'd be an awesome idea! Might even solve that nul-on-Windows problem!

I... don't know why we didn't do this originally.

I believe that git2-rs has all the bindings necessary for reading and iterating over tree objects. The index.rs module for the registry will need to be abstracted a little so that all filesystem accesses go through trait methods. Otherwise, given a Repository, you'd just resolve the head into a Commit and turn that into a Tree, which supports lookup through get_path to a TreeEntry. I believe its to_object method returns an object which can be cast to a Blob, and that supports extracting the bytes of the relevant index file.

I think we could even do this without tampering with all existing checkouts; we could just start updating the database. That is, the logic for updating the registry would look like:

  • Attempt to open the folder as a Repository; if that fails, do a bare clone. If it succeeds, it may or may not be a bare repo, but that shouldn't matter.
  • Next, whenever we're updating the registry, we just fetch commits from the remote and resolve the remote's HEAD reference.
  • Finally, when doing lookups, we'd just use the path I outlined above.

I'd definitely be willing to help out and assist anyone who'd like to implement this!

@cuviper
Member Author

cuviper commented May 9, 2017

> Might even solve that nul-on-Windows problem!

Maybe if we'd used bare repos from the start, but I hope we won't abandon older Cargo versions that won't learn to deal with this. I lean towards keeping crates.io compatible ~forever. :) (Although in this case, Windows users almost always get binaries upstream, rather than having distributions like Linux.)

Anyway, I'm glad there's interest! I'm willing to work on this myself, but I have other things on my plate right now. If someone else wants to give it a shot, please say so, or else I'll leave a comment before I start.

@alexcrichton
Member

Oh yeah, I don't think we'll want to lift the naming restrictions anyway, for a number of other reasons; it may have just made the nul problem less catastrophic!

If you've got any questions about Cargo and/or git2, just lemme know and I'd be glad to help!

@alexcrichton
Member

I saw a slow clone today and got inspired to implement this, so I'll be sending a PR shortly!

@alexcrichton
Member

Ok I've sent a PR: #4026

alexcrichton added a commit to alexcrichton/cargo that referenced this issue May 11, 2017
This commit moves working with the crates.io index to operating on the git
object layers rather than actually literally checking out the index. This is
aimed at two different goals:

* Improving the on-disk file size of the registry
* Improving cloning times for the registry as the index doesn't need to be
  checked out

The on-disk size of my `registry` folder for a fresh checkout of the index went
from 124M to 48M, saving a good chunk of space! The entire operation took about
0.6s less on a Unix machine (out of 4.7s total for current Cargo). On Windows,
however, the clone operation went from 11s to 6.7s, a much larger improvement!

Closes rust-lang#4015
bors added a commit that referenced this issue May 11, 2017
Don't check out the crates.io index locally

Closes #4015