fix layer size computation: handle hard links correctly #2304
crosbymichael merged 2 commits into moby:master
Conversation
You can use your size variable here instead.
@crosbymichael I've made the changes you suggested. Thank you!
LGTM, ping @vieux
Shouldn't we do the same for containers?
@vieux You're right, I will update GetSize() in container.go as well.
While we are here, shouldn't we report the real disk usage instead of the file size? And maybe add a fallback to use the visible size in case the type assertion fails.
@jpetazzo I don't know if that's an issue, whether it's computed for every pull, and whether it's what we want. @vieux Is the size of the layer computed after each pull? If not, should it be? I'm trying to figure out what the best option would be.
@unclejack which kind of FS compression are you using? (I'd love to reproduce here, if I can!)
@unclejack it is, it's called in
@unclejack what's the status of this?
@vieux We have to figure out what we want to do in order to handle sparse files. We could let this code compute the size by looking at the sizes of the files and add another feature which computes the used disk space. Reporting the amount of used disk space and the total size of the files was something which came up yesterday during the Docker meeting.

The issue is that we can't tell apart a sparse file from a compressed file. We can detect compression, but that doesn't help because we still end up with compressed files and compressed sparse files which we can't tell apart. Detecting compression would only help us figure out that we can't say that used space is almost equal to the total space those files would take as an uncompressed tarball.

The downside of this PR is that we won't be taking sparse files into account and we'll add up their maximum possible size. Since the existing code isn't taking sparse files into account either, changing it to take hard links into account is reasonable. For now, we could just use this improvement to detect hard links and account for them. I'll add the same code for container sizes and then we can merge this. @vieux I'll ping you when it's ready for merging.
@unclejack I don't know if you saw my comment, but I was wondering what kind of compression you were using (so I can do some tests on my own)?
@jpetazzo I've seen your comment. It's just that I've done some research and it doesn't look like there's any simple way to detect sparse files when compression is involved. Let's discuss this on IRC.
@unclejack I couldn't get hold of you on IRC :) can you just tell me which kind of compression you're using? I just want to experiment a bit here. Thanks!
ping @unclejack
@jpetazzo did you get the chance to work on this?
@vieux Can you try this out, please? I've updated the code to be much faster and to make the same changes to GetSize(). However, some tests are failing randomly. This doesn't handle sparse files because they can't be detected reliably on a FS with compression.
Why use uint8 here? I see you used it like a bool. Is it to have something small?
bool and uint8 have the same size: http://play.golang.org/p/-sSaAiO49V
Please use a bool, I think it's easier to read.
This change makes docker compute layer sizes correctly. The old code wasn't taking hard links into account, so layers could appear up to 1.5x larger than they really were.
This change makes docker compute container sizes correctly. The old code wasn't taking hard links into account, so containers could appear up to 1.5x larger than they really were.
@vieux Thank you for the suggestion, I've updated the code. TestAttachDetach is still failing, though.
@unclejack it happens sometimes, see #2771. LGTM, ping @crosbymichael
LGTM
Fix layer size computation: handle hard links correctly

This change makes docker compute layer sizes correctly.

The old code wasn't taking hard links into account, so layers could appear 1.5-2x larger than they really were when a lot of hard links are involved.

With this change, the layer size is computed correctly and reported exactly the same as du. The size has to be computed this way, taking hard links into account, because that's the real amount of space the data is going to take. Tar also handles hard links, so the data doesn't show up twice in the tarball. That means there's no reason for Docker to compute the amount of space these layers take up on disk in any way other than the same way as du.