Skip to content

fix layer size computation: handle hard links correctly#2304

Merged
crosbymichael merged 2 commits intomoby:masterfrom
unclejack:fix_layer_size_computation
Nov 22, 2013
Merged

fix layer size computation: handle hard links correctly#2304
crosbymichael merged 2 commits intomoby:masterfrom
unclejack:fix_layer_size_computation

Conversation

@unclejack
Copy link
Copy Markdown
Contributor

This change makes docker compute layer sizes correctly.

The old code isn't taking hard links into account. Layers could appear as being 1-1.5-2x larger than they really were when a lot of hard links are involved.

With this change, the layer size is being computed correctly and reported exactly the same as du. The reason why it has to be computed correctly and take hard links into account is because that's the real amount of space that data is going to take.

Tar also handles hard links and the data isn't showing up twice in the tarball. That means there's no reason for Docker to compute the amount of space these layers are taking up on disk in any other way, other than the same way as du.

Comment thread image.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use your size variable here instead.

@unclejack
Copy link
Copy Markdown
Contributor Author

@crosbymichael I've made the changes you suggested. Thank you!

@crosbymichael
Copy link
Copy Markdown
Contributor

LGTM

ping @vieux

@vieux
Copy link
Copy Markdown
Contributor

vieux commented Oct 22, 2013

Souldn't we do the same for container ?

@unclejack
Copy link
Copy Markdown
Contributor Author

@vieux You're right, I will update container.go : GetSize() as well.

@jpetazzo
Copy link
Copy Markdown
Contributor

While we are here, shouldn't we report the real disk usage, instead of the file size?
This would account properly for sparse files.

Instead of size := fileInfo.Size(), we would use size := 512 * fileInfo.Sys().(*syscall.Stat_t).Blocks

And maybe add a test to use the visible size in case the type assertion fails.

@unclejack
Copy link
Copy Markdown
Contributor Author

@jpetazzo
size := 512 * fileInfo.Sys().(*syscall.Stat_t).Blocks is breaking layer size computation when FS compression is used.

I don't know if that's an issue, if it's computed for every pull and if it's what we want.

@vieux Is the size of the layer computed after each pull ? If not, should it be?

I'm trying to figure out what the best option would be.

@jpetazzo
Copy link
Copy Markdown
Contributor

@unclejack which kind of FS compression are you using? (I'd love to reproduce here, if I can!)
And, can you define "breaks"? Is it inaccurate, or does it report disk usage instead of real size, or...?
Thanks a lot for the feedback anyway. I'm really happy that we have advanced users testing Docker in that kind of setup!

@vieux
Copy link
Copy Markdown
Contributor

vieux commented Oct 24, 2013

@unclejack it is, it's called in Register after pull, import, etc...

@vieux
Copy link
Copy Markdown
Contributor

vieux commented Oct 25, 2013

@unclejack what's the status of this ?

@unclejack
Copy link
Copy Markdown
Contributor Author

@vieux We have to figure out what we want to do in order to handle sparse files.

We could let this code compute the size by looking at the sizes of the files and add another feature which computes the used disk space. Reporting the amount of used disk space and the total size of the files was something which came up yesterday during the Docker meeting.

The issue is that we can't tell apart a sparse file from a compressed file. We can detect compression, but that doesn't help because we still end up with compressed files and compressed sparse files which we can't tell apart. Detecting compression would only help us figure out that we can't say that used space is almost equal to the total space those files would take as an uncompressed tarball.

The downside of this PR is that we won't be taking sparse files into account and we'll add up their maximum possible size. Since the existing code isn't taking sparse files into account either, changing it to take hard links into account is reasonable.

For now, we could just use this improvement to detect hard links and account for them. I'll add the same code for container sizes and then we can merge this.

@vieux I'll ping you when it's ready for merging.

@jpetazzo
Copy link
Copy Markdown
Contributor

@unclejack I don't know if you saw my comment; but I was wondering with what kind of compression you were using (so I can do some tests on my own)?

@unclejack
Copy link
Copy Markdown
Contributor Author

@jpetazzo I've seen your comment. It's just that I've done some research and it doesn't look like there's any simple way to detect sparse files when compression is involved. Let's discuss on IRC about this.

@jpetazzo
Copy link
Copy Markdown
Contributor

@unclejack I couldn't get a hold on you on IRC :) can you just tell me which kind of compression you're using? I just want to experiment a bit here. Thanks!

@vieux
Copy link
Copy Markdown
Contributor

vieux commented Nov 1, 2013

ping @unclejack

@unclejack
Copy link
Copy Markdown
Contributor Author

@vieux We've discussed about this on IRC. @jpetazzo will try to do some tests when he has the time.

@vieux
Copy link
Copy Markdown
Contributor

vieux commented Nov 11, 2013

@jpetazzo did you get the chance to work on this ?

@unclejack
Copy link
Copy Markdown
Contributor Author

@vieux Can you try this out, please? I've updated the code to be much faster and to make the same changes to GetSize() . However, some tests are failing randomly.

This doesn't handle sparse files because they can't also be handled on a FS with compression.

Comment thread container.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use uint8 here I see you used it like a bool? to have something small ?

bool and uint8 have the same size: http://play.golang.org/p/-sSaAiO49V

Please use a bool, I think it's mare easy to read.

This change makes docker compute layer size correctly.

The old code isn't taking hard links into account. Layers could
seem like they're up to 1-1.5x larger than they really were.
This change makes docker compute container size correctly.

The old code isn't taking hard links into account. Containers could
seem like they're up to 1-1.5x larger than they really were.
@unclejack
Copy link
Copy Markdown
Contributor Author

@vieux Thank you for the suggestion, I've updated the code. TestAttachDetach is still failing, though.

@vieux
Copy link
Copy Markdown
Contributor

vieux commented Nov 19, 2013

@unclejack it happens sometimes, see #2771

LGTM

ping @crosbymichael

@crosbymichael
Copy link
Copy Markdown
Contributor

LGTM

crosbymichael added a commit that referenced this pull request Nov 22, 2013
Fix layer size computation: handle hard links correctly
@crosbymichael crosbymichael merged commit f7c2a00 into moby:master Nov 22, 2013
@unclejack unclejack deleted the fix_layer_size_computation branch November 22, 2013 00:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants