
Migration optimizations #18399

Merged
crosbymichael merged 2 commits into moby:master from tonistiigi:migration-optimization
Jan 5, 2016
Conversation

@tonistiigi
Member

This PR contains efforts to speed up migration to content addressability. In my testcase the improvement is ~55%.

Layer checksum calculation is separated from the migration process so it can be run in parallel (and in a separate tool). It also changes migration so that a cancelled migration doesn't start again from the beginning.

Other optimizations include:

Most of the improvement is for the tar-split migration. If you have lots of self-built images the improvement is mostly only from parallel processing.

Current state of the external migration tool can be seen in https://github.com/tonistiigi/docker-v1.10-migrator . Please note that I have only manually tested this tool atm.
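The parallel checksum phase described above can be sketched as a bounded worker pool that hashes each layer independently. This is a minimal illustration, not the PR's actual code: `checksumLayers`, the in-memory payloads, and the worker bound are all hypothetical stand-ins (the real migration hashes tar streams read from the graphdriver).

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// checksumLayers computes a SHA256 digest for each layer payload in
// parallel, bounded by maxWorkers. All names here are hypothetical.
func checksumLayers(layers map[string][]byte, maxWorkers int) map[string]string {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]string, len(layers))
		sem     = make(chan struct{}, maxWorkers)
	)
	for id, data := range layers {
		wg.Add(1)
		sem <- struct{}{} // acquire a worker slot
		go func(id string, data []byte) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			sum := sha256.Sum256(data)
			mu.Lock()
			results[id] = fmt.Sprintf("sha256:%x", sum)
			mu.Unlock()
		}(id, data)
	}
	wg.Wait()
	return results
}

func main() {
	layers := map[string][]byte{
		"layer-a": []byte("hello"),
		"layer-b": []byte(""),
	}
	for id, digest := range checksumLayers(layers, 2) {
		fmt.Println(id, digest)
	}
}
```

Because each digest depends only on its own layer's bytes, the work parallelizes cleanly, which is also why it can run in a separate tool ahead of the daemon upgrade.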

@aaronlehmann @LK4D4

daemon/daemon.go Outdated


It might be good to be more specific in this message to specify what's being migrated and why.

And I'd suggest putting migration log messages at a different log level than "debug", so people can more readily see what's going on.

Member Author

The migration logs are at info level. Only the timing stats are on debug level.


Timing stats could be useful to have at the info level if we're trying to track down performance issues with migration. The log line should only ever be displayed once, so I don't think it would be overly verbose to have it at the info level.

@tonistiigi force-pushed the migration-optimization branch from 207ab1f to 8ccbb4b (December 3, 2015)
@aaronlehmann

Did some light testing and it's working for me.

LGTM

Member

Why move this up here? It was originally done inside the transaction, after the data files have been written, to decrease the amount of time this lock needs to be held.

Member Author

So you would change getWithoutLock to get, and take the lock again before Commit?

Member

I would just keep this whole block where it was. The case where the image already exists does mean that we have unnecessarily written out the metadata files, but the transaction will clean that up. I would argue it is better to be optimistic here, writing out the files and potentially throwing them away at the end, to reduce the time the layer lock is held. Acquiring the lock twice just to avoid this does not seem necessary.

Member Author

This function is actually never called in parallel, so these locks don't block during migration. Still, it should be correct, of course. Copying/writing files when I already know I won't be using them does not sound like a good idea, especially when this PR is trying to optimize things. I'm fine with taking the second lock for clarity.

Member

If there will not be contention for adding layers when this runs, then I am fine with leaving it like this. Two locks mean two existence checks, and that is not ideal.
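The trade-off debated in this thread is a classic check-then-register pattern: hold the lock once, check existence, and only commit metadata for new layers (discarding any optimistically prepared files on a hit). A toy model of that single-lock registration, with hypothetical names (`store`, `register`) rather than the real layer store types:

```go
package main

import (
	"fmt"
	"sync"
)

// store models single-lock layer registration: existence is checked
// once under the lock, and metadata is recorded only for new layers.
type store struct {
	mu     sync.Mutex
	layers map[string]bool
}

// register returns true if id was newly added, false if it already
// existed (in which case prepared metadata files would be discarded
// by the surrounding transaction).
func (s *store) register(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.layers[id] {
		return false // already present: skip committing metadata
	}
	s.layers[id] = true
	return true
}

func main() {
	s := &store{layers: make(map[string]bool)}
	fmt.Println(s.register("sha256:abc")) // true: new layer
	fmt.Println(s.register("sha256:abc")) // false: duplicate
}
```

Taking the lock twice (once to check, once to commit) would avoid writing doomed files but, as noted above, costs a second existence check; a single lock keeps one check at the price of occasionally discarded writes.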

@dmcgowan
Member

dmcgowan commented Dec 4, 2015

Tested migrating with cancellation and restart.

LGTM

layer/layer.go Outdated
Contributor

I would prefer if we put the variable name here too.

@tonistiigi force-pushed the migration-optimization branch from 8ccbb4b to 29dc425 (December 4, 2015)
@tonistiigi
Member Author

I decided to revert the "direct filesystem access for tar-split in overlay" commit (ba9442574ed8bf8445681ae5a14b2fd486d2c85a). Looking at the overlay implementation more closely, it didn't really provide much benefit, as the graphdriver can itself recognize the case where it doesn't need to mount. Reverting adds a lock (in driver.Get) but is more readable.

Contributor

Should we close this data file? I don't see a call to Close anywhere.

Member Author

Fixed

@tonistiigi force-pushed the migration-optimization branch from 29dc425 to ee5d960 (December 11, 2015)
Member

Could this just be a utility function rather than adding a bool arg here? So:

    w, err := fm.TarSplitWriter()
    w = WithCompression(w)

Member Author

Slight problem with this is that the compressed version is the default that should always be used. Uncompressed (bool=false) is the hack needed for migration.

Member

Right, so the default would be to use only fm.TarSplitWriter() and for the migration use the helper fn.

Below we'd move the close wrapper to the helper, since that's only needed because of the compression, and return f directly (as an io.WriteCloser).


Functional args!

type TarSplitOption func(fm *fileMetadataTransaction)

func (fm *fileMetadataTransaction) TarSplitWriter(options ...TarSplitOption) (io.WriteCloser, error) {
    for _, opt := range options {
        opt(fm)
    }
    ...
}

func CompressInput(fm *fileMetadataTransaction) {
    fm.compressInput = true
}
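The functional-options snippet above can be fleshed out into a runnable sketch. The `fileMetadataTransaction` type and its field here are bare stand-ins for the real layer-store type, just enough to show that default call sites stay unchanged while migration opts in explicitly:

```go
package main

import "fmt"

// fileMetadataTransaction is a stand-in for the real layer-store type.
type fileMetadataTransaction struct {
	compressInput bool
}

// TarSplitOption mutates the transaction before the writer is built.
type TarSplitOption func(fm *fileMetadataTransaction)

// CompressInput is the option from the suggestion above.
func CompressInput(fm *fileMetadataTransaction) {
	fm.compressInput = true
}

// TarSplitWriter applies any options, then (in the real code) would
// return an io.WriteCloser; here it just reports the resulting state.
func (fm *fileMetadataTransaction) TarSplitWriter(options ...TarSplitOption) {
	for _, opt := range options {
		opt(fm)
	}
	fmt.Println("compressInput:", fm.compressInput)
}

func main() {
	(&fileMetadataTransaction{}).TarSplitWriter()              // default: false
	(&fileMetadataTransaction{}).TarSplitWriter(CompressInput) // opted in: true
}
```

The appeal over a bool argument is readability at the call site: `TarSplitWriter(CompressInput)` documents itself, where `TarSplitWriter(true)` does not.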

Member Author

Sorry, I mixed it up. bool=false is the exception for the migration.

Member

And the issue with bool arguments 😈

Member Author

@cpuguy83 I can rewrite it the way @aaronlehmann suggested if you find that cleaner.

Member

ping @cpuguy83 ^^ 😇

Member

I'm ok with it as is. No need to make such a change for a single option.

@tonistiigi force-pushed the migration-optimization branch 3 times, most recently from df11dae to 46e82f7 (December 18, 2015)
@tonistiigi force-pushed the migration-optimization branch from 46e82f7 to a8f88ef (January 4, 2016)
@thaJeztah thaJeztah added this to the 1.10.0 milestone Jan 5, 2016
@cpuguy83
Member

cpuguy83 commented Jan 5, 2016

Changes LGTM, tested pretty well.
Took 4 mins for about 800 image layers representing ~17GB.
Also tested killing and restarting the daemon during migration.

@thaJeztah
Member

ping @tianon - I think you had a nice big image cache to test this on?

@tianon
Member

tianon commented Jan 5, 2016

I don't have it handy (it's already long-since migrated now), but the script at https://gist.github.com/tianon/87de734e308da031fb46 should be enough to get a group of images of a sufficient size to be reasonably close to what I was testing with (I had a few more images than what this script will pull down, but probably not enough to be "statistically significant").

i.e., curl -fsSL 'https://gist.githubusercontent.com/tianon/87de734e308da031fb46/raw/3b40ba55240e947cbb2d8ab1d6d3b0f42a7a0eb2/docker-pull-official.sh' | bash

@thaJeztah
Member

@tianon cool! Perhaps we should include that script somewhere in the repository, and keep it for benchmarking?

@crosbymichael
Contributor

LGTM

crosbymichael added a commit that referenced this pull request Jan 5, 2016
@crosbymichael crosbymichael merged commit 5aae5a5 into moby:master Jan 5, 2016