attic to borg one time converter#231

Merged
ThomasWaldmann merged 63 commits intoborgbackup:masterfrom
anarcat:attic-converter
Oct 3, 2015

@anarcat
Contributor

@anarcat anarcat commented Oct 1, 2015

for now, just in the test suite, but will be migrated to a separate command.

currently converts segments, which may be enough for unencrypted repositories. will require a cache rebuild.

to be continued.

see #21.

update: converter seems to work, testing would be appreciated.

for now, just in the test suite, but will be migrated to a separate command
@anarcat
Contributor Author

anarcat commented Oct 1, 2015

obviously, tests fail because i rely on attic being present there. not sure how i can fix that, but for now, i prefer to continue working with tests than with my real repo, thank you very much. :p

@codecov-io

Current coverage is 83.07%

Merging #231 into master will decrease coverage by 0.07% as of 96696ab

@@            master    #231   diff @@
======================================
  Files           29      31     +2
  Stmts         6507    6683   +176
  Branches         0       0       
  Methods          0       0       
======================================
+ Hit           5410    5552   +142
  Partial          0       0       
- Missed        1097    1131    +34



Uncovered Suggestions

  1. +0.63% via borg/fuse.py#124...165
  2. +0.51% via borg/xattr.py#199...232
  3. +0.44% via borg/xattr.py#166...194
  4. See 7 more...


@anarcat
Contributor Author

anarcat commented Oct 1, 2015

On 2015-09-30 23:40:11, Codecov wrote:

Current coverage is 82.39%

this stuff is really noisy...

@anarcat
Contributor Author

anarcat commented Oct 1, 2015

this is pretty much ready now. the only thing that is not done here is the cache conversion, for which i am somewhat too lazy right now - but the existing code should be a good example of how it can be done.

i have only tested this with the unit tests, so i have no idea if it really works. i'm in the process of making a copy of my attic repo to try it out on real data (~460GB repo with daily snapshots since december 2014, so around 280 snapshots), which will in itself take a few hours - so i can't confirm until tomorrow that any of this really works.

but the converter should be fast: it is O(n), where n is the number of segments, and only a small write is done for each segment.
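The per-segment step can be pictured as overwriting a fixed-size magic header at the start of each segment file, which is why the cost is one small write per segment. A minimal sketch, assuming attic segments begin with the 8-byte magic `ATTICSEG` and borg segments with `BORG_SEG`; `replace_magic` and `replace_segment_magic` are hypothetical names, not the converter's actual API:

```python
import io

# Assumed 8-byte segment headers. Both magics have the same length, so the
# header can be overwritten in place without moving any segment data.
ATTIC_SEGMENT_MAGIC = b'ATTICSEG'
BORG_SEGMENT_MAGIC = b'BORG_SEG'


def replace_magic(fileobj, old=ATTIC_SEGMENT_MAGIC, new=BORG_SEGMENT_MAGIC, dryrun=False):
    """Rewrite the header of a seekable binary file in place.

    Returns True if the file started with the old magic (and was, or in
    dry-run mode would have been, rewritten), False otherwise.
    """
    if len(old) != len(new):
        raise ValueError('in-place rewrite needs equal-length magics')
    fileobj.seek(0)
    if fileobj.read(len(old)) != old:
        return False  # already converted, or not an attic segment
    if not dryrun:
        fileobj.seek(0)
        fileobj.write(new)  # the only write done for this segment
    return True


def replace_segment_magic(path, dryrun=False):
    """Apply replace_magic to one segment file on disk."""
    with open(path, 'r+b') as segment:
        return replace_magic(segment, dryrun=dryrun)


# Usage with an in-memory "segment": only the first 8 bytes change.
demo = io.BytesIO(ATTIC_SEGMENT_MAGIC + b'rest-of-segment')
replace_magic(demo)
print(demo.getvalue())  # b'BORG_SEGrest-of-segment'
```

Running it twice is harmless in this sketch: the second call sees `BORG_SEG` and returns False without writing anything.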

the converter also assumes some compatibility between borg and attic. for example, it assumes it can load an attic repository and list the segments as if it were a borg repo (which actually works right now). reimplementing this so that it still works if borg changes too much shouldn't be much of a problem; in fact, i already did that for the key discovery.
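Key discovery that does not depend on borg's own key classes can be done by reading the keyfile header directly. A sketch, assuming attic keyfiles start with a line of the form `ATTIC KEY <repository id in hex>` and borg keyfiles with `BORG_KEY <repository id in hex>`; the function names and the `keys_dir` parameter are hypothetical:

```python
import os

ATTIC_KEY_ID = 'ATTIC KEY'  # assumed first-line prefix of attic keyfiles
BORG_KEY_ID = 'BORG_KEY'    # assumed first-line prefix of borg keyfiles


def find_attic_keyfile(keys_dir, repo_id_hex):
    """Return the path of the attic keyfile for repo_id_hex, or None.

    The header line is parsed here rather than through borg's key
    classes, so discovery keeps working even if borg's key code changes.
    """
    expected = '%s %s' % (ATTIC_KEY_ID, repo_id_hex)
    for name in sorted(os.listdir(keys_dir)):
        path = os.path.join(keys_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path) as fd:
            if fd.readline().strip() == expected:
                return path
    return None


def convert_keyfile_text(text):
    """Swap the attic header for the borg one, leaving the key data intact."""
    header, sep, rest = text.partition('\n')
    if not header.startswith(ATTIC_KEY_ID + ' '):
        raise ValueError('not an attic keyfile')
    return BORG_KEY_ID + header[len(ATTIC_KEY_ID):] + sep + rest
```

Only the header line changes in this sketch; the key material after it is carried over untouched.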

oh, and regarding unit testing - i don't quite get it: most of the code i wrote is unit-tested, in fact, it's how i wrote it... maybe someone familiar with codecov can explain to me what i did wrong?

@RonnyPfannschmidt
Contributor

The convert test is skipped on CI because attic is not installed in any tox env.
I'd suggest adding a tox factor and a single tox env to test this.
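The skip works roughly like this: the test module checks whether attic can be imported and marks the attic-dependent tests as skipped otherwise, so in the regular tox environments that code never runs and codecov counts it as uncovered. A minimal sketch using `unittest` (the class and test names are hypothetical; the real suite may use pytest markers instead):

```python
import importlib.util
import unittest

# True only when an attic installation is importable in this environment,
# e.g. in a dedicated tox env that lists attic as a dependency.
ATTIC_AVAILABLE = importlib.util.find_spec('attic') is not None


@unittest.skipIf(not ATTIC_AVAILABLE, 'cannot find an attic install')
class AtticUpgradeTest(unittest.TestCase):
    """Tests that need attic to build a source repository to convert."""

    def test_attic_is_importable(self):
        import attic  # noqa: F401  (only reached when attic is installed)
        self.assertTrue(ATTIC_AVAILABLE)
```

With attic added to one tox environment's dependencies, the same module would run these tests there and skip them everywhere else.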

@anarcat
Contributor Author

anarcat commented Oct 1, 2015

sorry, i'm not familiar with tox - what does that mean?

@anarcat
Contributor Author

anarcat commented Oct 1, 2015

this way we don't depend on attic for regular build, but we can still see proper test coverage
@RonnyPfannschmidt
Contributor

you would also need to change the travis matrix from the normal envs, to the envs with attic as dependency

given modern tools i might have been wrong in suggesting to use a factor, since attic can just be installed by tox in a normal env

@ThomasWaldmann any opinion on that?

@ThomasWaldmann
Member

@RonnyPfannschmidt What are the pros and cons of the methods?

borg/archiver.py Outdated
Member

please use """triple-double-quoted""" docstrings

Contributor Author

done. but why? just above that, single-single quotes are used...

@anarcat
Contributor Author

anarcat commented Oct 2, 2015

Guess the converter could fix that, but it would only delay the cache sync until it really gets out of sync the first time.

that's what i feared. is there any point in converting the existing cache then? copying the ~1GB cache file from attic takes a bit of time here (a few seconds) that we could skip...

@ThomasWaldmann
Member

currently, there is not much point. BUT, if we realize the faster ideas for cache resync, we would need it (again). so maybe keep it for now.

@ThomasWaldmann
Member

doing some testing of this code right now.

the progress indication has some issue:

$ borg convert -n /tmp/attic2b
reading segments from attic repository using borg
no key file found for repository
copying attic cache from /home/tw/.cache/attic/0abe621081af58c84f34f740bdca19cb764130b012a81a13257b4d3e1a364e95/files to /home/tw/.cache/borg/0abe621081af58c84f34f740bdca19cb764130b012a81a13257b4d3e1a364e95/files
copying attic cache from /home/tw/.cache/attic/0abe621081af58c84f34f740bdca19cb764130b012a81a13257b4d3e1a364e95/chunks to /home/tw/.cache/borg/0abe621081af58c84f34f740bdca19cb764130b012a81a13257b4d3e1a364e95/chunks
copying attic cache from /home/tw/.cache/attic/0abe621081af58c84f34f740bdca19cb764130b012a81a13257b4d3e1a364e95/config to /home/tw/.cache/borg/0abe621081af58c84f34f740bdca19cb764130b012a81a13257b4d3e1a364e95/config
converting cache b'/tmp/attic2b/index.28'
converting cache /home/tw/.cache/borg/0abe621081af58c84f34f740bdca19cb764130b012a81a13257b4d3e1a364e95/files
converting cache /home/tw/.cache/borg/0abe621081af58c84f34f740bdca19cb764130b012a81a13257b4d3e1a364e95/chunks
converting cache /home/tw/.cache/borg/0abe621081af58c84f34f740bdca19cb764130b012a81a13257b4d3e1a364e95/config
converting 27 segments...
converting segment 27/27 in place, 1.00% done (/tmp/attic2b/data/0/28)

Then it ended. As it is not showing 100%, users might be confused about whether it really did everything it needed to do.

@anarcat
Contributor Author

anarcat commented Oct 3, 2015

hehehe... oops! i forgot to multiply by 100. :)
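The fix amounts to scaling the completed fraction by 100 before formatting it. A sketch that reproduces the progress line seen in the output above; the function name is hypothetical and the format string is reconstructed from that output:

```python
def progress_line(done, total, path):
    """Format one segment-conversion progress line.

    done / total is a fraction in [0, 1]; it must be multiplied by 100
    before being printed with a '%' sign, otherwise the last segment
    shows up as "1.00% done".
    """
    percent = 100.0 * done / total
    return 'converting segment %d/%d in place, %.2f%% done (%s)' % (done, total, percent, path)


print(progress_line(27, 27, '/tmp/attic2b/data/0/28'))
# converting segment 27/27 in place, 100.00% done (/tmp/attic2b/data/0/28)
```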

@anarcat
Contributor Author

anarcat commented Oct 3, 2015

try again now?

@ThomasWaldmann
Member

percentage works now. just trying to convert the same src repo twice does not work, because cache files then already exist at the target directory. maybe it should ask for permission to clear an existing cache.

we separate the conversion and the copy in order to be able to copy
arbitrary files from attic without converting them. this allows us to
copy the config file cleanly without attempting to rewrite its magic
number
@anarcat
Contributor Author

anarcat commented Oct 3, 2015

@ThomasWaldmann what do you mean it "does not work"? does it crash? it should just produce a warning and move on, normally.

i have also rewired the cache copy mechanism, and i'd appreciate it if you could test that too...

have any idea on how to integrate some cache generation in the unit tests? i couldn't figure it out looking at do_create()...

@ThomasWaldmann
Member

I killed the attic2b repo from my first attempt and made a new copy from attic using cp -a attic attic2b.

Then:

$ borg convert /tmp/attic2b
reading segments from attic repository using borg
no key file found for repository
borg cache already exists in /home/tw/.cache/borg/0abe621081af58c84f34f740bdca19cb764130b012a81a13257b4d3e1a364e95/files, skipping conversion of /home/tw/.cache/attic/0abe621081af58c84f34f740bdca19cb764130b012a81a13257b4d3e1a364e95/files
borg cache already exists in /home/tw/.cache/borg/0abe621081af58c84f34f740bdca19cb764130b012a81a13257b4d3e1a364e95/chunks, skipping conversion of /home/tw/.cache/attic/0abe621081af58c84f34f740bdca19cb764130b012a81a13257b4d3e1a364e95/chunks
borg cache already exists in /home/tw/.cache/borg/0abe621081af58c84f34f740bdca19cb764130b012a81a13257b4d3e1a364e95/config, skipping conversion of /home/tw/.cache/attic/0abe621081af58c84f34f740bdca19cb764130b012a81a13257b4d3e1a364e95/config
converting cache b'/tmp/attic2b/index.28'
converting 27 segments...
converting segment 27/27 in place, 100.00% done (/tmp/attic2b/data/0/28)

It stumbles over the cache as that is already there for this repoid.

@anarcat
Contributor Author

anarcat commented Oct 3, 2015

well, it's just a safety check: if you mistakenly run borg convert over an already existing borg repository that is not related to attic, you definitely don't want to overwrite those cache files! the solution here for you is to simply flush ~/.cache/borg/0abe6[...]d3e1a364e95/* by hand...

and the conversion still works; it's just the cache that is not being copied. you could simply remove the files and rerun the conversion.
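The safety check described here boils down to never overwriting a borg cache file that already exists, and only warning about it. A sketch under that assumption; `copy_cache_file` is a hypothetical helper and its warning text is modeled on the output shown earlier:

```python
import os
import shutil
import tempfile


def copy_cache_file(attic_path, borg_path):
    """Copy one attic cache file into the borg cache without clobbering.

    Returns True if the file was copied. If the borg file already exists
    (from a previous run, or from an unrelated borg repository), it is
    left untouched and only a warning is printed.
    """
    if not os.path.exists(attic_path):
        return False  # no attic cache file: nothing to copy
    if os.path.exists(borg_path):
        print('borg cache already exists in %s, skipping conversion of %s'
              % (borg_path, attic_path))
        return False
    os.makedirs(os.path.dirname(borg_path), exist_ok=True)
    shutil.copyfile(attic_path, borg_path)
    return True


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'attic', 'files')
        dst = os.path.join(tmp, 'borg', 'files')
        os.makedirs(os.path.dirname(src))
        with open(src, 'wb') as fd:
            fd.write(b'cache data')
        print(copy_cache_file(src, dst))  # True: first run copies
        print(copy_cache_file(src, dst))  # False: second run only warns
```

Deleting the borg-side files by hand, as suggested above, makes the next run copy them again.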

@ThomasWaldmann
Member

yes, just seen the "config" handling, sorry.

if there's no attic cache, it's no use checking for individual files

this also makes the code a little clearer

also added comments
@anarcat
Contributor Author

anarcat commented Oct 3, 2015

just reshuffled and commented the code to make that clearer.

@anarcat
Contributor Author

anarcat commented Oct 3, 2015

@tw how did you create the attic2b repo in the first place? i am wondering if there wouldn't be an easy way to create an attic repo with a cache without replicating all of the archiver.py unit tests. :p

maybe it would be useful to have generic routines to create a bunch of files in the test suite...
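A generic helper of that kind could be as small as the following sketch; `create_test_files` is a hypothetical name, and turning the resulting tree into an attic repository with a cache would still require running attic itself on it:

```python
import os
import tempfile


def create_test_files(root, files):
    """Create regular files under root from a {relative path: bytes} mapping.

    Relative paths use '/' separators; missing parent directories are
    created. Returns the created paths, sorted by relative path.
    """
    created = []
    for relpath in sorted(files):
        path = os.path.join(root, *relpath.split('/'))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, 'wb') as fd:
            fd.write(files[relpath])
        created.append(path)
    return created


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_test_files(tmp, {'input/file1': b'X' * 1024,
                                            'input/dir2/file2': b'Y' * 2048,
                                            'input/empty': b''}):
            print(path, os.path.getsize(path))
```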

convert is too generic for the Attic conversion: we may have other
converters, from other, more foreign systems that will require
different options and different upgrade mechanisms that convert could
never cover appropriately. we are more likely to use an approach
similar to "git fast-import" instead here, and have the conversion
tools be external tools that feed standard data into borg during
conversion.

upgrade seems like a more natural fit: Attic could be considered like
a pre-historic version of Borg that requires invasive changes for borg
to be able to use the repository. we may require such changes in the
future of borg as well: if we make backwards-incompatible changes to
the repository layout or data format, it is possible that we require
such changes to be performed on the repository before it is usable
again. instead of scattering those conversions all over the code, we
should simply have assertions that check the layout is correct and
point the user to upgrade if it is not.
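Such an assertion could look like the sketch below, which inspects a segment header and points the user to `borg upgrade` when it finds attic data. The magic values are assumptions, and `UpgradeRequired` and `check_segment_header` are hypothetical names:

```python
ATTIC_SEGMENT_MAGIC = b'ATTICSEG'  # assumed attic segment magic
BORG_SEGMENT_MAGIC = b'BORG_SEG'   # assumed current borg segment magic


class UpgradeRequired(Exception):
    """Raised when a repository must go through `borg upgrade` first."""


def check_segment_header(header):
    """Accept a current borg segment header; otherwise explain what to do."""
    magic = header[:len(BORG_SEGMENT_MAGIC)]
    if magic == BORG_SEGMENT_MAGIC:
        return None
    if magic == ATTIC_SEGMENT_MAGIC:
        raise UpgradeRequired('attic repository detected, run "borg upgrade" first')
    raise ValueError('unknown segment magic %r' % magic)
```

Keeping the check in one place means the rest of the code never has to handle old layouts itself.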

upgrade should eventually automatically detect the repository format
or version and perform appropriate conversions. Attic is only the
first one. we still need to implement an adequate API for
auto-detection and upgrade, only the seeds of that are present for now.

of course, changes to the upgrade command should be thoroughly
documented in the release notes and an eventual upgrade manual.
this makes it clear how to start from scratch, in case the chunk cache
failed to be copied and so on.
@anarcat
Contributor Author

anarcat commented Oct 3, 2015

renamed convert command to upgrade, as described in the last commit.

12:29:25 <anarcat> i'm keeping the convert_segments and so on semantics the same
12:29:47 <anarcat> but renaming AtticRepositoryConverter to AtticRepositoryUpgrader, and doing s/convert()/upgrade()/ as well
12:29:56 <anarcat> so that we have some meaningful API already

there will be some more work to be done for the upgrade command to be useful for other upgrades in the future, mostly internal API changes, but that can wait until after this is merged.

i have run fakeroot -u tox on this and it seems happy. i have also performed a test conversion and it seems it went fine:

[1037]anarcat@marcos:borg(attic-converter $%>)$ attic init ../test/attic-test
Initializing repository at "../test/attic-test"
Encryption NOT enabled.
Use the "--encryption=passphrase|keyfile" to enable encryption.
Initializing cache...
[1038]anarcat@marcos:borg(attic-converter $%>)$ attic create ../test/attic-test::test-run .
[1039]anarcat@marcos:borg(attic-converter $%>)$ cp -a ../test/attic-test/ ../test/borg-test
[1042]anarcat@marcos:borg(attic-converter $%>)$ borg upgrade -n ../test/borg-test
reading segments from attic repository using borg
no key file found for repository
copying attic cache file from /home/anarcat/.cache/attic/8bc4151a01b9c882bdad30e626db2d11f159af19cdc3a9b6d5101ae71515689b/config to /home/anarcat/.cache/borg/8bc4151a01b9c882bdad30e626db2d11f159af19cdc3a9b6d5101ae71515689b/config
copying attic cache file from /home/anarcat/.cache/attic/8bc4151a01b9c882bdad30e626db2d11f159af19cdc3a9b6d5101ae71515689b/files to /home/anarcat/.cache/borg/8bc4151a01b9c882bdad30e626db2d11f159af19cdc3a9b6d5101ae71515689b/files
copying attic cache file from /home/anarcat/.cache/attic/8bc4151a01b9c882bdad30e626db2d11f159af19cdc3a9b6d5101ae71515689b/chunks to /home/anarcat/.cache/borg/8bc4151a01b9c882bdad30e626db2d11f159af19cdc3a9b6d5101ae71515689b/chunks
converting cache b'/home/anarcat/src/test/borg-test/index.6'
converting cache /home/anarcat/.cache/borg/8bc4151a01b9c882bdad30e626db2d11f159af19cdc3a9b6d5101ae71515689b/files
converting cache /home/anarcat/.cache/borg/8bc4151a01b9c882bdad30e626db2d11f159af19cdc3a9b6d5101ae71515689b/chunks
converting 5 segments...
converting segment 5/5 in place, 100.00% done (/home/anarcat/src/test/borg-test/data/0/6)
[1043]anarcat@marcos:borg(attic-converter $%>)$ borg upgrade  ../test/borg-test
reading segments from attic repository using borg
no key file found for repository
copying attic cache file from /home/anarcat/.cache/attic/8bc4151a01b9c882bdad30e626db2d11f159af19cdc3a9b6d5101ae71515689b/config to /home/anarcat/.cache/borg/8bc4151a01b9c882bdad30e626db2d11f159af19cdc3a9b6d5101ae71515689b/config
copying attic cache file from /home/anarcat/.cache/attic/8bc4151a01b9c882bdad30e626db2d11f159af19cdc3a9b6d5101ae71515689b/files to /home/anarcat/.cache/borg/8bc4151a01b9c882bdad30e626db2d11f159af19cdc3a9b6d5101ae71515689b/files
copying attic cache file from /home/anarcat/.cache/attic/8bc4151a01b9c882bdad30e626db2d11f159af19cdc3a9b6d5101ae71515689b/chunks to /home/anarcat/.cache/borg/8bc4151a01b9c882bdad30e626db2d11f159af19cdc3a9b6d5101ae71515689b/chunks
converting cache b'/home/anarcat/src/test/borg-test/index.6'
converting cache /home/anarcat/.cache/borg/8bc4151a01b9c882bdad30e626db2d11f159af19cdc3a9b6d5101ae71515689b/files
converting cache /home/anarcat/.cache/borg/8bc4151a01b9c882bdad30e626db2d11f159af19cdc3a9b6d5101ae71515689b/chunks
converting 5 segments...
converting segment 5/5 in place, 100.00% done (/home/anarcat/src/test/borg-test/data/0/6)
[1044]anarcat@marcos:borg(attic-converter $%>)$ borg create ../test/borg-test::test-post-attic .
Warning: The repository at location ../test/borg-test was previously located at ../test/attic-test
Do you want to continue? [yN] y

and all is good. i may try to convert my main repo again now that #235 has a workaround, but that shouldn't hold up this merge, as copying the attic repo takes several hours here...

it seems the file cache does *not* have the ATTIC magic header (nor
does it have one in borg), so we don't need to edit the file - we just
copy it like a regular file.

while i'm here, simplify the cache conversion loop: it's no use
splitting the copy and the editing, since the latter is so fast. just
do everything in one loop, which makes it much easier to read.
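The single loop described in this commit could be sketched as follows: every cache file is copied, and only the chunks index gets its header patched. This assumes the chunks cache is a hashindex starting with `ATTICIDX` in attic and `BORG_IDX` in borg, while `config` and `files` are copied verbatim as the commit messages say; `convert_cache` and `CACHE_FILES` are hypothetical names:

```python
import os
import shutil
import tempfile

ATTIC_INDEX_MAGIC = b'ATTICIDX'  # assumed hashindex header written by attic
BORG_INDEX_MAGIC = b'BORG_IDX'   # assumed hashindex header expected by borg

# Cache file name -> whether its header must be rewritten after copying.
CACHE_FILES = {'config': False, 'files': False, 'chunks': True}


def convert_cache(attic_cache_dir, borg_cache_dir, dryrun=False):
    """Copy, and patch where needed, each cache file in one pass.

    Returns the names of the files that were actually written.
    """
    written = []
    for name, has_magic in CACHE_FILES.items():
        src = os.path.join(attic_cache_dir, name)
        dst = os.path.join(borg_cache_dir, name)
        if not os.path.exists(src) or os.path.exists(dst):
            continue  # nothing to copy, or never clobber an existing borg cache
        print('copying attic cache file from %s to %s' % (src, dst))
        if dryrun:
            continue
        os.makedirs(borg_cache_dir, exist_ok=True)
        shutil.copyfile(src, dst)
        if has_magic:
            with open(dst, 'r+b') as fd:
                if fd.read(len(ATTIC_INDEX_MAGIC)) == ATTIC_INDEX_MAGIC:
                    fd.seek(0)
                    fd.write(BORG_INDEX_MAGIC)
        written.append(name)
    return written


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        attic_dir = os.path.join(tmp, 'attic')
        os.makedirs(attic_dir)
        for name, data in (('config', b'[cache]\n'), ('files', b''),
                           ('chunks', ATTIC_INDEX_MAGIC + b'...')):
            with open(os.path.join(attic_dir, name), 'wb') as fd:
                fd.write(data)
        print(convert_cache(attic_dir, os.path.join(tmp, 'borg')))
        # ['config', 'files', 'chunks']
```

Because patching a header is a single small write, doing it right after each copy costs nothing and keeps the per-file logic in one place.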
ThomasWaldmann added a commit that referenced this pull request Oct 3, 2015
attic to borg one time converter
@ThomasWaldmann ThomasWaldmann merged commit 1207e1a into borgbackup:master Oct 3, 2015
@anarcat anarcat deleted the attic-converter branch October 3, 2015 17:07
@anarcat
Contributor Author

anarcat commented Oct 3, 2015

whoohoo! thanks!


4 participants