Automatically remove stale locks#1674
Conversation
Current coverage is 84.37% (diff: 76.00%)@@ master #1674 diff @@
==========================================
Files 20 20
Lines 6609 6664 +55
Methods 0 0
Messages 0 0
Branches 1125 1136 +11
==========================================
+ Hits 5582 5623 +41
- Misses 754 767 +13
- Partials 273 274 +1
|
9ea8a64 to
487b5d1
Compare
src/borg/remote.py
Outdated
| raise ValueError('log level missing, fix this code') | ||
| env_vars = [] | ||
| if bool(os.environ.get('BORG_UNIQUE_HOSTNAME')): | ||
| env_vars.append('BORG_UNIQUE_HOSTNAME=yes') |
There was a problem hiding this comment.
a while ago I changed semantics of env vars. in early borg and attic, non-present env vars meant False and any present ones meant True (even if e.g. FOO=False or FOO=0 was used - they would be evaluated as truish).
after my change, env vars are handled like yes() input. So, it is FOO=0 and FOO=1 or FOO=yes and FOO=no now.
So, I think it should be:
unique_hostname = os.environ.get('BORG_UNIQUE_HOSTNAME')
if unique_hostname is not None:
env_vars.append('BORG_UNIQUE_HOSTNAME=%s' % unique_hostname)
There was a problem hiding this comment.
Ok, there are some other places that do bool(os.environ.get('B_U_H')), I'll change those as well for consistent behaviour.
There was a problem hiding this comment.
Done. I added prompt=True|False to yes() to facilitate this. Output now looks like this:
Remote: Enabled removal of stale repository locks
Remote: Killed stale lock tiger.3300-0.
Remote: Removed stale exclusive roster lock for pid 3300.
Remote: Removed stale exclusive roster lock for pid 3300.
Enabled removal of stale cache locks
Killed stale lock tiger.3292-0.
Removed stale exclusive roster lock for pid 3292.
Removed stale exclusive roster lock for pid 3292.
The doubled roster locks are because this is done on loading, so if it's not immediately saved that gets doubled. I think we could just move those down to INFO log level, or remove them.
| remote_path = args.remote_path or os.environ.get('BORG_REMOTE_PATH', 'borg') | ||
| remote_path = replace_placeholders(remote_path) | ||
| return [remote_path, 'serve'] + opts | ||
| return env_vars + [remote_path, 'serve'] + opts |
There was a problem hiding this comment.
as this is no-cover: did you test this manually, using a ssh: repo location?
| dead_lock = ExclusiveLock(lockpath, id=dead_id).acquire() | ||
| with ExclusiveLock(lockpath, id=our_id, kill_stale_locks=True): | ||
| with pytest.raises(NotMyLock): | ||
| dead_lock.release() |
There was a problem hiding this comment.
69 can only acquire the lock, if there is no other lock for lockpath.
but there is dead_lock. as it is dead, and kill_stale_locks, it gets killed.
but why does 71 raise NotMyLock then and not NotLocked?
and why is it allowed to enter the with-block if the lock is still considered present?
There was a problem hiding this comment.
NotMyLock because there is an active lock and it belongs to 69. NotLocked would be the result if we de-indentend 70/71 one level, in which case 69 would be unlocked, and since 68 is dead, there is no lock = NotLocked.
a0345af to
6b88da2
Compare
| # the lock made by the parent, so it needs to use the same PID for that. | ||
| _pid = os.getpid() | ||
| # XXX this sometimes requires live internet access for issuing a DNS query in the background. | ||
| _hostname = socket.gethostname() |
There was a problem hiding this comment.
I don't think there's a way around that API-wise, if the machine is configured that way.
|
Feature-wise this is done. Output could use some polishing perhaps:
|
src/borg/cache.py
Outdated
| self.manifest = manifest | ||
| self.path = path or os.path.join(get_cache_dir(), repository.id_str) | ||
| self.unique_hostname = yes(true_msg='Enabled removal of stale cache locks', | ||
| env_var_override='BORG_UNIQUE_HOSTNAME', prompt=False, env_msg=None) |
There was a problem hiding this comment.
BORG_UNIQUE_HOSTNAME is a bit ambiguous / unintuitive maybe.
It is meant as boolean, but from the name, it could also be meant as a var holding a hostname.
BORG_HOSTNAME_IS_UNIQUE maybe?
There was a problem hiding this comment.
this also affects the python variable names.
|
|
||
| return local_pid_alive(pid) | ||
|
|
||
| def local_pid_alive(pid): |
There was a problem hiding this comment.
pep8: add one more blank line above this one.
src/borg/platform/posix.pyx
Outdated
| try: | ||
| # This doesn't work on Windows. | ||
| # This does not kill anything, 0 means "see if we can send a signal to this process or not". | ||
| # Possible errors: No such process (== stale lock) or permission denied (not a stale lock) |
There was a problem hiding this comment.
add a dot at end of sentence.
src/borg/platform/posix.pyx
Outdated
| if err.errno == errno.ESRCH: | ||
| # ESRCH = no such process | ||
| return False | ||
| # Any other error (eg. permissions) mean that the process ID refers to a live process |
There was a problem hiding this comment.
... means ... (and add a dot at the end).
|
Note: I reviewed the log levels and they seem ok as they are. Users should get notified if there are stale locks because this points to a problem in their setup. Not sure though if we always want to read the "Enabled removal ..." msg from yes() all the time. |
|
They're quick and distinct feedback that it worked / is set up correctly (on both ends). I guess we could move it to INFO level, moving the emission of the message from yes() to the call site. I'll apply that and the other suggestions you made. |
|
squash, merge? |
ThomasWaldmann
left a comment
There was a problem hiding this comment.
yes, squash some, merge.
If BORG_UNIQUE_HOSTNAME shell variable is set, stale locks in both cache and repository are deleted. Stale lock is defined as a lock that's originating from the same hostname as us, and correspond to a pid that no longer exists. This fixes borgbackup#562
(testability)
10a6771 to
676e69c
Compare
|
@verygreen :) |
Supersedes #1246
Fixes #562
Edit travis #3000 :)