-
-
Notifications
You must be signed in to change notification settings - Fork 827
Description
This night all my backups failed, due to some kind of deadlock. I have three servers, one contains the borg repository, the other two servers push their backups into that (=the same) repository. I do hourly backups, the diff between the hourly backups is usally just about 9 MB or something and normally they are really fast. The setup worked well for about 1 week.
To avoid races, I schedule my hourly cron with sleep $(jot -r 1 1 600) && backup.sh, which generates a random delay from 0 to 10 minutes. Additionally, I have set the --lock-wait flag to 1800. This night a got the following emails:
Server:
Remote: Borg 1.0.0: exception in RPC call:
Remote: Traceback (most recent call last):
Remote: File "/usr/local/lib/python3.4/site-packages/borg/remote.py", line 94, in serve
Remote: res = f(*args)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 429, in put
Remote: self.prepare_txn(self.get_transaction_id())
Remote: File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 178, in prepare_txn
Remote: self.lock.upgrade()
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 298, in upgrade
Remote: self.acquire(exclusive=True, remove=SHARED)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 257, in acquire
Remote: self._wait_for_readers_finishing(remove, sleep)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 286, in _wait_for_readers_finishing
Remote: raise LockTimeout(self.path)
Remote: borg.locking.LockTimeout: /var/borg/backup/lock
Remote: Platform: FreeBSD korriban 10.2-RELEASE-p9 FreeBSD 10.2-RELEASE-p9 #0: Thu Jan 14 01:32:46 UTC 2016 [email protected]:/usr/obj/usr/src/sys/GENERIC amd64 amd64
Remote: Borg: 1.0.0 Python: CPython 3.4.4
Remote: PID: 11415 CWD: /var/borg
Remote: sys.argv: ['/usr/local/bin/borg', 'serve', '--umask=077']
Remote: SSH_ORIGINAL_COMMAND: None
Remote:
Remote: Borg 1.0.0: exception in RPC call:
Remote: Traceback (most recent call last):
Remote: File "/usr/local/lib/python3.4/site-packages/borg/remote.py", line 94, in serve
Remote: res = f(*args)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 429, in put
Remote: self.prepare_txn(self.get_transaction_id())
Remote: File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 178, in prepare_txn
Remote: self.lock.upgrade()
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 298, in upgrade
Remote: self.acquire(exclusive=True, remove=SHARED)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 257, in acquire
Remote: self._wait_for_readers_finishing(remove, sleep)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 286, in _wait_for_readers_finishing
Remote: raise LockTimeout(self.path)
Remote: borg.locking.LockTimeout: /var/borg/backup/lock
Remote: Platform: FreeBSD korriban 10.2-RELEASE-p9 FreeBSD 10.2-RELEASE-p9 #0: Thu Jan 14 01:32:46 UTC 2016 [email protected]:/usr/obj/usr/src/sys/GENERIC amd64 amd64
Remote: Borg: 1.0.0 Python: CPython 3.4.4
Remote: PID: 11415 CWD: /var/borg
Remote: sys.argv: ['/usr/local/bin/borg', 'serve', '--umask=077']
Remote: SSH_ORIGINAL_COMMAND: None
Remote:
Remote: Borg 1.0.0: exception in RPC call:
Remote: Traceback (most recent call last):
Remote: File "/usr/local/lib/python3.4/site-packages/borg/remote.py", line 94, in serve
Remote: res = f(*args)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 429, in put
Remote: self.prepare_txn(self.get_transaction_id())
Remote: File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 178, in prepare_txn
Remote: self.lock.upgrade()
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 298, in upgrade
Remote: self.acquire(exclusive=True, remove=SHARED)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 257, in acquire
Remote: self._wait_for_readers_finishing(remove, sleep)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 286, in _wait_for_readers_finishing
Remote: raise LockTimeout(self.path)
Remote: borg.locking.LockTimeout: /var/borg/backup/lock
Remote: Platform: FreeBSD korriban 10.2-RELEASE-p9 FreeBSD 10.2-RELEASE-p9 #0: Thu Jan 14 01:32:46 UTC 2016 [email protected]:/usr/obj/usr/src/sys/GENERIC amd64 amd64
Remote: Borg: 1.0.0 Python: CPython 3.4.4
Remote: PID: 11415 CWD: /var/borg
Remote: sys.argv: ['/usr/local/bin/borg', 'serve', '--umask=077']
Remote: SSH_ORIGINAL_COMMAND: None
Remote:
Remote: Borg 1.0.0: exception in RPC call:
Remote: Traceback (most recent call last):
Remote: File "/usr/local/lib/python3.4/site-packages/borg/remote.py", line 94, in serve
Remote: res = f(*args)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 429, in put
Remote: self.prepare_txn(self.get_transaction_id())
Remote: File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 178, in prepare_txn
Remote: self.lock.upgrade()
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 298, in upgrade
Remote: self.acquire(exclusive=True, remove=SHARED)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 257, in acquire
Remote: self._wait_for_readers_finishing(remove, sleep)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 286, in _wait_for_readers_finishing
Remote: raise LockTimeout(self.path)
Remote: borg.locking.LockTimeout: /var/borg/backup/lock
Remote: Platform: FreeBSD korriban 10.2-RELEASE-p9 FreeBSD 10.2-RELEASE-p9 #0: Thu Jan 14 01:32:46 UTC 2016 [email protected]:/usr/obj/usr/src/sys/GENERIC amd64 amd64
Remote: Borg: 1.0.0 Python: CPython 3.4.4
Remote: PID: 11415 CWD: /var/borg
Remote: sys.argv: ['/usr/local/bin/borg', 'serve', '--umask=077']
Remote: SSH_ORIGINAL_COMMAND: None
Remote:
Remote: Borg 1.0.0: exception in RPC call:
Remote: Traceback (most recent call last):
Remote: File "/usr/local/lib/python3.4/site-packages/borg/remote.py", line 94, in serve
Remote: res = f(*args)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 429, in put
Remote: self.prepare_txn(self.get_transaction_id())
Remote: File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 178, in prepare_txn
Remote: self.lock.upgrade()
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 298, in upgrade
Remote: self.acquire(exclusive=True, remove=SHARED)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 257, in acquire
Remote: self._wait_for_readers_finishing(remove, sleep)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 286, in _wait_for_readers_finishing
Remote: raise LockTimeout(self.path)
Remote: borg.locking.LockTimeout: /var/borg/backup/lock
Remote: Platform: FreeBSD korriban 10.2-RELEASE-p9 FreeBSD 10.2-RELEASE-p9 #0: Thu Jan 14 01:32:46 UTC 2016 [email protected]:/usr/obj/usr/src/sys/GENERIC amd64 amd64
Remote: Borg: 1.0.0 Python: CPython 3.4.4
Remote: PID: 11415 CWD: /var/borg
Remote: sys.argv: ['/usr/local/bin/borg', 'serve', '--umask=077']
Remote: SSH_ORIGINAL_COMMAND: None
Remote:
Remote: Borg 1.0.0: exception in RPC call:
Remote: Traceback (most recent call last):
Remote: File "/usr/local/lib/python3.4/site-packages/borg/remote.py", line 94, in serve
Remote: res = f(*args)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 429, in put
Remote: self.prepare_txn(self.get_transaction_id())
Remote: File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 178, in prepare_txn
Remote: self.lock.upgrade()
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 298, in upgrade
Remote: self.acquire(exclusive=True, remove=SHARED)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 257, in acquire
Remote: self._wait_for_readers_finishing(remove, sleep)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 286, in _wait_for_readers_finishing
Remote: raise LockTimeout(self.path)
Remote: borg.locking.LockTimeout: /var/borg/backup/lock
Remote: Platform: FreeBSD korriban 10.2-RELEASE-p9 FreeBSD 10.2-RELEASE-p9 #0: Thu Jan 14 01:32:46 UTC 2016 [email protected]:/usr/obj/usr/src/sys/GENERIC amd64 amd64
Remote: Borg: 1.0.0 Python: CPython 3.4.4
Remote: PID: 11415 CWD: /var/borg
Remote: sys.argv: ['/usr/local/bin/borg', 'serve', '--umask=077']
Remote: SSH_ORIGINAL_COMMAND: None
Remote:
Remote: Borg 1.0.0: exception in RPC call:
Remote: Traceback (most recent call last):
Remote: File "/usr/local/lib/python3.4/site-packages/borg/remote.py", line 94, in serve
Remote: res = f(*args)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 429, in put
Remote: self.prepare_txn(self.get_transaction_id())
Remote: File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 178, in prepare_txn
Remote: self.lock.upgrade()
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 298, in upgrade
Remote: self.acquire(exclusive=True, remove=SHARED)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 257, in acquire
Remote: self._wait_for_readers_finishing(remove, sleep)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 286, in _wait_for_readers_finishing
Remote: raise LockTimeout(self.path)
Remote: borg.locking.LockTimeout: /var/borg/backup/lock
Remote: Platform: FreeBSD korriban 10.2-RELEASE-p9 FreeBSD 10.2-RELEASE-p9 #0: Thu Jan 14 01:32:46 UTC 2016 [email protected]:/usr/obj/usr/src/sys/GENERIC amd64 amd64
Remote: Borg: 1.0.0 Python: CPython 3.4.4
Remote: PID: 11415 CWD: /var/borg
Remote: sys.argv: ['/usr/local/bin/borg', 'serve', '--umask=077']
Remote: SSH_ORIGINAL_COMMAND: None
Remote:
Remote: Borg 1.0.0: exception in RPC call:
Remote: Traceback (most recent call last):
Remote: File "/usr/local/lib/python3.4/site-packages/borg/remote.py", line 94, in serve
Remote: res = f(*args)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 429, in put
Remote: self.prepare_txn(self.get_transaction_id())
Remote: File "/usr/local/lib/python3.4/site-packages/borg/repository.py", line 178, in prepare_txn
Remote: self.lock.upgrade()
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 298, in upgrade
Remote: self.acquire(exclusive=True, remove=SHARED)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 257, in acquire
Remote: self._wait_for_readers_finishing(remove, sleep)
Remote: File "/usr/local/lib/python3.4/site-packages/borg/locking.py", line 286, in _wait_for_readers_finishing
Remote: raise LockTimeout(self.path)
Remote: borg.locking.LockTimeout: /var/borg/backup/lock
Remote: Platform: FreeBSD korriban 10.2-RELEASE-p9 FreeBSD 10.2-RELEASE-p9 #0: Thu Jan 14 01:32:46 UTC 2016 [email protected]:/usr/obj/usr/src/sys/GENERIC amd64 amd64
Remote: Borg: 1.0.0 Python: CPython 3.4.4
Remote: PID: 11415 CWD: /var/borg
Remote: sys.argv: ['/usr/local/bin/borg', 'serve', '--umask=077']
Remote: SSH_ORIGINAL_COMMAND: None
Remote:
Remote Exception (see remote log for the traceback)
Platform: FreeBSD coruscant 10.2-RELEASE-p14 FreeBSD 10.2-RELEASE-p14 #0: Wed Mar 16 20:46:12 UTC 2016 [email protected]:/usr/obj/usr/src/sys/GENERIC amd64 amd64
Borg: 1.0.0 Python: CPython 3.4.4
PID: 59110 CWD: /root
sys.argv: ['/usr/local/bin/borg', 'create', '-C', 'lz4', '--lock-wait', '1800', '[email protected]:/var/borg/backup::coruscant-2016-03-30-010210-CEST', '/usr/home', '/etc', '/usr/local/etc']
SSH_ORIGINAL_COMMAND: None
Client 1
Failed to create/acquire the lock /root/.cache/borg/018468490212573b7a255fdbeadc052964eb2fee8e0446b79abc7fead592bfb2/lock.exclusive (timeout).
Client 2
Failed to create/acquire the lock /root/.cache/borg/018468490212573b7a255fdbeadc052964eb2fee8e0446b79abc7fead592bfb2/lock.exclusive (timeout).
The server has too much borg processes left:
$ ps aux | grep borg
borg 11410 1,5 0,9 153684 38712 - Ss 1:02am 5:18,73 /usr/local/bin/python3.4 /usr/local/bin/borg serve --umask=077
root 11407 0,0 0,2 86496 7468 - Is 1:02am 0:00,05 sshd: borg [priv] (sshd)
borg 11409 0,0 0,2 86496 10304 - I 1:02am 0:00,24 sshd: borg@notty (sshd)
root 25779 0,0 0,2 86496 7460 - Is 5:08am 0:00,05 sshd: borg [priv] (sshd)
borg 25781 0,0 0,2 86496 7596 - I 5:08am 0:00,01 sshd: borg@notty (sshd)
borg 25782 0,0 0,9 149588 38024 - Is 5:08am 2:22,57 /usr/local/bin/python3.4 /usr/local/bin/borg serve --umask=077
root 35819 0,0 0,2 86496 7468 - Is 8:00am 0:00,05 sshd: borg [priv] (sshd)
borg 35821 0,0 0,2 86496 7480 - I 8:00am 0:00,01 sshd: borg@notty (sshd)
borg 35822 0,0 0,9 149332 37648 - Ss 8:00am 0:01,22 /usr/local/bin/python3.4 /usr/local/bin/borg serve --umask=077
root 36103 0,0 0,2 86496 7460 - Is 8:08am 0:00,05 sshd: borg [priv] (sshd)
borg 36105 0,0 0,2 86496 7472 - I 8:08am 0:00,01 sshd: borg@notty (sshd)
borg 36106 0,0 0,9 149332 37632 - Ss 8:08am 0:01,22 /usr/local/bin/python3.4 /usr/local/bin/borg serve --umask=077
stefan 37298 0,0 0,1 18824 2564 1 S+ 8:28am 0:00,01 grep borg
It seems that it deadlocked very badly; any ideas what went wrong? Is this a bug or misconfiguration?