Buildbot jobs fail intermittently with "exceptions.Exception: Actual commit (…) differs from requested commit (…)" #18338

Closed
SimonSapin opened this issue Sep 1, 2017 · 9 comments

Comments

@SimonSapin
Member

This is happening a lot lately. As in, every PR needs to be retried multiple times. CC @edunham @larsbergstrom @Manishearth

Example:

#18336 (comment)

http://build.servo.org/builders/mac-rel-wpt1/builds/5713

screenshot from 2017-09-01 06-03-21

err.text:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1445, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/dist-packages/buildbot/process/buildstep.py", line 322, in startStep
    result = yield self.run()
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1445, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks
    result = g.send(result)
  File "factories.py", line 39, in run
    got_rev, rev
exceptions.Exception: Actual commit (c6758e38ff77f706dd6cc2edc19854b03b285491) differs from requested commit (8c12d5bbc433aedb486b541ca925dd4065864aba)
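
For readers unfamiliar with this failure: the message comes from a check that compares the commit the builder actually checked out with the commit the build requested. Below is a minimal sketch of that kind of check. It is not the actual code in factories.py; the function name `verify_checked_out_commit` is hypothetical, and only the variable names `got_rev` and `rev` and the message format are taken from the traceback above.

```python
def verify_checked_out_commit(got_rev: str, rev: str) -> None:
    """Raise if the checked-out commit is not the requested one.

    Hypothetical helper: only the names got_rev and rev and the message
    format come from the traceback in this issue.
    """
    if got_rev != rev:
        raise Exception(
            "Actual commit ({}) differs from requested commit ({})".format(
                got_rev, rev
            )
        )
```

If an earlier git step fails to update the working tree but is still reported as successful, a check like this is where the mismatch would first become visible.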
@SimonSapin SimonSapin changed the title Buildbot fails with "exceptions.Exception: Actual commit (…) differs from requested commit (…)" Buildbot jobs fail intermittently with "exceptions.Exception: Actual commit (…) differs from requested commit (…)" Sep 1, 2017
@nox
Contributor

nox commented Sep 1, 2017

13:52 <•SimonSapin> #18338 is being a pain
13:52 <crowbot> Issue #18338: Buildbot jobs fail intermittently with "exceptions.Exception: Actual commit (…) differs from requested commit (…)" - #18338
13:52 <nox> SimonSapin: AFAIK it's a garbage lockfile somewhere.
13:53 <•SimonSapin> git lock?
13:53 <nox> SimonSapin: http://build.servo.org/builders/mac-rel-wpt1/builds/5713/steps/git/logs/stdio
13:53 <nox> SimonSapin: Yes.
13:53 <nox> SimonSapin: I suspect this happened because we forced some PRs,
13:53 <nox> and forcing stuff just stops the builds without cleaning up.
13:53 <•SimonSapin> comment in the issue?
13:53 <nox> So if you force during a Git checkout, nothing works anymore afterwards.
13:53 <nox> Imma copypasta

@larsbergstrom
Contributor

larsbergstrom commented Sep 1, 2017

There are two issues here:

  1. The first step, git update, should not be green. Its log contains this line:
fatal: Unable to create '/Users/servo/buildbot/slave/mac-rel-wpt1/build/.git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.

I'm assuming this is the result of a force or otherwise cancelled build that needs to be cleaned up better.

  2. It can be fixed manually with something like what I did in:
    servo-mac7 can't complete git checkout saltfs#680 (comment)

At the risk of "wow, that seems dangerous", I believe that admins can do something like:

salt servo-mac* cmd.run 'find /Users/servo/buildbot/slave -name index.lock | xargs rm'

And fix it up for the whole cluster. Or burn it all to the ground :-)

cc @edunham
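
On the first point, one way a step could avoid reporting green is to scan git's output for lines starting with `fatal:` in addition to whatever status it already checks. A minimal sketch, assuming the step's log is available as text; `git_log_has_fatal_error` is a hypothetical name, not a Buildbot API:

```python
def git_log_has_fatal_error(log_text: str) -> bool:
    """Return True if any line of the git output starts with 'fatal:'.

    Hypothetical helper: it only inspects text and does not depend on Buildbot.
    """
    return any(
        line.lstrip().startswith("fatal:") for line in log_text.splitlines()
    )
```

Applied to the log quoted above, this would flag the `index.lock` line, so the step could be marked as failed instead of letting the commit mismatch surface later.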

@larsbergstrom
Contributor

@aneeshusa @edunham Any reason you can think of for this behavior? Can't cmd.run right now:

root@servo-master1:~# salt 'servo-mac1' test.ping
servo-mac1:
    True
root@servo-master1:~# salt 'servo-mac1' cmd.run 'find /Users/servo/buildbot/slave index.lock | xargs echo'
servo-mac1:
    Minion did not return. [No response]
root@servo-master1:~# 

@aneeshusa
Contributor

Works for me right now:

root@servo-master1:~# salt 'servo-mac*' cmd.run 'echo hi'
servo-mac2:
    hi
servo-mac4:
    hi
servo-mac3:
    hi
servo-mac8:
    hi
servo-mac9:
    hi
servo-mac7:
    hi
servo-mac5:
    hi
servo-mac6:
    hi
servo-mac1:
    hi
root@servo-master1:~#

@aneeshusa
Contributor

aneeshusa commented Sep 1, 2017

Depending on how long moving to Taskcluster takes (e.g. #17580, servo/saltfs#559 and co), we may want to fork Buildbot and write/backport a better git plugin that's less prone to these issues or teach Buildbot to clean the git lock after a force.

@aneeshusa
Contributor

Also, I think a slightly safer version of your command would be

root@servo-master1:~# salt 'servo-mac*' cmd.run 'find /Users/servo/buildbot/slave -path "*/.git/index.lock"'

followed by

root@servo-master1:~# salt 'servo-mac*' cmd.run 'find /Users/servo/buildbot/slave -path "*/.git/index.lock" -delete'

You didn't pass a -name or -path before, so that might just delete everything...
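
The list-then-delete approach above can also be sketched as a small script, which makes the dry run the default. This is only an illustration under stated assumptions: the function names are hypothetical, the match rule (a file named `index.lock` directly inside a `.git` directory) mirrors the `-path "*/.git/index.lock"` pattern, and nothing here checks whether a git process is still running.

```python
from pathlib import Path
from typing import List


def find_index_locks(root: str) -> List[Path]:
    """Return every file named index.lock that sits directly in a .git directory."""
    return sorted(
        p
        for p in Path(root).rglob("index.lock")
        if p.is_file() and p.parent.name == ".git"
    )


def remove_index_locks(root: str, dry_run: bool = True) -> List[Path]:
    """List matching lock files; delete them only when dry_run is False."""
    locks = find_index_locks(root)
    if not dry_run:
        for lock in locks:
            lock.unlink()
    return locks
```

Calling `remove_index_locks("/Users/servo/buildbot/slave")` first and reviewing the returned paths, then repeating with `dry_run=False`, corresponds to the two commands above.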

@larsbergstrom
Contributor

Aha, nice tweaks - I had noticed the -name thing before running it, luckily :-)

I just did:

ssh [email protected] 'find /Users/servo/buildbot/slave -name index.lock | xargs rm'

For now, which is less than awesome. I didn't know about the -delete arg - that's fantastic!

@highfive

cc @aneeshusa

@jdm
Member

jdm commented Jan 23, 2018

@jdm jdm closed this as completed Jan 23, 2018