Glitch with queue of "nop" jobs #2419

Closed
davidstrauss opened this issue Jan 23, 2016 · 3 comments
Labels: bug, needs-reporter-feedback, pid1

Comments

@davidstrauss
Member

We're seeing "nop" jobs get stuck in a zombie-like state: the systemctl try-restart that requested them hangs indefinitely, and PID 1 never gets around to processing them. They linger in systemctl list-jobs for hours, even though thousands of jobs scheduled later have completed. Every job we see stuck this way has TYPE=nop and STATE=waiting.

Here's what it looks like:

[straussd@endpoint3f149a30:~]$ sudo systemctl list-jobs
   JOB UNIT                                             TYPE STATE  
219201 php_fpm_f70026204e604a198ca10273335f81da.service nop  waiting
 76829 php_fpm_debc635ae2734d8c80cd49807b95ac92.service nop  waiting
219039 nginx_766347f19d5a471cb5f501a608fea4b8.service   nop  waiting
215907 php_fpm_93ca5b14f589492cabd7b71c74e64972.service nop  waiting
203488 php_fpm_089a6a3700dc474d976ec348d5fba3ff.service nop  waiting
215829 php_fpm_0f1c94133434405fb573225f35145f31.service nop  waiting
219040 nginx_e6b6696b3cd04fc996cd005f4ffeb132.service   nop  waiting
216834 nginx_1c327d05a6244551a0403f613a419977.service   nop  waiting

As you can see, no other jobs are queued or running, so systemd PID 1 isn't even doing anything, but these jobs are still waiting. We cannot reproduce this issue on Fedora 20 (systemd v208), but we see it frequently on Fedora 22 (systemd v219).
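For anyone who wants to check their own hosts for the same pattern, here is a minimal sketch that filters a captured systemctl list-jobs table down to jobs with TYPE=nop and STATE=waiting. The names Job, parse_list_jobs, waiting_nop_jobs, and SAMPLE are hypothetical helpers made up for this illustration; the sample text is copied from the listing above, and the parser assumes the four-column layout shown there.

```python
from typing import List, NamedTuple


class Job(NamedTuple):
    """One row of `systemctl list-jobs` output (hypothetical helper type)."""
    job_id: int
    unit: str
    job_type: str
    state: str


def parse_list_jobs(output: str) -> List[Job]:
    """Parse the table printed by `systemctl list-jobs` into Job tuples."""
    jobs = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have exactly four columns and start with a numeric job ID.
        # This skips the header row, blank lines, and any summary line.
        if len(fields) == 4 and fields[0].isdigit():
            jobs.append(Job(int(fields[0]), fields[1], fields[2], fields[3]))
    return jobs


def waiting_nop_jobs(jobs: List[Job]) -> List[Job]:
    """Keep only the jobs matching the stuck pattern: TYPE=nop, STATE=waiting."""
    return [j for j in jobs if j.job_type == "nop" and j.state == "waiting"]


# Sample rows copied from the listing in this report.
SAMPLE = """\
   JOB UNIT                                             TYPE STATE
219201 php_fpm_f70026204e604a198ca10273335f81da.service nop  waiting
 76829 php_fpm_debc635ae2734d8c80cd49807b95ac92.service nop  waiting
219039 nginx_766347f19d5a471cb5f501a608fea4b8.service   nop  waiting
"""

if __name__ == "__main__":
    for job in waiting_nop_jobs(parse_list_jobs(SAMPLE)):
        print(job.job_id, job.unit)
```

A single snapshot cannot tell a stuck job from one that was just queued, so this would only be useful when run repeatedly and compared over time: a job ID that stays in the filtered list across snapshots taken minutes apart matches what we describe here.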

@davidstrauss added the bug label Jan 23, 2016
@davidstrauss
Member Author

As requested on IRC, here's the status of a service related to a stuck job:

[straussd@endpoint3f149a30:~]$ sudo systemctl status php_fpm_799ee45c339845bf9cc967c5936d59ff.service
● php_fpm_799ee45c339845bf9cc967c5936d59ff.service - PHP-FPM server for site=252d6314-b0f5-41ba-905c-1530d8f34b9b environment=test binding=799ee45c339845bf9cc967c5936d59ff php_dir=/opt/pantheon/php-5.5.24-20150427 service_level=pro uri=test-REDACTED.pantheon.io
   Loaded: loaded (/etc/systemd/system/php_fpm_799ee45c339845bf9cc967c5936d59ff.service; static; vendor preset: disabled)
   Active: inactive (dead)

@poettering
Member

Hmm, weird. Any idea how to reproduce this? Did you issue a lot of "daemon-reload"s at the same time as systemd was processing jobs?

We generally only track bugs in the two most recent systemd versions upstream. Any chance you can reproduce it with something more recent?

@poettering added the pid1 and needs-reporter-feedback labels Jan 27, 2016
@keszybz
Member

keszybz commented Dec 12, 2016

Closing because of lack of feedback.
