Skip to content

Conversation

@amitiuttarwar
Copy link
Contributor

The scheduler_tests/mockforward test introduced in #18037 sometimes fails on the x86_64 Linux machine with no wallet.

This is an attempt to fix by changing the order of events in the tests & starting the scheduler thread before queueing events.

Unfortunately I'm unable to reproduce locally, so I'm opening this PR & would like to get a few green runs to see if this is actually the fix.

@DrahtBot DrahtBot added the Tests label Feb 18, 2020
@theStack
Copy link
Contributor

Just as a general hint for the approach of "get a few green runs", it's also possible to register at travis-ci.org with your github account and add your forked bitcoin repository there: then on every push the CI will be triggered as well. This proved to be very helpful for me personally, as check before I opened PRs, as I wrongly assumed for some reason that the CI runs can only be triggered here on the main bitcoin repository.

@amitiuttarwar
Copy link
Contributor Author

thanks for the tip @theStack !

so you were able to get travis passing on your forked repo? when I tried previously I wouldn't get meaningful results, eg. they would frequently time out or error for unrelated reasons. But I didn't investigate too closely, so I can take another look

@theStack
Copy link
Contributor

so you were able to get travis passing on your forked repo? when I tried previously I wouldn't get meaningful results, eg. they would frequently time out or error for unrelated reasons. But I didn't investigate too closely, so I can take another look

It could be that on a first build you have to restart the job manually from the Travis dashboard, if a message like this appears at the very bottom of the job log:

Error! Initial build successful, but not enough time remains to run later build stages and tests.
See https://docs.travis-ci.com/user/customizing-the-build#build-timeouts . Please manually re-run
this job by using the travis restart button. The next run should not time out because the build cache
has been saved.

Other than that, after this initial hurdle I think a Travis run here should behave the same as a run from your bitcoin fork -- at least that's my experience :)

std::thread scheduler_thread([&]() { scheduler.serviceQueue(); });

// bump the scheduler forward 5 minutes
scheduler.MockForward(boost::chrono::seconds(5*60));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this line (or any line after it, but before the next BOOST_CHECK) still fails. E.g.

Running 3 test cases...

Test cases order is shuffled using seed: 902263820

Entering test module "Bitcoin Core Test Suite"

test/scheduler_tests.cpp(11): Entering test suite "scheduler_tests"

test/scheduler_tests.cpp(156): Entering test case "mockforward"

terminate called after throwing an instance of 'boost::wrapexcept<boost::condition_error>'

  what():  boost::condition_variable::do_wait_until failed in pthread_cond_timedwait: Invalid argument

unknown location(0): fatal error: in "scheduler_tests/mockforward": signal: SIGABRT (application abort requested)

test/scheduler_tests.cpp(173): last checkpoint

test/scheduler_tests.cpp(156): Leaving test case "mockforward"; testing time: 430us

test/scheduler_tests.cpp(112): Entering test case "singlethreadedscheduler_ordered"

test/scheduler_tests.cpp(112): Leaving test case "singlethreadedscheduler_ordered"; testing time: 7233us

test/scheduler_tests.cpp(38): Entering test case "manythreads"

test/scheduler_tests.cpp(38): Leaving test case "manythreads"; testing time: 8082us

test/scheduler_tests.cpp(11): Leaving test suite "scheduler_tests"; testing time: 15837us

Leaving test module "Bitcoin Core Test Suite"; testing time: 15956us

@maflcko
Copy link
Member

maflcko commented Feb 18, 2020

I tried on three machines and could not reproduce the issue locally

@maflcko
Copy link
Member

maflcko commented Feb 19, 2020

@maflcko
Copy link
Member

maflcko commented Feb 19, 2020

@jonasschnelli Can the build be reproduced locally? Is it using depends or system boost?

@jonasschnelli
Copy link
Contributor

I looked a bit into it.
This PR seems still to fail on bitcoinbuilds.org:
https://bitcoinbuilds.org/index.php?job=2f3c8572-cbaf-4cab-a2d2-20809d713084

What I observed was that only the depends-build configuration fail (system-libs configuration works). If I clear the ccache-cache, the build runs successful.

Maybe that helps identifying the issue.

@maflcko
Copy link
Member

maflcko commented Feb 19, 2020

If I clear the ccache-cache, the build runs successful.

I think the failure is intermittent, so it might be hard to draw conclusions early.

@jonasschnelli
Copy link
Contributor

I think the failure is intermittent, so it might be hard to draw conclusions early.

It seems to fail on bitcoinbuilds.org 100% on "Linux x86_64 depends" "Linux 32 depends" when using ccache/dependency cache.

@maflcko
Copy link
Member

maflcko commented Feb 20, 2020

I am finally able to reproduce on a single CPU instance on gce n1-standard-1 (1 vCPU, 3.75 GB memory)

I am using depends with NO_QT=1 NO_WALLET=1 and ccache. The bug is only reproducible once after each reboot with: cd bitcoin/ && while sudo bash -c 'rm -f ./src/test/test_bitcoin && (echo 1 > /proc/sys/vm/drop_caches) && make -j 2 check' ; do true ; done

I could not reproduce it when attaching gdb, running in valgrind or when enabling core dumps (in combination with #18183 )

@practicalswift
Copy link
Contributor

@jonasschnelli @MarcoFalke I'm unable to reproduce :(

What toolchain are you compiling using? Is there any theory as to why the use of ccache would matter?

@practicalswift
Copy link
Contributor

Also, are you able to trigger it when doing src/test/test_bitcoin -t scheduler_tests?

@fanquake fanquake closed this in 4502ed7 Feb 27, 2020
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Feb 27, 2020
…r now

fab2527 test: Disable mockforward scheduler unit test for now (MarcoFalke)

Pull request description:

  This should be a workaround to fix bitcoin#18174 in the short run and buy us more time to investigate the issue while ci runs are green again 🙏

ACKs for top commit:
  fanquake:
    ACK fab2527 - be good to get Travis back.
  laanwj:
    ACK fab2527

Tree-SHA512: 027e86b3dfec203a464e5bf528e9933c208c36633c2d4bfcdbc10da1799637a5d6ea0a63af33a4174fb1ad7115df631a4cb838f56e31f4cbd15498e1e9fdf9cc
@amitiuttarwar amitiuttarwar deleted the 2020-02-fix-scheduler-test branch February 28, 2020 20:21
@amitiuttarwar
Copy link
Contributor Author

thank you all for your help with this bug !

I opened issue #18227 to track findings & continue investigation

sidhujag pushed a commit to syscoin-core/syscoin that referenced this pull request Nov 10, 2020
…r now

fab2527 test: Disable mockforward scheduler unit test for now (MarcoFalke)

Pull request description:

  This should be a workaround to fix bitcoin#18174 in the short run and buy us more time to investigate the issue while ci runs are green again 🙏

ACKs for top commit:
  fanquake:
    ACK fab2527 - be good to get Travis back.
  laanwj:
    ACK fab2527

Tree-SHA512: 027e86b3dfec203a464e5bf528e9933c208c36633c2d4bfcdbc10da1799637a5d6ea0a63af33a4174fb1ad7115df631a4cb838f56e31f4cbd15498e1e9fdf9cc
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Apr 20, 2022
…r now

fab2527 test: Disable mockforward scheduler unit test for now (MarcoFalke)

Pull request description:

  This should be a workaround to fix bitcoin#18174 in the short run and buy us more time to investigate the issue while ci runs are green again 🙏

ACKs for top commit:
  fanquake:
    ACK fab2527 - be good to get Travis back.
  laanwj:
    ACK fab2527

Tree-SHA512: 027e86b3dfec203a464e5bf528e9933c208c36633c2d4bfcdbc10da1799637a5d6ea0a63af33a4174fb1ad7115df631a4cb838f56e31f4cbd15498e1e9fdf9cc
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants