[MRG] Fix SGD non deterministic behavior #13422
Conversation
Windows unit tests seem to fail, I need to investigate a bit.
jnothman left a comment
Thanks for this. I agree the nondeterminism is a problem
    dataset, intercept_decay = make_dataset(X, y_i, sample_weight)
    # XXX should have random_state_!
    random_state = random_state if random_state is not None \
Not sure we need to support retrieving the random_state from the estimator. Let's just use check_random_state and keep it simple.
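For context, a minimal sketch of the semantics of `sklearn.utils.check_random_state` (not its actual implementation): an int seed yields a fresh, reproducible generator on every call, while a `RandomState` instance is passed through as-is and therefore shared by all callers.

```python
import numbers
import numpy as np

def check_random_state_sketch(seed):
    """Minimal sketch of sklearn.utils.check_random_state semantics."""
    if seed is None:
        return np.random.mtrand._rand          # NumPy's global generator
    if isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)     # fresh generator, reproducible
    if isinstance(seed, np.random.RandomState):
        return seed                            # shared instance passes through
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState")

# An int seed gives identical draws on every call; a shared instance consumes
# state, so its draws depend on how many calls happened before.
assert check_random_state_sketch(0).randint(10) == check_random_state_sketch(0).randint(10)
```

This difference is exactly why the bug only shows up when the estimator is given a `RandomState` instance rather than an int.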
Force-pushed from 4c4d02d to a5fbfe6
@jnothman thanks, I made the change and used `check_random_state`.
Force-pushed from a5fbfe6 to fe23a0c
Is it possible that MAX_INT is not platform-independent?
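The concern is plausible: NumPy's default integer follows the platform's C `long`, which (for NumPy < 2.0) is 32-bit on Windows even in 64-bit builds, while it is 64-bit on most Linux/macOS builds. Pinning the bound to an explicit fixed-width type sidesteps the issue; a small illustration (the `MAX_INT` name here is illustrative):

```python
import numpy as np

# np.iinfo on an explicit fixed-width type is the same on every platform,
# unlike the platform-dependent default integer (C long).
MAX_INT = np.iinfo(np.int32).max  # 2**31 - 1 everywhere

rng = np.random.RandomState(0)
seed = rng.randint(MAX_INT)       # always fits in 32 bits, on any platform
assert 0 <= seed < MAX_INT
```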
Force-pushed from a89888d to 2015b20
Force-pushed from aeddd29 to 171d863
Force-pushed from e9e94e8 to ab03321
OK, I think I fixed it.
jnothman left a comment
Under the assumption that there's no reasonable way to write a regression test, this LGTM.
@ClemDoum you'll need to rebase on current master.
Codecov Report
@@ Coverage Diff @@
## master #13422 +/- ##
==========================================
- Coverage 96.65% 96.39% -0.27%
==========================================
Files 376 377 +1
Lines 69711 69869 +158
==========================================
- Hits 67381 67347 -34
- Misses 2330 2522 +192
Continue to review full report at Codecov.
Thank you @ClemDoum. Sorry, I forgot to merge it when I approved.
Side note: since GitHub now has a nice squash-and-merge button, we now prefer merging master instead of rebasing. It makes the discussion easier to follow.
This reverts commit 7560d92.
Does @ClemDoum or anyone else involved in this discussion remember the reason for this? I was not able to find any reference to it in the discussion ...
@lesteve yes, sorry, I didn't mention it in the PR, but if I remember well ...
Right, that makes sense, thanks a lot!
This removes the -Woverflow warnings observed when building scikit-learn. RAND_R_MAX is the max value for uint8; incrementing it causes an overflow (hence the warning).
I think this commit fixes the implementation, yet it comes with backwards-incompatible results, and tests for implementations relying on `our_rand_r` fail because results are now different.
I see several alternatives to remove the warning while having tests pass:
- preferred solution: adapt the test suite using the new results so that all tests pass, and acknowledge the change of behavior for impacted user-facing APIs in the changelog
- accept the quirk of this implementation but hardcode and rename the effective constant
- silence the -Woverflow warning by another means
Relates to:
scikit-learn#13422
scikit-learn#24895
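Taking the commit message at face value (an unsigned 8-bit RAND_R_MAX; the constant's actual width in scikit-learn may differ), the wraparound it describes can be illustrated by emulating fixed-width unsigned arithmetic in Python:

```python
# Illustration of the -Woverflow issue described above. The 8-bit width is
# assumed purely for demonstration, following the commit message.
BITS = 8
RAND_R_MAX = (1 << BITS) - 1          # 255: the largest value the type holds

# In C, `RAND_R_MAX + 1` evaluated in the same 8-bit unsigned type wraps to 0,
# which is what -Woverflow warns about:
wrapped = (RAND_R_MAX + 1) & ((1 << BITS) - 1)
assert wrapped == 0

# Evaluated in a wider type, the same expression is the intended modulus 256,
# so `value % (RAND_R_MAX + 1)` maps any value into [0, RAND_R_MAX]:
modulus = RAND_R_MAX + 1              # 256 in ordinary (wider) arithmetic
assert 300 % modulus == 44
```

This is why fixing the expression changes the generated stream: a modulus of 0 (or a compiler-folded wrap) and a modulus of 256 reduce values differently, which is the backwards incompatibility mentioned above.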
What does this fix:
While trying to make my intent classifier training deterministic in my NLU lib, I noticed that I couldn't because the `BaseSGDClassifier._fit_multiclass` method is non-deterministic. To reproduce, you have to initialize the `SGDClassifier` with a `RandomState` instance. I added a small print statement just before the joblib threads pass their seed to the `plain_sgd` or `average_sgd` function. This gives me the following output (you will get a different output on your machine, but it will eventually fail). What is happening is that the joblib threads share the `BaseSGDClassifier.random_state` and set a seed in the `fit_binary` function before it's passed to the `plain_sgd` or `average_sgd` function. Depending on the order in which the threads reach the seed setting, the output of the SGD can differ. The other reason is that the estimator random state was not passed to the `make_dataset` function.
I think the bug was not noticeable in the unit tests because the SGD estimators were initialized with int random states. In this case, the input of the `check_random_state` function in the `fit_binary` function is the integer seed, so each thread actually returns the exact same random state and then samples the exact same random seed for the SGD.
How does this fix the bug:
- Added a `seed` argument to the `fit_binary` function, which defaults to `None`. If the seed is not `None` it is used; otherwise the seed is set from the estimator's `random_state`. This allows setting the jobs' seeds before the jobs are distributed to the threads and avoids the non-deterministic behavior.
- Passed the estimator's random state to the `make_dataset` function.
Help needed
(Now also fixes #5015.)
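The fix described above can be sketched as follows: instead of letting each thread draw its seed from a shared `RandomState` (where the draw depends on scheduling order), all per-job seeds are drawn up front, before dispatch. The function names here are illustrative, not scikit-learn's:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

MAX_INT = np.iinfo(np.int32).max

def fit_jobs(n_jobs, random_state):
    """Sketch of the fixed pattern: pre-draw one seed per job."""
    rng = np.random.RandomState(random_state)
    # Seeds are fixed here, in a single thread, so they no longer depend on
    # the order in which worker threads happen to run.
    seeds = [rng.randint(MAX_INT) for _ in range(n_jobs)]

    def fit_one(seed):
        # Each job builds its own private generator from its fixed seed,
        # standing in for the plain_sgd / average_sgd call.
        return np.random.RandomState(seed).randint(MAX_INT)

    with ThreadPoolExecutor(max_workers=n_jobs) as ex:
        return list(ex.map(fit_one, seeds))

# Two runs with the same estimator-level seed now match exactly, regardless
# of how the threads were scheduled.
```

The buggy pattern would instead call `shared_rng.randint(MAX_INT)` inside each worker, so the seed a given job receives would depend on which thread drew first.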
While fixing the initial bug, another bug was found and fixed: `our_rand_r` in the Cython code of `sklearn/utils/seq_dataset.pyx` was not behaving consistently across platforms; see the full bug description and fix below.
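One way such cross-platform drift can happen, sketched in Python: without explicit fixed-width casts, intermediate shift results can exceed the intended 32-bit range and then behave differently depending on the platform's integer widths. Masking every step to 32 bits (the moral equivalent of explicit `<UINT32_t>` casts in Cython) pins the sequence everywhere. This xorshift-style step is illustrative, not scikit-learn's exact generator:

```python
MASK32 = 0xFFFFFFFF  # emulate fixed-width uint32 arithmetic

def xorshift32_step(seed):
    # Illustrative xorshift-style PRNG step (not scikit-learn's exact code).
    # Masking after each left shift keeps every intermediate within 32 bits,
    # so the output cannot depend on the platform's native integer width.
    seed ^= (seed << 13) & MASK32
    seed ^= seed >> 17
    seed ^= (seed << 5) & MASK32
    return seed & MASK32

# Same seed -> same value, on any platform: 1 -> 270369
assert xorshift32_step(1) == 270369
```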