-
-
Notifications
You must be signed in to change notification settings - Fork 26.5k
[MRG+1] Read-only input data in common tests #4807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] Read-only input data in common tests #4807
Conversation
sklearn/utils/estimator_checks.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that wasn't you, but why is this commented? that seems... odd.. whoops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never mind, it is fine. just remove the line please.
a6894e4 to
b492b44
Compare
|
Thanks to the range of We observe no failure using files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that on purpose? Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the noise
|
👍 for merge on my side. I just want you to add the comment on the copy aspect. Thanks, this is super useful. It may seem to be a detail, but it actually is an important step for scalability. |
|
Oops, other TODO: the renaming 'Y' to lowercase :). |
2e4776c to
31269f8
Compare
|
I suspect a glitch in CI... (same problem as nilearn/nilearn#623) |
|
Restarted CI => glitch fixed. |
sklearn/utils/validation.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if instead we should not do:
1- always use np.asarray here.
2- set array_orig = array at the beginning of the function
3- then at the end of this function, just before returning:
if copy and array is array_orig:
array = array.copy()|
Change in |
sklearn/utils/estimator_checks.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blobs are shuffled, right?
|
|
e057735 to
a664269
Compare
610c6a0 to
80ec85a
Compare
|
My bad. check_array replaced memory map by array , which is an unwanted behavior that prevent estimator to run out-of-core. This is now fixed with regression test provided. I use np.asarray which transform memory map into arrays but use the provided memory map as the To sum up these estimators fail on common tests as they have side effects on input
Transformers :
SkewedChi2Sampler fails on assertion, for a non trivial reason. I am filling an issue which should be tagged as Easy. |
sklearn/utils/validation.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not trivial, I think it might benefits from a bit of explanation.
sklearn/utils/validation.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure but if array is not array_orig but array.base is array_orig and copy = true, shouldn't we create a real copy ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good remark. I think this should be:
if copy and np.may_share_memory(array, array_orig):There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I think that might fix the transformer part for the #5481
|
Should this be labelled "need contributor", @arthurmensch? |
|
What needs to be done here? |
|
Do we consider this to be blocked by #5481? |
|
Could we wrap this up by bypassing the test on estimators where we expect it to fail, and make it an issue to fix each? @arthurmensch, are you willing to finish it off, or should we find someone else? |
|
FYI I have a (for now) WIP attempt of reviving this PR at #10663. |
Following PR #4775, I added checks in estimator checks in order to verify
Estimatorbehavior on read only mem-mapped data.A few issues there :
sklearn/linear_model/cd_fast.pyxstill raise errors on some use cases. This is related to the fact that we cannot make Lasso fails using simple read-only memmap as input.