-
-
Notifications
You must be signed in to change notification settings - Fork 211
Adding documentation for _convert_array_format output parameter #708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
PGijsbers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add an explicit list of possible conversions? If I would call the function now trying to convert a sparse dataframe to a dense numpy array, it would return the sparse dataframe without any warning/error.
This might be okay for an internal function (I personally would rather see an error), but it definitely should be documented.
openml/datasets/dataset.py
Outdated
| Parameters | ||
| ---------- | ||
| array_format : str | ||
| Tag to attach to the dataset to get a pandas SparseDataFrame or a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what you mean with "tag to attach to the dataset". From my understanding it would rather read something like "Datatype to convert the data to".
Please also add the list of legal values ('array' and 'dataframe').
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this was the existing docstring and I was sceptical in removing it altogether so kind of cut-pasted what was there.
I'll make the changes.
openml/datasets/dataset.py
Outdated
| - If array_format='array' | ||
| Converts non-sparse numeric data to numpy-array | ||
| Enforces numeric encoding of categorical columns | ||
| Missing values are represented as NaN in the dataframe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mean ndarrray instead of dataframe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, my bad.
By explicit list I believe you want to highlight the two if-conditions in the function, i.e.,
Am I understanding this right?
You seem to be describing the case when both the if-conditions are evaluated as False.
Assuming the above case is correct, I believe converting the second if to an |
Yes.
I think so. I'll leave it up to you if calling it with |
|
Have made the changes as I mentioned. Tried to be explicit by mentioning the same in the documentation. Hope the structure is correct. |
openml/datasets/dataset.py
Outdated
| elif array_format == "dataframe" and scipy.sparse.issparse(data): | ||
| return pd.SparseDataFrame(data, columns=attribute_names) | ||
| else: | ||
| warn("Conversion criteria not satisfied. Returning input data.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add something that mentions its arguments? Makes it easier to track this down when a user runs into this. E.g.: "Cannot convert <type(data)> to <array_format>. Returning input data."
|
Looks good :) Thanks! Just the one (new) minor comment and then it should be good. I do want to figure out why all the tests fail on this PR and the next one before merging though. But I can also look into that. |
For the For the Whereas the unit test has a variable @PGijsbers, how do you recommend I fix this? Should I just update the variable value from 100 to 90? Or something else needs to be fixed? After certain fixes, #701 also fails the test cases with the same error. |
|
Ah. I did not look into the actual error logs yet. Those are known issues that are being worked on. No need to do anything about that. |
|
Everything looks good. As these changes are not invasive I would like to hold off a merge until we fix the dev branch. I'll look at that sooner rather than later. |
Reference Issue
Addresses #639.
What does this PR implement/fix? Explain your changes.
Clarifies the type conversions involved with the choice of
array_formatparameter.Any other comments?
The docstrings are as per my understanding of the discussion on #548.
Kindly advice on further corrections.