Contributor

@Neeratyoy Neeratyoy commented Apr 15, 2021

Reference Issue

Addresses #1057.

What does this PR implement/fix? Explain your changes.

Adds a check when fetching and converting datasets: labels that arrive as sparse arrays are now converted to pandas objects, so the target is returned as a pandas Series.
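For readers unfamiliar with the conversion, here is a minimal sketch of the idea, not the code in this PR. `sparse_labels_to_series` and the column name `"target"` are hypothetical names introduced for illustration; only `scipy.sparse.issparse` and `pd.DataFrame.sparse.from_spmatrix` are taken from the diff under review. It assumes the labels come as a single-column sparse matrix.

```python
import numpy as np
import pandas as pd
import scipy.sparse


def sparse_labels_to_series(y, name="target"):
    """Hypothetical helper: return labels as a pandas Series.

    Assumes `y` is either a sparse matrix of shape (n_samples, 1)
    or something array-like that can be flattened to one dimension.
    """
    if scipy.sparse.issparse(y):
        if y.shape[1] != 1:
            raise ValueError("expected a single label column")
        # Build a one-column sparse DataFrame, then select that column,
        # which yields a Series with a sparse dtype.
        frame = pd.DataFrame.sparse.from_spmatrix(y, columns=[name])
        return frame[name]
    return pd.Series(np.asarray(y).ravel(), name=name)


# Toy labels stored as a sparse column vector (illustrative data only).
y_sparse = scipy.sparse.csr_matrix(np.array([[0], [1], [0], [2]]))
y = sparse_labels_to_series(y_sparse)
print(type(y))
```

The returned Series keeps a sparse dtype; `y.sparse.to_dense()` gives a dense Series if downstream code needs one.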

How should this PR be tested?

import openml
task = openml.tasks.get_task(12731)
_, y = task.get_X_and_y(dataset_format="dataframe")
print(type(y))  # should print the pandas Series type

@Neeratyoy Neeratyoy marked this pull request as ready for review April 15, 2021 17:18
@Neeratyoy Neeratyoy requested a review from mfeurer April 15, 2021 17:19
@Neeratyoy Neeratyoy requested a review from mfeurer April 21, 2021 20:15
@Neeratyoy Neeratyoy requested a review from mfeurer April 22, 2021 16:42
 if scipy.sparse.issparse(data):
-    return pd.DataFrame.sparse.from_spmatrix(data, columns=attribute_names)
+    data = pd.DataFrame.sparse.from_spmatrix(data, columns=attribute_names)
+if isinstance(data, pd.DataFrame) and data.shape[1] == 1:
Collaborator

Is this necessary? Don't you also do this in get_data()? Also, I think it'll break the balloon dataset, which has only a single attribute.
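To make the concern concrete, here is a small sketch. It assumes the branch guarded by the shape check collapses a one-column DataFrame into a Series; the body of that branch is not shown in the quoted diff, so `squeeze_if_single_column` is a hypothetical stand-in, and the frames are toy data rather than a real OpenML dataset.

```python
import pandas as pd


def squeeze_if_single_column(data):
    # Hypothetical stand-in for the reviewed check: any DataFrame with
    # exactly one column is turned into a Series, regardless of whether
    # it holds the target or the features.
    if isinstance(data, pd.DataFrame) and data.shape[1] == 1:
        return data.iloc[:, 0]
    return data


# A feature matrix with a single attribute (toy data).
X_single = pd.DataFrame({"only_feature": [1.0, 2.0, 3.0]})
# A feature matrix with two attributes is left untouched.
X_multi = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

print(type(squeeze_if_single_column(X_single)))
print(type(squeeze_if_single_column(X_multi)))
```

Under that assumption, a shape-only test cannot tell a one-feature X apart from a target column, so callers expecting a DataFrame for X would receive a Series instead.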

@Neeratyoy Neeratyoy requested a review from mfeurer April 26, 2021 13:16
@mfeurer mfeurer merged commit 62014cd into develop Apr 26, 2021
@mfeurer mfeurer deleted the fix_1057 branch April 26, 2021 20:35
PGijsbers pushed a commit to Mirkazemi/openml-python that referenced this pull request Feb 23, 2023
* Convert sparse labels to pandas series

* Handling sparse labels as Series

* Handling sparse targets when dataset as arrays

* Revamping sparse dataset tests

* Removing redundant unit test

* Cleaning target column formatting

* Minor comment edit