Skip to content
This repository was archived by the owner on Apr 8, 2025. It is now read-only.

Remove baskets without features in _create_dataset.#461

Closed
ftesser wants to merge 1 commit intodeepset-ai:masterfrom
ftesser:issue_457_2
Closed

Remove baskets without features in _create_dataset.#461
ftesser wants to merge 1 commit intodeepset-ai:masterfrom
ftesser:issue_457_2

Conversation

@ftesser
Copy link
Copy Markdown
Contributor

@ftesser ftesser commented Jul 14, 2020

This is proposals 2 for #457.
I have added some additional info on the id that contains errors (so if someone want to check the data can find them).

The removal is done on _create_dataset.

logger.warning(f"Removing the following baskets because of errors in computing features:")
for basket in basket_to_remove:
logger.warning(f"Basket id: id_internal: {basket.id_internal}, id_external: {basket.id_external}")
self.baskets.remove(basket)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An error could be thrown here if a basket contains multiple samples where sample.features is None

if sample.features is None:
basket_to_remove.append(basket)
else:
features_flat.extend(sample.features)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If one set of sample.features is None, all sample.features from the same basket need to be thrown out otherwise we may have features which belong to a basket which has been removed. So we should probably first iterate over all samples in the basket to check if any have features = None. If there are, we remove all features. If not, then we can add all to features_flat

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @brandenchan for the suggestions. I am going to close this PR and create a new one.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants