-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Ensure map_partitions returns Series object if function returns scalar
#10739
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
map_partitions returns Series object if function returns scalar
phofl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
|
Looks like there are a bunch of genuine typing issues since I set the dtype on the created series explicitly. @phofl if you want to, you can pick this up after the holidays but if not I'll handle it once I'm back |
|
I'll take a look after the holidays, the dtype issue is most likely because of running this on an empty meta object, that's the situation where I've seen us ending up with NaN there |
fjetter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@phofl in case you want to have another look
| df = func(*args, **kwargs) | ||
|
|
||
| if is_scalar(df) and is_series_like(meta): | ||
| df = type(meta)([df]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up removing the explicit dtype since it caused issues and was quite unnecessary considering that df is already the final result data. The casting/coercing logic of the Series takes care of compat stuff with nulls, etc.
| token=name1, | ||
| meta=self, | ||
| **chunk_kwargs, | ||
| enforce_metadata=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When using map_partitions internally we sometimes rely on the data not being cast to a Series. The cumulative aggregation logic becomes quite cumbersome otherwise. I hope this is not introducing any other weird artefacts 🤞
b21d4f2 to
71c29e0
Compare
phofl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the logic looks fine, I think the test failures are something that we can fix by adjusting the test
7d4fbd7 to
fc5cd91
Compare
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ± 0 15 suites ±0 3h 35m 57s ⏱️ + 13m 1s For more details on these failures and errors, see this check. Results for commit c60269b. ± Comparison against base commit 1ecf464. ♻️ This comment has been updated with latest results. |
fc5cd91 to
a7f08aa
Compare
2e95b60 to
c60269b
Compare
pre-commit run --all-files