-
Notifications
You must be signed in to change notification settings - Fork 1k
fifelse followup #3741
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fifelse followup #3741
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3741 +/- ##
==========================================
- Coverage 99.41% 99.41% -0.01%
==========================================
Files 71 71
Lines 13208 13196 -12
==========================================
- Hits 13131 13119 -12
Misses 77 77
Continue to review full report at Codecov.
|
| test(2072.047, fifelse(FALSE, data.table(1:5), data.table(5:1)), data.table(5:1)) | ||
| test(2072.048, fifelse(TRUE, data.frame(1:5), data.frame(5:1)), data.frame(1:5)) | ||
| test(2072.049, fifelse(FALSE, data.frame(1:5), data.frame(5:1)), data.frame(5:1)) | ||
| test(2072.046, fifelse(TRUE, list(data.table(1:5)), list(data.table(5:1))), list(data.table(1:5))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is improper use. data.frame should be handled as ordinary list here, thus wrapping into list to achieve equality. Consider
fifelse(TRUE, data.table(1:5, 2:6), data.table(5:1, 6:2))| if (len1!=1 && len1!=len0) error("Length of 'yes' is %lld but must be 1 or length of 'test' (%lld).", len1, len0); | ||
| if (len2!=1 && len2!=len0) error("Length of 'no' is %lld but must be 1 or length of 'test' (%lld).", len2, len0); | ||
| const int64_t amask = len1>1 ? INT64_MAX : 0; | ||
| const int64_t amask = len1>1 ? INT64_MAX : 0; // for scalar 'a' bitwise AND will reset iterator to first element: pa[i & amask] -> pa[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i see now. helpful comment indeed!
and the thinking is the compiler will optimize away all the repeated & calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bitwise AND of any_number & 0 will result 0, while of any_number & max_number will result any_number, due to the fact that max_number is represented by all 1 numbers, i.e. for 4 bit unsigned integer: 1 & 15 -> 0001 & 1111 results 0001.
I have no idea how compiler approach that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just that the & is so fast that it doesn't matter. It is so fast that it can probably be done within the CPU's pipeline 'for free' without affecting measurable timings outside the pipeline.
The compiler can't optimize it away because it doesn't know what will be in mask. The mask is declared as const so that the compiler can registerize its use though; i.e. it doesn't need to keep checking whether amask has changed within the loop.
|
Thank you Jan, I will have a look at the changes. What about coercing logical, complex and string (like ifelse) instead of just int to double? I can look at it and suggest something if you want? |
|
@2005m I prefer to have strict type checks and require user to provide correct ones. But... I think int-double is good exception here, because it is so widely used, and still make sense, while coercing numeric to character is rather useless. ifelse(c(T,F), 1L, list(0))
ifelse(c(T,T), 1L, list(0)) |
closes #3740
important changes to discuss (fyi @2005m):
I am not against this feature, but it seems to be not straightforward to address issue related to it in a consistent way. I prefer
hutils/dplyrbehaviour to raise error whenyesis a list andnois an atomic. As a result, user will need to wrapnointo extra list to match data types.This shouldn't be big deal. I think it make sense when we want to use it on a
listcolumn.testargument, so names of lists provided toyes/noare not retained anymore.I don't see a reason why we would retain names of a list in
yes/nobut not for named atomic vectors. Make sense to stick to base way, so retaining names oftest. If we want to really handle also names fromyesandnothen make sense to do it for any named input, not just list.