-
Notifications
You must be signed in to change notification settings - Fork 1k
order -> forder optimization follow up #3817
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3817 +/- ##
==========================================
- Coverage 99.41% 99.41% -0.01%
==========================================
Files 71 71
Lines 13265 13231 -34
==========================================
- Hits 13188 13154 -34
Misses 77 77
Continue to review full report at Codecov.
|
| x = eval(sub[[2L]], parent.frame(), parent.frame()) | ||
| if (is.list(x)) { | ||
| if (length(x)==0L && is.data.frame(x)) stop("Attempting to order a 0-column data.table or data.frame.") | ||
| sub[2L] = NULL # change list(DT, ...) to list(...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ultimately it make sense to redirect users to use forderv directly when they want to pass DT into forder. If you agree, then maybe a verbose print for now informing about that? because any change here would be significant (and unnecessary) breaking change. Better to provide a verbose (or eventually non-verbose) print to user, so no one will be harmed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite follow. There is no breaking change in this respect here so far, afaik. This line that changes (list(DT,...) to list(...)) is basically just maintaining the old usage of forder(DT, -colA, colB) so that form still works. That form isn't what forderv is for, iiuc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, forder has different API but IMO it worth to make distinction, even if not a breaking change, then just suggests to users. So forder is a order replacement and forderv is meant to be used by passing data.table to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can suggest forderv in verbose mode I guess. But although, yes, forderv is meant for passing DT to it, a second aspect of forderv is that it doesn't accept expressions of columns. I was hoping that users can just use DT[order(...), ] as they have been and not need to know that forder and forderv exist. The optimization should hide these things from them and the news item (15) is consistent with that (e.g. it doesn't mention forder vs forderv). If verbose mode for DT[order(...), ] starts to suggest they use forderv it might be a bit confusing because i) they didn't know they were using forder, and ii) neither forder or forderv are exported or documented. Hopefully we never need to export them since optimization can take care of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is DT[order(...), ] optimised to forder(DT, ...)? if not, then the verbose message wouldn't be printed. But agree that would be confusing till forder is not exported. Ultimately we want to export it #3447 "so it can be legally used outside of [.data.table"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now, so verbose message would just be when first argument is DT. DT[order(...),] is optimized to DT[forder(...),] so wouldn't print even in verbose mode. I added link to #3447 back to this discussion then so we can come back to it.
| } | ||
| if (!length(DT)) | ||
| error("DT is an empty list() of 0 columns"); | ||
| error("Internal error: DT is an empty list() of 0 columns"); // # nocov should have been caught be colnamesInt, test 2099.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to push down call to colnamesInt from R to C? more related logic next to each other, and (probably very tiny) speed improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
colnamesInt is already a .Call to C. You added it quite recently iirc and it's nice. There are 11 calls currently from R level which seems right and nice to me, iiuc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean to call it from forder.c not from forder R
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could do I suppose. I've just made forderv even smaller at R level, so forderv could be a single .Call() eventually. Moving colnamesInt call down to C for that reason might be nice. I feel like I've spent way too long on this already so let's pencil that in for future, if that's ok. Link back here added to #3447.
Follow up to #3604
Fixes revdeps
neatRangesandrbi.helpers, #3581forder(x,...)changed toforder(...)so can be a drop-in replacement forbase::orderin the cases wherebase::orderreturns a useful result. This makes the optimization to forder easier and works now when applied bydo.callasneatRangesandrbi.helpersdo.Minimized the manipulation in the body of
forder(). It now uses anevalapproach instead of each column usingpoint().