Skip to content

Conversation

@mattdowle
Copy link
Member

@mattdowle mattdowle commented Sep 4, 2019

Follow up to #3604
Fixes revdeps neatRanges and rbi.helpers, #3581

forder(x,...) changed to forder(...) so can be a drop-in replacement for base::order in the cases where base::order returns a useful result. This makes the optimization to forder easier and works now when applied by do.call as neatRanges and rbi.helpers do.
Minimized the manipulation in the body of forder(). It now uses an eval approach instead of each column using point().

@mattdowle mattdowle added this to the 1.12.4 milestone Sep 4, 2019
@mattdowle mattdowle mentioned this pull request Sep 4, 2019
31 tasks
@mattdowle mattdowle changed the title Order forder order -> forder optimization follow up Sep 4, 2019
@codecov
Copy link

codecov bot commented Sep 4, 2019

Codecov Report

Merging #3817 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3817      +/-   ##
==========================================
- Coverage   99.41%   99.41%   -0.01%     
==========================================
  Files          71       71              
  Lines       13265    13231      -34     
==========================================
- Hits        13188    13154      -34     
  Misses         77       77
Impacted Files Coverage Δ
src/init.c 100% <ø> (ø) ⬆️
src/assign.c 100% <ø> (ø) ⬆️
R/setkey.R 100% <100%> (ø) ⬆️
R/data.table.R 100% <100%> (ø) ⬆️
src/forder.c 100% <100%> (ø) ⬆️
R/test.data.table.R 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3d0dc92...085083f. Read the comment docs.

x = eval(sub[[2L]], parent.frame(), parent.frame())
if (is.list(x)) {
if (length(x)==0L && is.data.frame(x)) stop("Attempting to order a 0-column data.table or data.frame.")
sub[2L] = NULL # change list(DT, ...) to list(...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately it make sense to redirect users to use forderv directly when they want to pass DT into forder. If you agree, then maybe a verbose print for now informing about that? because any change here would be significant (and unnecessary) breaking change. Better to provide a verbose (or eventually non-verbose) print to user, so no one will be harmed.

Copy link
Member Author

@mattdowle mattdowle Sep 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow. There is no breaking change in this respect here so far, afaik. This line that changes (list(DT,...) to list(...)) is basically just maintaining the old usage of forder(DT, -colA, colB) so that form still works. That form isn't what forderv is for, iiuc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, forder has different API but IMO it worth to make distinction, even if not a breaking change, then just suggests to users. So forder is a order replacement and forderv is meant to be used by passing data.table to it.

Copy link
Member Author

@mattdowle mattdowle Sep 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can suggest forderv in verbose mode I guess. But although, yes, forderv is meant for passing DT to it, a second aspect of forderv is that it doesn't accept expressions of columns. I was hoping that users can just use DT[order(...), ] as they have been and not need to know that forder and forderv exist. The optimization should hide these things from them and the news item (15) is consistent with that (e.g. it doesn't mention forder vs forderv). If verbose mode for DT[order(...), ] starts to suggest they use forderv it might be a bit confusing because i) they didn't know they were using forder, and ii) neither forder or forderv are exported or documented. Hopefully we never need to export them since optimization can take care of it.

Copy link
Member

@jangorecki jangorecki Sep 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is DT[order(...), ] optimised to forder(DT, ...)? if not, then the verbose message wouldn't be printed. But agree that would be confusing till forder is not exported. Ultimately we want to export it #3447 "so it can be legally used outside of [.data.table"

Copy link
Member Author

@mattdowle mattdowle Sep 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now, so verbose message would just be when first argument is DT. DT[order(...),] is optimized to DT[forder(...),] so wouldn't print even in verbose mode. I added link to #3447 back to this discussion then so we can come back to it.

@jangorecki jangorecki removed the WIP label Sep 4, 2019
}
if (!length(DT))
error("DT is an empty list() of 0 columns");
error("Internal error: DT is an empty list() of 0 columns"); // # nocov should have been caught be colnamesInt, test 2099.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to push down call to colnamesInt from R to C? more related logic next to each other, and (probably very tiny) speed improvement.

Copy link
Member Author

@mattdowle mattdowle Sep 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

colnamesInt is already a .Call to C. You added it quite recently iirc and it's nice. There are 11 calls currently from R level which seems right and nice to me, iiuc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean to call it from forder.c not from forder R

Copy link
Member Author

@mattdowle mattdowle Sep 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do I suppose. I've just made forderv even smaller at R level, so forderv could be a single .Call() eventually. Moving colnamesInt call down to C for that reason might be nice. I feel like I've spent way too long on this already so let's pencil that in for future, if that's ok. Link back here added to #3447.

@mattdowle mattdowle mentioned this pull request Sep 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants