-
Notifications
You must be signed in to change notification settings - Fork 1k
Add an illustrative example to ?GForce when sorting locale matters #5342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #5342 +/- ##
=======================================
Coverage 97.48% 97.48%
=======================================
Files 80 80
Lines 14862 14862
=======================================
Hits 14488 14488
Misses 374 374 ☔ View full report in Codecov by Sentry. |
NEWS.md
Outdated
| 15. Thanks to @ssh352, Václav Tlapák, Cole Miller, András Svraka and Toby Dylan Hocking for reporting and bisecting a significant performance regression in dev. This was fixed before release thanks to a PR by Jan Gorecki, [#5463](https://github.com/Rdatatable/data.table/pull/5463). | ||
| 17. Some clarity is added to `?GForce` for the case when subtle changes to `j` produce different results because of differences in locale. Because `data.table` _always_ uses the "C" locale, small changes to queries which activate/deactivate GForce might cause confusingly different results when sorting is involved, [#5331](https://github.com/Rdatatable/data.table/issues/5331). The inspirational example compared `DT[, .(max(a), max(b)), by=grp]` and `DT[, .(max(a), max(tolower(b))), by=grp]` -- in the latter case, GForce is deactivated owing to the _ad-hoc_ column, so the result for `max(a)` might differ for the two queries. An example is added to `?GForce`. As always, there are several options to guarantee consistency, for example (1) use namespace qualification to deactivate GForce: `DT[, .(base::max(a), base::max(b)), by=grp]` or (2) turn off GForce with `options(datatable.optimize = 0)`. Thanks @markseeto for the example and @michaelchirico for the improved documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a great news item
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option worth to mention is setting up locale to C
* Fix 5492 by limiting the costly deparse to `nlines=1` * Implementing PR feedbacks * Added inside * Fix typo in name * Idiomatic use of inside * Separating the deparse line limit to a different PR --------- Co-authored-by: Michael Chirico <[email protected]>
* Added my improvements to the intro vignette * Removed two lines I added extra as a mistake earlier * Requested changes
* fix typos and grammatical mistakes * fix typos and punctuation * remove double spaces where it wasn't necessary * fix typos and adhere to British English spelling * fix typos * fix typos * add missing closing bracket * fix typos * review fixes * Update vignettes/datatable-benchmarking.Rmd Co-authored-by: Michael Chirico <[email protected]> * Update vignettes/datatable-benchmarking.Rmd Co-authored-by: Michael Chirico <[email protected]> * Apply suggestions from code review benchmarking Co-authored-by: Michael Chirico <[email protected]> * remove unnecessary [ ] from datatable-keys-fast-subset.Rmd * Update vignettes/datatable-programming.Rmd Co-authored-by: Michael Chirico <[email protected]> * Update vignettes/datatable-reshape.Rmd Co-authored-by: Michael Chirico <[email protected]> * One last batch of fine-tuning --------- Co-authored-by: Michael Chirico <[email protected]> Co-authored-by: Michael Chirico <[email protected]>
* Updated documentation for rbindlist(fill=TRUE) * Print NULL entries of list as NULL * Added news item * edit NEWS, use '[NULL]' not 'NULL' * fix test * split NEWS item * add example --------- Co-authored-by: Michael Chirico <[email protected]> Co-authored-by: Michael Chirico <[email protected]> Co-authored-by: Benjamin Schwendinger <[email protected]>
* clarify that list input->unnamed list output * Add example where make.names is used * mention role of make.names
|
@TysonStanley are we in the thick of the release process already? Wondering if maybe this doc-only change should be shipped with 1.15.0. It's not urgent though, so if you've already started packaging up 1.15.0 for CRAN, no need to bother -- we'll just merge to 1-15-99. |
|
Was waiting on a fix from @ben-schwen and @tdhock added the change in the description too so this would be just fine. |
Is that different from #5887? That was merged / results in revdep show nothing new. Sorry but not sure I follow the second half of your comment. |
|
Great, then I would say to include this and then tomorrow I'll restart the release process. As for the second part, Toby changed ctr to aut for you, Toby, and Jan as well but I had seen that was merged, just missed the one from Ben. |
|
OK, rebased to |
NB: Ben is not currently a committer: https://github.com/orgs/Rdatatable/teams/maintainers |
Right, I meant Ben's work on #5887 . Only you, Jan, and Toby are committers. |
ben-schwen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GForce is already turned off when options(datatable.optimize=1L). Might want to adjust that or say that optimize=0 turns optimizations off
Closes #5331