-
Notifications
You must be signed in to change notification settings - Fork 1k
Limit deparse in name_dots to 1 line #5501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5501 +/- ##
==========================================
- Coverage 99.51% 98.32% -1.19%
==========================================
Files 80 80
Lines 14774 14801 +27
==========================================
- Hits 14702 14553 -149
- Misses 72 248 +176
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
|
could you add a test where this matters? probably worth a NEWS item as well, this could potentially break some folks |
I'm not aware of any case where this change makes an observable difference in behavior. The best I can suggest is a test documenting the performance improvement in an awkward call of the sort t1 <- system.time(dt1 <- data.table(a=runif(1e5), b=runif(1e5)))
t2 <- system.time(dt2 <- do.call(data.table, list(dt1)))
test(2239.1, all.equal(dt1,dt2), TRUE)
test(2239.2, t2["elapsed"]/t1["elapsed"]<0.5, TRUE) # On my system this is >30 before the fix, <0.2 afterDo you think this adds value? |
|
It matters here: asodijfoaisjdfoiajsdofijasodifjoaisdjfoiajsdoifjaosidjfoiajsdofijaosidjfoiasjdofijasdoifjaosidjfoiasjdfiuajsfuasiudfhoiauhjzxclkvjbhlszkjxhcliuasdhfgoiuahsdoufhlijxchlkzjhvkjhaljshdfiluhcubvjdfbvjkahsdfliuhawpioedhflaijsdh = function() 1L
data.table(c(
asodijfoaisjdfoiajsdofijasodifjoaisdjfoiajsdoifjaosidjfoiajsdofijaosidjfoiasjdofijasdoifjaosidjfoiasjdfiuajsfuasiudfhoiauhjzxclkvjbhlszkjxhcliuasdhfgoiuahsdoufhlijxchlkzjhvkjhaljshdfiluhcubvjdfbvjkahsdfliuhawpioedhflaijsdh(),
asodijfoaisjdfoiajsdofijasodifjoaisdjfoiajsdoifjaosidjfoiajsdofijaosidjfoiasjdofijasdoifjaosidjfoiasjdfiuajsfuasiudfhoiauhjzxclkvjbhlszkjxhcliuasdhfgoiuahsdoufhlijxchlkzjhvkjhaljshdfiluhcubvjdfbvjkahsdfliuhawpioedhflaijsdh(),
asodijfoaisjdfoiajsdofijasodifjoaisdjfoiajsdoifjaosidjfoiajsdofijaosidjfoiasjdofijasdoifjaosidjfoiasjdfiuajsfuasiudfhoiauhjzxclkvjbhlszkjxhcliuasdhfgoiuahsdoufhlijxchlkzjhvkjhaljshdfiluhcubvjdfbvjkahsdfliuhawpioedhflaijsdh(),
asodijfoaisjdfoiajsdofijasodifjoaisdjfoiajsdoifjaosidjfoiajsdofijaosidjfoiasjdofijasdoifjaosidjfoiasjdfiuajsfuasiudfhoiauhjzxclkvjbhlszkjxhcliuasdhfgoiuahsdoufhlijxchlkzjhvkjhaljshdfiluhcubvjdfbvjkahsdfliuhawpioedhflaijsdh()
))There should be a more minimal version of this but hopefully illustrates the point |
We shy away from timing tests in our main suite -- you never know when some system interference / sun spots can cause things to go haywire and now we have to deal with the threat of CRAN removal... OTOH, there is this non-CRAN suite, couldn't hurt to throw a test like that in there: I'd suggest a threshold of maybe 2 or 5 to be conservative, too, .2 vs .5 seems to tight and likely not to replicate across platforms / with high statistical certainty |
Nice! |
|
@MichaelChirico I checked now and think your example also doesn't have an observable impact on behavior. With and without
I am adding the news item and benchmark test. |
mattdowle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great find. Just on the news item, I try to imagine myself as user reading news, and this is the thought I would have.
| 53. `as.data.frame(DT, row.names=)` no longer silently ignores `row.names`, [#5319](https://github.com/Rdatatable/data.table/issues/5319). Thanks to @dereckdemezquita for the fix and PR, and @ben-schwen for guidance. | ||
|
|
||
| 54. data.table construction uses `deparse` to try and automatically name un-named cols. The attempt is now limited to 1- line which gives substantial speedup in [some edge cases](https://github.com/Rdatatable/data.table/pull/5501). Thanks @OfekShilon for the report and fix, @michaelchirico for guidance. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data.table construction
Does that mean a data.table(...) call with arguments passed to it? If so, could "data.table construction" be replaced with data.table(...) ?
The attempt is now limited to 1- line
Does that mean calls to data.table(...) split over several lines (which I guess is very common) will lose this feature in the 2nd line onwards? i.e.
data.table( A = 1:5,
mean(my_vec) # column name will be different now?
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refined the phrasing a bit (see the commit) - it is now both more accurate and less understandable.. I'd be thankful for any suggestion, and am totally fine with omitting the NEWS item altogether. AFAICT this change has no user-facing consequences, beyond some speedup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Ok I played with it a bit and tweaked the news item. It's not just 'some' speedup, but it scales with the data size. It's a tremendous finding certainly deserving of a news item and example!
…sible, and played with it preparing for the news item
…n imagine what might be similar, provide example with the timings and make the example more plausible by changing the input to be a data.frame
…w; this benchmark test file isn't in use yet and using 2239 here might make us think in future that these tests were moved from tests.Rraw which is happening currently but doesn't apply to these new ones
Per @MichaelChirico 's request, this improvement is separated to its own PR.