Skip to content

Commit 5affced

Browse files
authored
Fix memory leak in fread (#4710)
1 parent dd2134e commit 5affced

File tree

6 files changed

+34
-4
lines changed

6 files changed

+34
-4
lines changed

NEWS.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -560,7 +560,9 @@
560560
DT2 = do.call(data.table, list(DF)) # 3.07s before, 0.02s after
561561
identical(DT1, DT2) # TRUE
562562
```
563-
563+
564+
55. `fread()` could leak memory, [#3292](https://github.com/Rdatatable/data.table/issues/3292). Thanks to @patrickhowerter for reporting, and Jim Hester for the fix. The fix requires R 3.4.0 or later. Loading `data.table` in earlier versions now warns that known problems exist, asks users to upgrade R, and warns that we intend to upgrade `data.table`'s dependency from 8-year-old R 3.1.0 (April 2014) to 5-year-old R 3.4.0 (April 2017).
565+
564566
## NOTES
565567

566568
1. New feature 29 in v1.12.4 (Oct 2019) introduced zero-copy coercion. Our thinking is that requiring you to get the type right in the case of `0` (type double) vs `0L` (type integer) is too inconvenient for you the user. So such coercions happen in `data.table` automatically without warning. Thanks to zero-copy coercion there is no speed penalty, even when calling `set()` many times in a loop, so there's no speed penalty to warn you about either. However, we believe that assigning a character value such as `"2"` into an integer column is more likely to be a user mistake that you would like to be warned about. The type difference (character vs integer) may be the only clue that you have selected the wrong column, or typed the wrong variable to be assigned to that column. For this reason we view character to numeric-like coercion differently and will warn about it. If it is correct, then the warning is intended to nudge you to wrap the RHS with `as.<type>()` so that it is clear to readers of your code that a coercion from character to that type is intended. For example :

R/onAttach.R

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,10 @@
3535
else
3636
packageStartupMessagef("This is %s. This warning should not normally occur on Windows or Linux where OpenMP is turned on by data.table's configure script by passing -fopenmp to the compiler. If you see this warning on Windows or Linux, please file a GitHub issue.\n**********", Sys.info()["sysname"])
3737
}
38+
if (.Call(CbeforeR340)) {
39+
# not base::getRversion()<"3.4.0" in case the user upgrades R but does not reinstall data.table; a reasonable mistake since data.table would seem to be the latest version
40+
packageStartupMessagef("**********\nThis data.table installation was compiled for R < 3.4.0 (Apr 2017) and is known to leak memory. Please upgrade R and reinstall data.table to fix the leak. Maintaining and testing code branches to support very old versions increases development time so please do upgrade R. We intend to bump data.table's dependency from 8 year old R 3.1.0 (Apr 2014) to 5 year old R 3.4.0 (Apr 2017).\n**********")
41+
}
3842
}
3943
}
4044

inst/tests/benchmark.Rraw

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -175,3 +175,11 @@ t2 = system.time(DT2 <- do.call(data.table, list(DF))) # 3.07s before, 0.02s af
175175
test(, identical(DT1, DT2))
176176
test(, t2["elapsed"]/t1["elapsed"]<2)
177177

178+
# fread leak, #3292
179+
dummy = rep("1\t2\t3\t4\t5", 10000000)
180+
writeLines(dummy, "out.tsv")
181+
start = gc()["Vcells",2]
182+
for (i in 1:10) data.table::fread("out.tsv")
183+
end = gc()["Vcells",2]
184+
test(, end/start < 1.05)
185+

src/data.table.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -323,6 +323,7 @@ SEXP nqRecreateIndices(SEXP, SEXP, SEXP, SEXP, SEXP);
323323
SEXP fsort(SEXP, SEXP);
324324
SEXP inrange(SEXP, SEXP, SEXP, SEXP);
325325
SEXP hasOpenMP(void);
326+
SEXP beforeR340(void);
326327
SEXP uniqueNlogical(SEXP, SEXP);
327328
SEXP dllVersion(void);
328329
SEXP initLastUpdated(SEXP);

src/freadR.c

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -523,9 +523,13 @@ void setFinalNrow(size_t nrow) {
523523
if (length(DT)) {
524524
if (nrow == dtnrows)
525525
return;
526-
for (int i=0; i<LENGTH(DT); i++) {
527-
SETLENGTH(VECTOR_ELT(DT,i), nrow); // TODO: realloc
528-
SET_TRUELENGTH(VECTOR_ELT(DT,i), nrow);
526+
const int ncol=LENGTH(DT);
527+
for (int i=0; i<ncol; i++) {
528+
SETLENGTH(VECTOR_ELT(DT,i), nrow);
529+
SET_TRUELENGTH(VECTOR_ELT(DT,i), dtnrows);
530+
#if defined(R_VERSION) && R_VERSION>=R_Version(3,4,0)
531+
SET_GROWABLE_BIT(VECTOR_ELT(DT,i)); // #3292
532+
#endif
529533
}
530534
}
531535
R_FlushConsole(); // # 2481. Just a convenient place; nothing per se to do with setFinalNrow()

src/init.c

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -116,6 +116,7 @@ R_CallMethodDef callMethods[] = {
116116
{"Cinrange", (DL_FUNC) &inrange, -1},
117117
{"Cbetween", (DL_FUNC) &between, -1},
118118
{"ChasOpenMP", (DL_FUNC) &hasOpenMP, -1},
119+
{"CbeforeR340", (DL_FUNC) &beforeR340, -1},
119120
{"CuniqueNlogical", (DL_FUNC) &uniqueNlogical, -1},
120121
{"CfrollfunR", (DL_FUNC) &frollfunR, -1},
121122
{"CdllVersion", (DL_FUNC) &dllVersion, -1},
@@ -330,6 +331,16 @@ SEXP hasOpenMP(void) {
330331
}
331332
// # nocov end
332333

334+
SEXP beforeR340(void) {
335+
// used in onAttach.R for message about fread memory leak fix needing R 3.4.0
336+
// at C level to catch if user upgrades R but does not reinstall data.table
337+
#if defined(R_VERSION) && R_VERSION<R_Version(3,4,0)
338+
return ScalarLogical(true);
339+
#else
340+
return ScalarLogical(false);
341+
#endif
342+
}
343+
333344
extern int *_Last_updated; // assign.c
334345

335346
SEXP initLastUpdated(SEXP var) {

0 commit comments

Comments
 (0)