library(data.table)
txt <- '
foo foo2
1 0 false
2 1 NA
'
print(read.table(text = txt, header = TRUE))
print(fread(text = txt, header="auto"))
print(fread(text = txt, header = TRUE))
read.table() handles the row names as expected.
fread(header="auto") gives a warning and creates an extra column (V1) for the row names. Not great, but it can be worked around with suppressWarnings() and by removing the extra column.
fread(header=TRUE) should do the same but instead gives:
1 0 false
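1: 2 1 <NA>
which is completely wrong: the first data row (1 0 false) is used as the column names, the real header (foo foo2) is dropped, and only one data row is returned.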
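For anyone hitting the same thing, the header="auto" workaround described above could look roughly like this. It is only a sketch: read_with_rownames is a hypothetical helper name (not part of data.table), and it assumes the extra row-name column is called V1, as in the header="auto" output in this report. Note that suppressWarnings() hides every warning fread raises in that call, not just the column-name one.

```r
library(data.table)

txt <- '
foo foo2
1 0 false
2 1 NA
'

# Hypothetical helper (not part of data.table): read text whose header line
# has one fewer field than the data rows because the first column holds
# row names.
read_with_rownames <- function(text) {
  # header = "auto" warns about the name/column count mismatch; silence it.
  # Caveat: this silences any other warning from fread as well.
  dt <- suppressWarnings(fread(text = text, header = "auto"))
  # Assumption: the added row-name column is named "V1", as in this report.
  if (identical(names(dt)[1L], "V1")) {
    dt[, V1 := NULL]
  }
  dt[]
}

res <- read_with_rownames(txt)
print(res)
```

If the row names themselves are needed, keep V1 and rename it with setnames() instead of dropping it.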
Verbose output:
fread(text=txt, header="auto", verbose=TRUE)
OpenMP version (_OPENMP) 201511
omp_get_num_procs() 4
R_DATATABLE_NUM_PROCS_PERCENT unset (default 50)
R_DATATABLE_NUM_THREADS unset
R_DATATABLE_THROTTLE unset (default 1024)
omp_get_thread_limit() 2147483647
omp_get_max_threads() 4
OMP_THREAD_LIMIT unset
OMP_NUM_THREADS unset
RestoreAfterFork true
data.table is using 2 threads with throttle==1024. See ?setDTthreads.
Input contains a \n or is ")". Taking this to be text input (not a filename)
[01] Check arguments
Using 2 threads (omp_get_max_threads()=4, nth=2)
NAstrings = [<<NA>>]
None of the NAstrings look like numbers.
show progress = 1
0/1 column will be read as integer
[02] Opening the file
`input` argument is provided rather than a file name, interpreting as raw text to read
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
\n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
Positioned on line 2 starting: << foo foo2>>
[06] Detect separator, quoting rule, and ncolumns
Detecting sep automatically ...
sep=' ' with 2 lines of 3 fields using quote rule 0
Detected 3 columns on line 2. This line is either column names or first data row. Line starts as: <<1 0 false>>
Quote rule picked = 0
fill=false and the most number of columns found is 3
[07] Detect column types, good nrow estimate and whether first row is column names
Number of sampling jump points = 1 because (24 bytes from row 1 to eof) / (2 * 24 jump0size) == 0
Type codes (jump 000) : 55C Quote rule 0
Types in 1st data row match types in 2nd data row but previous row has 2 fields. Taking previous row as column names. All rows were sampled since file is small so we know nrow=1 exactly
[08] Assign column names
[09] Apply user overrides on column types
After 0 type and 0 drop user overrides : 55C
[10] Allocate memory for the datatable
Allocating 3 column slots (3 - 0 dropped) with 1 rows
[11] Read the data
jumps=[0..1), chunk_size=1048576, total_size=24
Too few rows allocated. Allocating additional 1024 rows (now nrows=1025) and continue reading from jump 0
jumps=[0..1), chunk_size=1048576, total_size=24
Read 2 rows x 3 columns from 37 bytes file in 00:00.001 wall clock time
[12] Finalizing the datatable
Type counts:
2 : int32 '5'
1 : string 'C'
=============================
0.000s ( 18%) Memory map 0.000GB file
0.001s ( 61%) sep=' ' ncol=3 and header detection
0.000s ( 4%) Column type detection using 1 sample rows
0.000s ( 4%) Allocation of 1025 rows x 3 cols (0.000GB) of which 2 ( 0%) rows used
0.000s ( 13%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 2 rows) using 1 threads
+ 0.000s ( 1%) Parse to row-major thread buffers (grown 0 times)
+ 0.000s ( 0%) Transpose
+ 0.000s ( 12%) Waiting
0.000s ( 0%) Rereading 0 columns due to out-of-sample type exceptions
0.001s Total
V1 foo foo2
1: 1 0 false
2: 2 1 <NA>
Warning message:
In fread(text = txt, header = "auto", :
Detected 2 column names but the data has 3 columns (i.e. invalid file). Added 1 extra default column name for the first column which is guessed to be row names or an index. Use setnames() afterwards if this guess is not correct, or fix the file write command that created the file to create a valid file.
fread(text=txt, header=TRUE, verbose=TRUE)
OpenMP version (_OPENMP) 201511
omp_get_num_procs() 4
R_DATATABLE_NUM_PROCS_PERCENT unset (default 50)
R_DATATABLE_NUM_THREADS unset
R_DATATABLE_THROTTLE unset (default 1024)
omp_get_thread_limit() 2147483647
omp_get_max_threads() 4
OMP_THREAD_LIMIT unset
OMP_NUM_THREADS unset
RestoreAfterFork true
data.table is using 2 threads with throttle==1024. See ?setDTthreads.
Input contains a \n or is ")". Taking this to be text input (not a filename)
[01] Check arguments
Using 2 threads (omp_get_max_threads()=4, nth=2)
NAstrings = [<<NA>>]
None of the NAstrings look like numbers.
show progress = 1
0/1 column will be read as integer
[02] Opening the file
`input` argument is provided rather than a file name, interpreting as raw text to read
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
\n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
Positioned on line 2 starting: << foo foo2>>
[06] Detect separator, quoting rule, and ncolumns
Detecting sep automatically ...
sep=' ' with 2 lines of 3 fields using quote rule 0
Detected 3 columns on line 2. This line is either column names or first data row. Line starts as: <<1 0 false>>
Quote rule picked = 0
fill=false and the most number of columns found is 3
[07] Detect column types, good nrow estimate and whether first row is column names
'header' changed by user from 'auto' to true
Number of sampling jump points = 1 because (24 bytes from row 1 to eof) / (2 * 24 jump0size) == 0
Type codes (jump 000) : 552 Quote rule 0
All rows were sampled since file is small so we know nrow=1 exactly
[08] Assign column names
[09] Apply user overrides on column types
After 0 type and 0 drop user overrides : 552
[10] Allocate memory for the datatable
Allocating 3 column slots (3 - 0 dropped) with 1 rows
[11] Read the data
jumps=[0..1), chunk_size=1048576, total_size=12
Read 1 rows x 3 columns from 37 bytes file in 00:00.001 wall clock time
[12] Finalizing the datatable
Type counts:
1 : bool8 '2'
2 : int32 '5'
=============================
0.000s ( 24%) Memory map 0.000GB file
0.000s ( 53%) sep=' ' ncol=3 and header detection
0.000s ( 6%) Column type detection using 1 sample rows
0.000s ( 6%) Allocation of 1 rows x 3 cols (0.000GB) of which 1 (100%) rows used
0.000s ( 11%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 1 rows) using 1 threads
+ 0.000s ( 1%) Parse to row-major thread buffers (grown 0 times)
+ 0.000s ( 0%) Transpose
+ 0.000s ( 10%) Waiting
0.000s ( 0%) Rereading 0 columns due to out-of-sample type exceptions
0.001s Total
1 0 false
1: 2 1 NA