Add support for reading CSV files with comments #10467
This is currently a sketch of a possible implementation for #10262. The approach taken pushes interpretation of comment lines into
If this is a viable solution it would require a bump of at least the
@alamb, would you be open to shepherding this PR and apache/arrow-rs#5759, or alternatively could you help identify someone who could?
Yes. FWIW DataFusion typically upgrades to the latest arrow-rs (including arrow-csv) dependency, so while extra time would be needed no extra work would be
This is now rebased on
alamb left a comment
Thank you very much for this contribution @bbannier -- this code looks great. The only thing I think this PR now needs is some test coverage so we don't break it in the future.
Here is my suggestion for testing:
- update csv_files.slt, see this file for info on running sql logic tests
Note I think you can programmatically create a csv file with a command like
> copy (values ('column1,column2'), ('#second line is a comment'), ('2,3')) TO '/tmp/my.csv' OPTIONS ('format.delimiter' '|');
+-------+
| count |
+-------+
| 3 |
+-------+
1 row(s) fetched.
Elapsed 0.004 seconds.
That results in
$ cat /tmp/my.csv
column1,column2
#second line is a comment
2,3
This patch adds support for parsing CSV files containing comment lines. Closes apache#10262.
'format.delimiter' ',');

query TT
SELECT * from stored_table_with_comments;
This PR adds support for parsing CSV files containing comment lines.
Closes #10262.
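For readers who want a concrete picture of the behavior under discussion, here is a minimal sketch of comment-line skipping, using only the Rust standard library. It is not the arrow-csv or DataFusion implementation: the function name `parse_csv_skipping_comments` is hypothetical, the comment marker is assumed to be a single character at the very start of a line, and quoting and escaping are deliberately ignored.

```rust
/// Split CSV text into records, dropping every line that starts with `comment`.
///
/// Hypothetical helper for illustration only. It does not handle quoted
/// fields, escapes, or multi-line records, which a real CSV reader must.
fn parse_csv_skipping_comments(
    input: &str,
    delimiter: char,
    comment: char,
) -> Vec<Vec<String>> {
    input
        .lines()
        // A comment line is one whose first character is the comment marker.
        .filter(|line| !line.starts_with(comment))
        // Blank lines carry no record, so drop them as well.
        .filter(|line| !line.is_empty())
        // Split each remaining line into owned field values.
        .map(|line| line.split(delimiter).map(String::from).collect())
        .collect()
}
```

Called with the three-line file from the review comment (`column1,column2`, `#second line is a comment`, `2,3`), a `,` delimiter and a `#` comment marker, this returns two records: the header and the row `2,3`.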