CSV files with too many values in a row cause errors #440
Steps to demonstrate that:
curl -o artsdatabanken.csv https://artsdatabanken.no/Fab2018/api/export/csv
sqlite-utils insert arts.db artsdatabanken artsdatabanken.csv --sniff --csv --encoding utf-16le
I don't understand why that works but calling
Fixing that
Here are full steps to replicate the bug:
from urllib.request import urlopen
import sqlite_utils
db = sqlite_utils.Database(memory=True)
with urlopen("https://artsdatabanken.no/Fab2018/api/export/csv") as fab:
    reader, other = sqlite_utils.utils.rows_from_file(fab, encoding="utf-16le")
    db["fab2018"].insert_all(reader, pk="Id")
Aha! I think I see what's happening here. Here's what
>>> import csv, io
>>> list(csv.DictReader(io.StringIO("id,name\n1,Cleo,nohead\n2,Barry")))
[{'id': '1', 'name': 'Cleo', None: ['nohead']}, {'id': '2', 'name': 'Barry'}]
See how that row with too many items gets this: That's a
That weird behaviour is documented here: https://docs.python.org/3/library/csv.html#csv.DictReader
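As a small illustration of that documented behaviour, here is a sketch using only the standard library. The sample data is made up, and "extras" is just an illustrative value for the restkey parameter; restval covers the opposite case, where a row has too few values.

```python
import csv
import io

# Made-up sample: the first data row has one value more than the header,
# the second has one value fewer.
SAMPLE = "id,name\n1,Cleo,nohead\n2\n"


def read_rows(text, **reader_kwargs):
    """Parse CSV text into a list of dicts using csv.DictReader."""
    return list(csv.DictReader(io.StringIO(text), **reader_kwargs))


# Default: surplus values are collected in a list under the key None,
# and missing values are filled with None.
default_rows = read_rows(SAMPLE)

# restkey names the key used for surplus values; restval fills in missing ones.
named_rows = read_rows(SAMPLE, restkey="extras", restval="")

print(default_rows)
print(named_rows)
```

Setting restkey avoids the None key, but the surplus values still arrive as a list, so code that expects one scalar per column has to handle that case either way.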
So I need to make a design decision here: what should
Some options:
Whatever I decide, I can implement it in
Here's the current function signature for sqlite-utils/sqlite_utils/utils.py Lines 174 to 179 in 26e6d26
Decision: I'm going to default to raising an exception if a row has too many values in it. You'll be able to pass
The exception will be called
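A minimal sketch of the "raise by default" idea, written against the standard library rather than sqlite-utils itself. The names TooManyValuesError, strict_rows and drop_extras are hypothetical, chosen only for this illustration; they are not the names the library ended up using.

```python
import csv
import io


class TooManyValuesError(Exception):
    """Hypothetical exception name, used only for this sketch."""


def strict_rows(reader, drop_extras=False):
    """Yield rows from a csv.DictReader, rejecting rows with surplus values.

    drop_extras is a hypothetical opt-out: when true, surplus values are
    silently discarded instead of raising.
    """
    for row in reader:
        # DictReader stores surplus values under reader.restkey (None by default).
        if reader.restkey in row:
            if not drop_extras:
                raise TooManyValuesError(
                    "Line {} has surplus values: {!r}".format(
                        reader.line_num, row[reader.restkey]
                    )
                )
            row = {k: v for k, v in row.items() if k != reader.restkey}
        yield row


DATA = "id,name\n1,Cleo,nohead\n2,Barry\n"

try:
    list(strict_rows(csv.DictReader(io.StringIO(DATA))))
    raised = False
except TooManyValuesError as ex:
    raised = True
    print(ex)

cleaned = list(strict_rows(csv.DictReader(io.StringIO(DATA)), drop_extras=True))
print(cleaned)
```

Raising by default surfaces malformed rows early, instead of letting a None key reach the code that builds the insert.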
Interesting challenge in writing tests for this: if you give
It decided the delimiter there was
I think that's unavoidable: it looks like
That broke
Getting this past
That's because of this line:
Which is legit here - we have a dictionary where one of the keys is
Filed an issue against
I'm going to rename
I forgot to add equivalents of
Original title: csv.DictReader can have None as key
In some cases, csv.DictReader can have None as key for unnamed columns, and a list of values as value. sqlite_utils.utils.rows_from_file cannot handle that:
Result:
Code:
sqlite-utils/sqlite_utils/db.py
Line 3454 in 59be60c
sqlite-utils insert from command line is not affected by this issue.