Hitting _csv.Error: field larger than field limit (131072) #229

Closed
frosencrantz opened this issue Feb 13, 2021 · 3 comments

Labels
bug Something isn't working

Comments

@frosencrantz

I have a CSV file in which one field is so large that loading stops with this exception:
_csv.Error: field larger than field limit (131072)

The stack trace occurs here: https://github.com/simonw/sqlite-utils/blob/3.1/sqlite_utils/cli.py#L633

This Stack Overflow thread describes a workaround that helps:
https://stackoverflow.com/questions/15063936/csv-error-field-larger-than-field-limit-131072

One issue I had with this problem was that sqlite-utils provides only limited context about where the problem line is.
There is a progress bar, but it reports percent complete rather than a line number. It would have been helpful if the error had included a line number.

Also, it would have been useful if it had allowed the loading to continue with later lines.
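
A minimal sketch of both requests, using only the standard library `csv` module. This is not how sqlite-utils behaves; `load_rows` is a hypothetical helper name. It relies on `reader.line_num` to report the failing line and keeps reading after a `csv.Error`. Skipping is only dependable when the oversized field sits on a single physical line; a quoted field that spans several lines may leave the reader out of sync.

```python
import csv
import io


def load_rows(fileobj):
    """Return (rows, errors); errors holds (line_number, message) pairs.

    Hypothetical helper for illustration only, not part of sqlite-utils.
    """
    reader = csv.reader(fileobj)
    rows, errors = [], []
    while True:
        try:
            row = next(reader)
        except StopIteration:
            break
        except csv.Error as exc:
            # line_num counts the source lines read so far, so after a
            # failure it identifies the line that triggered the error.
            errors.append((reader.line_num, str(exc)))
            continue
        rows.append(row)
    return rows, errors


# Demo with a deliberately small limit so a 100-character field fails.
previous_limit = csv.field_size_limit(50)
sample = "id,text\n1,short\n2," + "x" * 100 + "\n3,ok\n"
rows, errors = load_rows(io.StringIO(sample))
csv.field_size_limit(previous_limit)  # restore the earlier limit

for line_number, message in errors:
    print(f"line {line_number}: {message}")
```

Collecting errors instead of raising lets the caller decide whether to abort, log, or retry the bad lines with a higher limit.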

@simonw simonw added the bug Something isn't working label Feb 14, 2021
@simonw
Owner

simonw commented Feb 14, 2021

Same issue as #227.

@simonw
Owner

simonw commented Feb 14, 2021

I want to set this to the maximum allowed limit, which seems to be surprisingly hard! That Stack Overflow thread is full of ideas for that, many of them involving ctypes. I'm a bit loath to add a dependency on ctypes though - even though it's in the Python standard library I worry that it might not be available on some architectures.
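
For context, the ctypes-based idea from that thread can be sketched roughly as below. It is shown only to illustrate the approach being weighed here, on the assumption that `ctypes` is importable on the target platform. The variable names are illustrative.

```python
import csv
import ctypes

# c_ulong(-1) wraps around to the largest unsigned C long; halving it
# gives the largest signed C long, which is the most that
# csv.field_size_limit() accepts on this platform.
c_long_max = ctypes.c_ulong(-1).value // 2

previous_limit = csv.field_size_limit(c_long_max)
print(f"limit raised from {previous_limit} to {csv.field_size_limit()}")
```

The appeal is that it computes the platform's limit directly rather than probing for it; the cost is the extra import that the comment above is wary of.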

@simonw
Owner

simonw commented Feb 14, 2021

I'm going to use this pattern from https://stackoverflow.com/a/15063941

import sys
import csv
maxInt = sys.maxsize

while True:
    # Shrink maxInt by a factor of 10 for as long as
    # csv.field_size_limit() raises OverflowError.
    try:
        csv.field_size_limit(maxInt)
        break
    except OverflowError:
        maxInt = maxInt // 10
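
As a quick check of that pattern, here is a hedged sketch that wraps it in a function and then reads a field larger than the default 131072-character limit. The function name `raise_field_size_limit` is hypothetical and is not taken from the sqlite-utils source.

```python
import csv
import io
import sys


def raise_field_size_limit():
    """Set the csv field limit as high as the platform allows.

    Hypothetical wrapper around the pattern quoted above.
    """
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # sys.maxsize can exceed a C long on some platforms
            # (for example 64-bit Windows), so back off and retry.
            limit //= 10


limit = raise_field_size_limit()

# A 200,000-character field would fail under the default limit.
big_field = "x" * 200_000
rows = list(csv.reader(io.StringIO("id,text\n1," + big_field + "\n")))
print(len(rows[1][1]))
```

Note that `csv.field_size_limit()` changes a module-wide setting, so calling this once at startup affects every reader created afterwards.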
