Table fails to read big csv file #5302

@yannick1974

Hi,

The attached script exposes a problem with Table when reading a CSV file of 500,000 rows and 500 columns. It creates a large table, saves it to CSV, checks the number of lines in the file, and reads it back. Reading the file produces a table with only 54,230 rows.

I investigated as far as line 379 of cparser.pyx, where self.tokenizer.num_rows is already 54230. Beyond that point the code is C, and I'm not fluent enough in C to find the problem.

Yannick

Here is the code, as apparently one can't attach a Python file to an issue:

import numpy as np
from astropy.table import Column, Table

NB_ROWS = 500000

print("Creating a {} rows table (500 columns).".format(NB_ROWS))
t1 = Table()
for i in range(500):
    t1.add_column(
        Column(
            name=str(i),
            data=np.random.random(NB_ROWS)
        )
    )

print("Saving the table to csv.")
t1.write("big_table.csv", format='ascii.csv')

print("Counting the number of lines in the csv, it should be {}"
      " + 1 (header).".format(NB_ROWS))
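# Use a context manager so the file handle is closed before reading it again.
with open("big_table.csv") as csv_file:
    nb_lines = sum(1 for _ in csv_file)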
print("The file has {} lines".format(nb_lines))

print("Reading the file with astropy.")
t2 = Table.read("big_table.csv", format='ascii.csv')
print("The table has {} rows.".format(len(t2)))

# Output of the script
# ====================
#
# Creating a 500000 rows table (500 columns).
# Saving the table to csv.
# Counting the number of lines in the csv, it should be 500000 + 1 (header).
# The file has 500001 lines
# Reading the file with astropy.
# The table has 54230 rows.
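
For anyone trying to reproduce this, an independent row count can help confirm that the file on disk is complete and that the rows are lost on the reading side. The sketch below uses only the standard library `csv` module; the helper names `count_csv_data_rows` and `report_row_mismatch` are hypothetical and not part of astropy, and the check says nothing about where in the parser the truncation happens.

```python
import csv


def count_csv_data_rows(path, has_header=True):
    """Count the data rows of a CSV file by streaming it row by row.

    Hypothetical helper: it never holds the whole file in memory, so it
    stays usable on multi-gigabyte files.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row, if there is one
        return sum(1 for _ in reader)


def report_row_mismatch(path, rows_read):
    """Compare the row count on disk with the number of rows a reader returned."""
    expected = count_csv_data_rows(path)
    if rows_read == expected:
        return "OK: {} rows".format(expected)
    return "Mismatch: file has {} rows, table has {}".format(expected, rows_read)
```

With the script above, it could be called as `print(report_row_mismatch("big_table.csv", len(t2)))` right after the table is read back.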
