Closed
Description
Hi,
The attached script exhibits a problem with Table when reading a CSV file of 500,000 rows and 500 columns. It creates a big table, saves it to CSV, checks the number of lines in the file, and reads it back. Reading the file creates a table with only 54,230 rows.
I tried to investigate and got as far as line 379 of cparser.pyx, where self.tokenizer.num_rows is already 54230. Beyond that point the code is C, and I'm not fluent enough in C to find the problem.
Yannick
Here is the code, since apparently one can't attach a Python file to an issue:
import numpy as np
from astropy.table import Column, Table

NB_ROWS = 500000

print("Creating a {} rows table (500 columns).".format(NB_ROWS))
t1 = Table()
for i in range(500):
    t1.add_column(
        Column(
            name=str(i),
            data=np.random.random(NB_ROWS)
        )
    )

print("Saving the table to csv.")
t1.write("big_table.csv", format='ascii.csv')

print("Counting the number of lines in the csv, it should be {}"
      " + 1 (header).".format(NB_ROWS))
nb_lines = sum(1 for line in open("big_table.csv"))
print("The file has {} lines".format(nb_lines))

print("Reading the file with astropy.")
t2 = Table.read("big_table.csv", format='ascii.csv')
print("The table has {} rows.".format(len(t2)))
# Output of the script
# ====================
#
# Creating a 500000 rows table (500 columns).
# Saving the table to csv.
# Counting the number of lines in the csv, it should be 500000 + 1 (header).
# The file has 500001 lines
# Reading the file with astropy.
# The table has 54230 rows.
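
As an independent cross-check of the line count above, the data rows can also be counted with the standard library's csv module, which streams the file and does not go through astropy at all. This is only a minimal sketch: the helper name count_csv_rows and its has_header parameter are hypothetical and not part of astropy.

```python
import csv


def count_csv_rows(path, has_header=True):
    """Count non-empty data rows in a CSV file by streaming it row by row.

    Hypothetical helper for cross-checking; it is not part of astropy.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            # Skip the header row; the default avoids StopIteration on an empty file.
            next(reader, None)
        # csv.reader yields an empty list for a blank line, so those are not counted.
        return sum(1 for row in reader if row)


# Example usage against the file written by the script above:
# print(count_csv_rows("big_table.csv"))
```

If this count agrees with NB_ROWS while len(t2) does not, the rows are present on disk and are being lost while the file is read, not while it is written.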