ARROW-2689: [Python] Remove parameter timestamps_to_ms #2129
xhochy wants to merge 1 commit into apache:master
Codecov Report
@@ Coverage Diff @@
## master #2129 +/- ##
==========================================
- Coverage 86.39% 86.37% -0.03%
==========================================
Files 242 230 -12
Lines 41481 40589 -892
==========================================
- Hits 35838 35059 -779
+ Misses 5643 5530 -113
|
So we should run https://arrow.apache.org/docs/python/generated/pyarrow.Column.html#pyarrow.Column.cast after converting a data frame to arrow. 👍 |
|
@domoritz could you elaborate on your use case a bit more? |
|
I'm trying to convert some data from pandas to Arrow, but pandas' timestamps are in ns. I want to reduce the data size and use lower precision. My code looks roughly like this:
import pandas as pd
import pyarrow as pa

df = pd.read_csv('flights.csv', encoding='utf-8', dtype={'FL_DATE': 'str', 'ARR_TIME': 'str', 'DEP_TIME': 'str'})
arr_time = df.FL_DATE + df.ARR_TIME.replace('2400', '0000')
df['ARRIVAL'] = pd.to_datetime(arr_time, format='%Y%m%d%H%M')
dep_time = df.FL_DATE + df.DEP_TIME.replace('2400', '0000')
df['DEPARTURE'] = pd.to_datetime(dep_time, format='%Y%m%d%H%M')
df = df.astype({'DEP_DELAY': 'int16', 'ARR_DELAY': 'int16', 'AIR_TIME': 'int16', 'DISTANCE': 'int16'})
table = pa.Table.from_pandas(df)
table.column('ARRIVAL').cast(pa.TimestampValue, True)
writer = pa.RecordBatchFileWriter(f'{name}.arrow', table.schema)
writer.write(table)
writer.close() |
|
Okay. In this line:
table.column('ARRIVAL').cast(pa.TimestampValue, True)
Are you trying to cast that column to a different timestamp unit? This line of code leaves the table unmodified. It would be a good idea to add a documentation section about type casting and how to change the column type of a table; I don't think we have that right now. We could also add some convenience APIs to help with common workflows (e.g. replacing a single column). |
Yes, I am trying to switch from ns to ms precision. I guess I have to write something like |
|
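To make concrete what moving from ns to ms precision does to the stored values, here is a minimal pure-Python sketch. The helper `ns_to_ms` and its `safe` flag are hypothetical names used only for illustration; they are not part of pyarrow and only model the idea of a checked versus an unchecked conversion.

```python
NS_PER_MS = 1_000_000  # nanoseconds in one millisecond


def ns_to_ms(values_ns, safe=True):
    """Convert integer nanosecond timestamps to integer milliseconds.

    Hypothetical helper: with safe=True, refuse any value that carries
    sub-millisecond detail instead of silently dropping it.
    """
    result = []
    for value in values_ns:
        ms, remainder = divmod(value, NS_PER_MS)
        if safe and remainder != 0:
            raise ValueError(f"{value} ns cannot be represented exactly in ms")
        result.append(ms)
    return result


# Minute-resolution data (as in the flights example) converts without loss.
whole_minutes_ns = [1_528_790_400_000_000_000]
converted = ns_to_ms(whole_minutes_ns)
```

Data parsed with a minute-level format string has no sub-millisecond component, so the checked conversion would be expected to succeed for it; the unchecked path only matters when finer detail is present.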
Okay, let's create a JIRA about this and discuss there. Firstly, the cast should be given a DataType instance such as pa.timestamp('ms'), with or without safe=False depending on whether you want to allow unsafe casts (see http://arrow.apache.org/docs/python/generated/pyarrow.lib.Array.html#pyarrow.lib.Array.cast). I think the docstring could be improved to make it clearer that a DataType instance is expected rather than a class object. Secondly, we don't have a convenient function for replacing a column in a table to create a new table. So I would want to write: |
|
Thank you @wesm! I hope my comments are helpful. |
This parameter no longer exists. For the Parquet path it was replaced by
coerce_timestamps; other cases should use Column.cast().