Could you add a parameter to the Table.insert_data method that accepts a partition label?
I'm trying to write a Python app that copies data from one partition of a table to another. My BigQuery schema has a data field named timestamp, and I want to query on this value. However, the partition does not always align with the values in this field: some entries with _PARTITIONTIME 2016-11-11 actually have a timestamp of 2016-11-10. This is usually caused by application delay, latency, or BigQuery outages such as the one on Tuesday.
Using the bq command-line tool I can use table decorators to target a partition directly. In the shell example below, --destination_table targets a partition, and the query limits its SELECT using a similar partition decorator.
bq --project foo-dev query --allow_large_results --destination_table 'my_ds.table_name$20161110' --noflatten_results --append_table 'SELECT * FROM [my_ds.table_name$20161111] WHERE timestamp BETWEEN TIMESTAMP("2016-11-10") AND TIMESTAMP("2016-11-11")'
In this API, however, it seems strange that I have to create two table instances: one to address the (unpartitioned) table and one to insert into a partition. See below.
table = dataset.table(TABLE_NAME)
assert not table.exists()
table.create(client=client)
table.insert_data(rows, client=client) # will only insert to the current partition
partition = dataset.table(TABLE_NAME + '$20161111')  # table name plus partition decorator
partition.insert_data(rows=rows, client=client) # should insert to the 20161111 partition
The Table.insert_data source code seems to just pass the table name to the underlying REST API. So, assuming the REST API accepts and honors the table partition decorator, the code above should insert as I expect. The API would be easier to use for my use case if it handled a partition label when one is provided at the time of insert.
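To make the request concrete, here is a minimal sketch of the name handling a partition-aware insert might do. The helper `partition_table_name` is hypothetical and not part of the library. It only builds the decorated table name (`table_name$YYYYMMDD`) and validates the label. Whether the REST API honors the decorator on insert is still the assumption stated above.

```python
import datetime


def partition_table_name(table_name, partition):
    """Return ``table_name`` with a ``$YYYYMMDD`` partition decorator.

    Hypothetical helper for illustration only; not a library function.
    ``partition`` may be a date/datetime or a 'YYYYMMDD' string.
    """
    if '$' in table_name:
        raise ValueError('table name already has a decorator: %r' % table_name)

    if isinstance(partition, datetime.date):  # also covers datetime.datetime
        label = partition.strftime('%Y%m%d')
    else:
        label = str(partition)
        if len(label) != 8 or not label.isdigit():
            raise ValueError('partition label must be YYYYMMDD: %r' % label)
        # Raises ValueError for impossible dates such as 20161340.
        datetime.datetime.strptime(label, '%Y%m%d')

    return '{}${}'.format(table_name, label)


# Assumed usage with the objects from the example above:
#   partition = dataset.table(partition_table_name(TABLE_NAME, '20161111'))
#   partition.insert_data(rows=rows, client=client)
```

A partition parameter on Table.insert_data could apply the same transformation internally, so callers would not need a second table instance.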