
I want to import a large CSV file into a BigQuery partitioned table that has a TIMESTAMP column holding the date of each transaction. The problem is that when I load the data, everything ends up in a single partition for today's date.

Is it possible to partition by my own timestamp value? How can I do that?


2 Answers


In BigQuery, partitioning on a specific column is currently not supported, even when that column is date-related (a TIMESTAMP).
You either rely on the time of insertion, so the BigQuery engine routes rows into the corresponding partition, or you explicitly specify which partition to load your data into (using a partition decorator such as `mytable$20180115`).
See more in Creating and Updating Date-Partitioned Tables.
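One way to apply the second option to a CSV file — a sketch, assuming the first column holds the transaction timestamp (the column index, sample values, and dataset/table names below are placeholders, not from the original answer) — is to split the rows by calendar date and load each chunk into the matching partition via its `$YYYYMMDD` decorator:

```python
import csv
from collections import defaultdict

def split_csv_by_date(rows, ts_index=0):
    """Group CSV rows by the calendar date (YYYYMMDD) of their timestamp column."""
    partitions = defaultdict(list)
    for row in rows:
        # Timestamps are assumed to look like "2018-01-15 09:30:00";
        # the date part becomes the partition suffix "20180115".
        day = row[ts_index][:10].replace("-", "")
        partitions[day].append(row)
    return dict(partitions)

rows = [
    ["2018-01-15 09:30:00", "txn-1", "42.00"],
    ["2018-01-15 17:05:00", "txn-2", "13.37"],
    ["2018-01-16 08:00:00", "txn-3", "99.99"],
]
parts = split_csv_by_date(rows)

# Write one file per day; each can then be loaded into its partition, e.g.:
#   bq load --source_format=CSV 'mydataset.mytable$20180115' part_20180115.csv
for day, day_rows in parts.items():
    with open("part_%s.csv" % day, "w", newline="") as f:
        csv.writer(f).writerows(day_rows)
```

This trades one load job for one per day present in the file, but each row lands in the partition its own timestamp dictates.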


The best way to do that today is with Google Dataflow [1]. You can develop a streaming pipeline that reads the file from a Google Cloud Storage bucket and inserts the rows into the BigQuery table.
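The per-row core of such a pipeline is a parse step that turns each CSV line into a row dictionary whose timestamp field drives the partitioning. A minimal sketch of that step (the column names, line format, and field names are assumptions for illustration, not from the original answer):

```python
import csv
import io

def parse_line(line):
    """Parse one CSV line into a BigQuery-style row dict.

    In a Beam pipeline this would run between the read and the write, e.g.:
      p | beam.io.ReadFromText(gcs_path)
        | beam.Map(parse_line)
        | beam.io.WriteToBigQuery(table_spec, ...)
    """
    ts, txn_id, amount = next(csv.reader(io.StringIO(line)))
    return {"transaction_ts": ts, "txn_id": txn_id, "amount": float(amount)}

row = parse_line("2018-01-15 09:30:00,txn-1,42.00")
```

The function itself is plain Python, so it can be unit-tested outside the pipeline before being wired into a `beam.Map`.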

You will need to create the partitioned table manually [2] (e.g., with `bq mk --time_partitioning_type=DAY`) before running the pipeline, because Dataflow currently does not support creating partitioned tables.

There are multiple examples available at [3].

[1] https://cloud.google.com/dataflow/docs/

[2] https://cloud.google.com/bigquery/docs/creating-partitioned-tables

[3] https://cloud.google.com/dataflow/examples/all-examples