2
votes

I'm using a BigQuery view to fetch yesterday's data from a BigQuery table and then trying to write into a date partitioned table using Dataprep.

My first issue was that Dataprep would not correctly pick up DATE type columns, but converting them to TIMESTAMP works (thanks Elliot).

However, when using Dataprep and setting an output BigQuery table you only have 3 options for: Append, Truncate or Drop existing table. If the table is date partitioned and you use Truncate it will remove all existing data, not just data in that partition.

Is there another way to do this that I should be using? My alternative is using Dataprep to overwrite a table and then using Cloud Composer to run some SQL pushing this data into a date partitioned table. Ideally, I'd want to do this just with Dataprep but that doesn't seem possible right now.

BigQuery table schema:

enter image description here

Partition details:

enter image description here

The data I'm ingesting is simple. In one flow:

+------------+--------+
|    date    |  name  |
+------------+--------+
| 2018-08-08 | Josh1  |
| 2018-08-08 | Josh2  |
+------------+--------+

In the other flow:

+------------+--------+
|    date    |  name  |
+------------+--------+
| 2018-08-09 | Josh1  |
| 2018-08-09 | Josh2  |
+------------|--------+

It overwrites the data in both cases.

1
From BigQuery Data Type Conversions, you can use Dataprep's "Datetime (timestamp)" type to write to a TIMESTAMP in the BigQuery table, right? I don't think there is a way to use DATE, though.Elliott Brossard
Thanks for the info @ElliottBrossard. I've managed to get it working using TIMESTAMP instead but another problem has arised. Edited the question. Is there a feature request thread somewhere?Josh Laird
You could submit a feature request to the Cloud Dataprep issue tanker.Elliott Brossard
I have the same question! did you manage to do it in DataPrep?MT467
Hi HuaZhang: unable to select partition. The destination table textbox is uneditable, you have to select it from the GUI. Ideally, I would want to schedule this dataprep job so that it imports the latest date of data and not hardcode a partition. I believe this is likely the wrong use case for Dataprep. @MT467, I ended up just writing the query in SQL and scheduling with ComposerJosh Laird

1 Answers

0
votes

You ca create a partitioned table bases on DATE. Data written to a partitioned table is automatically delivered to the appropriate partition.

Data written to a partitioned table is automatically delivered to the appropriate partition based on the date value (expressed in UTC) in the partitioning column.

Append the data to have the new data added to the partitions.

You can create the table using the bq command:

bq mk --table --expiration [INTEGER1] --schema [SCHEMA] --time_partitioning_field date 

time_partitioning_field is what defines which field you will be using for the partitions.