I'm using a BigQuery view to fetch yesterday's data from a BigQuery table and then trying to write into a date partitioned table using Dataprep.
My first issue was that Dataprep would not correctly pick up DATE
type columns, but converting them to TIMESTAMP
works (thanks Elliot).
However, when using Dataprep and setting an output BigQuery table you only have 3 options for: Append
, Truncate
or Drop
existing table. If the table is date partitioned and you use Truncate
it will remove all existing data, not just data in that partition.
Is there another way to do this that I should be using? My alternative is using Dataprep to overwrite a table and then using Cloud Composer to run some SQL pushing this data into a date partitioned table. Ideally, I'd want to do this just with Dataprep but that doesn't seem possible right now.
BigQuery table schema:
Partition details:
The data I'm ingesting is simple. In one flow:
+------------+--------+
| date | name |
+------------+--------+
| 2018-08-08 | Josh1 |
| 2018-08-08 | Josh2 |
+------------+--------+
In the other flow:
+------------+--------+
| date | name |
+------------+--------+
| 2018-08-09 | Josh1 |
| 2018-08-09 | Josh2 |
+------------|--------+
It overwrites the data in both cases.
TIMESTAMP
in the BigQuery table, right? I don't think there is a way to useDATE
, though. – Elliott BrossardTIMESTAMP
instead but another problem has arised. Edited the question. Is there a feature request thread somewhere? – Josh Laird