I have a table with duplicate records. I want to remove them. I've created a column called "hash_code" which is just a sha1 hash of all the columns. Duplicate rows will have the same hash code. Everything is fine except when I tried to create a new table with a query containing GROUP BY. My table has RECORD data type, but the new table created flattens it even when I had specified it to not flatten. Seems like GROUP BY and the "-noflatten_results" flag doesn't place nice.
Here's an example command line I ran:
bq query --allow_large_results --destination_table mydataset.my_events --noflatten_results --replace
"select hash_code, min(event) as event, min(properties.adgroup_name) as properties.adgroup_name,
min(properties.adid) as properties.adid, min(properties.app_id) as properties.app_id,
min(properties.campaign_name) as properties.campaign_name from mydataset.my_orig_events group each
by hash_code "
In the above example, properties is a RECORD data type with nested fields. The resulting table doesn't have properties as RECORD data type. Instead it translated properties.adgroup_name to properties_adgroup_name, etc.
Any way to force BigQuery to treat the result set as RECORD and not flatten in GROUP BY?
Thanks!