2
votes

I have a Google Cloud bucket with multiple folders, each containing CSVs.

I want to create eventually one BigQuery table, containing all the CSVs, but add a column stating the original folder. I thought to first create a BigQuery table for each such folder and then union them.

Is there a quick way to create multiple BigQuery tables, one for each subfolder under a Google Cloud bucket? Plus - if there's an easy way to automatically add the folder name as a new column - it could be great!

1

1 Answers

2
votes

I wrote a blog post about this:

https://medium.com/google-cloud/bigquery-lazy-data-loading-ddl-dml-partitions-and-half-a-trillion-wikipedia-pageviews-cd3eacd657b6

The trick is to create a federated query in BigQuery pointing to a glob of your files (gs://bucket/folder/*), which gives you the pseudo column _FILE_NAME:

SELECT *, _FILE_NAME fn 
FROM `fh-bigquery.views.wikipedia_views_gcs` 
LIMIT 10

enter image description here

Ta-da!