I'm looking into moving my analytics system across to BigQuery and I was wondering if someone could give me some pointers on the best way to structure my data.

I am logging analytics events that happen on a website. Each analytics event type needs its own table, as the event types don't share the same fields.

I will be using daily partitions, and running a query at midnight every day over that day's data to generate total counts, which are then stored elsewhere so the analytics platform can query them faster.
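
To illustrate, something like this is roughly what I have in mind (table and column names are just placeholders):

    -- Sketch only: a day-partitioned click-event table (names are hypothetical)
    CREATE TABLE company_123.user_click_website_x (
      event_timestamp TIMESTAMP,
      user_id         STRING,
      page_url        STRING
    )
    PARTITION BY DATE(event_timestamp);

    -- Midnight rollup: only yesterday's partition is scanned
    SELECT COUNT(*) AS total_clicks
    FROM company_123.user_click_website_x
    WHERE DATE(event_timestamp) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);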

I create a dataset for each company, named company_123, company_124, etc.

Within each dataset are the analytics tables for all the websites the company manages, with names such as user_click_event_website_id, user_page_load_website_id, etc.

I need to be able to query both 'all user clicks on website X' and 'all user clicks across all their websites'.

My question is: would it be better (in cost and query time) to split the tables by website_id and run

select * from user_click_* and select * from user_click_website_x

or put them all in one table with a column for website_id and run

select * from user_click where website_id = 'website_x' and select * from user_click
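
To make the two options concrete (hypothetical names, just for illustration), I understand they would look something like this in BigQuery, with the wildcard version using _TABLE_SUFFIX to target one website:

    -- Option 1: one table per website, queried through a wildcard table
    SELECT COUNT(*) AS clicks
    FROM `company_123.user_click_*`
    WHERE _TABLE_SUFFIX = 'website_x';  -- drop this filter to cover all websites

    -- Option 2: a single table with a website_id column
    SELECT COUNT(*) AS clicks
    FROM company_123.user_click
    WHERE website_id = 'website_x';     -- drop this filter to cover all websites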

Thanks

1 Answer


Without knowing the details of your scenario and data, it is hard to say which option is always better. As a starting point, one table with a website_id column seems worth trying, unless you run into major issues with it.
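
If you do try the single-table route, one pattern worth considering (just a sketch, and the names are assumptions rather than your actual schema) is to partition by day and cluster on website_id, so both the per-website and the all-websites queries prune data well:

    -- Sketch: one click table per company, day-partitioned and clustered by website_id
    CREATE TABLE company_123.user_click (
      event_timestamp TIMESTAMP,
      website_id      STRING,
      page_url        STRING
    )
    PARTITION BY DATE(event_timestamp)
    CLUSTER BY website_id;

    -- 'all clicks on website X' for yesterday; clustering skips unrelated blocks
    SELECT COUNT(*) AS total_clicks
    FROM company_123.user_click
    WHERE DATE(event_timestamp) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
      AND website_id = 'website_x';

Dropping the website_id filter gives the 'all websites' total from the same table, so you avoid maintaining one table per website.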