We are running BigQuery for about 4 months now and we have some interesting observations and questions regarding pricings:
- Pricing model is quite unexplained it needs explanations for joins, unions and so on
- If I run join on small table my data processed will increase for each copy of small table that was multiplied inside bigquery engine?
- If I reuse same sub-query inside a query this is new data processing regardless it is the same subquery? Can this be optimized in any way? Doesn't bigquery cache it's data?
- Is there minimum size that is rounded up for data processing? Our observations think it is 4MB, is this correct?
- In Google Console there is a problem with geo locations, it says that most of the queries are done from CA which is incorrect since my company is in Europe and all development is done there. I checked all IP addresses and there is no CA ip. Is it possible to get more detailed report?
- Regarding to 5, we would like to get history of queries and price for each one. How can we do this? Must our software do this or?
- Regarding 3) I have this huge problem with data processing (this is why I asked all the questions). My entire dataset is about 150mb large and yet my DP size per query is 3 times larger? If bigquery reads my entire data it would not reach the sizes it shows on data processed. When are all the cases that data will be re-read and thus new DP will take place? And how do you count this?
Thank you.
Regards, Tomislav