We want to separate our data in tables with before and after processing.
The data currently resides in tables in a BigQuery dataset with name "pre".
The question now is whether to place the resulting table in a different dataset with name "post", or to keep it in the same dataset.
To consider: scripts will be joining tables from both datasets in their queries.
Is there any internal optimization in BigQuery when joining tables in one same dataset and that we will miss by splitting that in two?
For example, data locality? One might think that joining a table from a dataset in Europe with one from a dataset in the US might lead to extra effort to move data around. Could that incur extra network costs? CPU? But in our case, both datasets will be US.
Regarding access control, different data sets can be used to assign different permissions to one or another service account; but we will not make use of that.