When does BigQuery flatten an intermediate result set? I was under the impression that it was only when FLATTEN
was invoked, but I've encountered an example where the result is flattened without a FLATTEN
.
This is the case - this base query returns one record:
select count(*) from publicdata:samples.trigrams
where ngram = 'der Griindung im'
+-----+
| f0_ |
+-----+
| 1 |
+-----+
When queried, you can see that the record has a repeated field that is repeated twice.
select * from publicdata:samples.trigrams
where ngram = 'der Griindung im'
+------------------+-------+-----------+-------+--------+-------+------------+-------------------+----------------------+-----------------+------------------+----------------+------------------+-------------------+----------------------+---------------------+-----------------+
| ngram | first | second | third | fourth | fifth | cell_value | cell_volume_count | cell_volume_fraction | cell_page_count | cell_match_count | cell_sample_id | cell_sample_text | cell_sample_title | cell_sample_subtitle | cell_sample_authors | cell_sample_url |
+------------------+-------+-----------+-------+--------+-------+------------+-------------------+----------------------+-----------------+------------------+----------------+------------------+-------------------+----------------------+---------------------+-----------------+
| der Griindung im | der | Griindung | im | NULL | NULL | 2007 | 54 | 0.008746355685131196 | 54 | 54 | NULL | NULL | NULL | NULL | NULL | NULL |
| der Griindung im | der | Griindung | im | NULL | NULL | 2008 | 47 | 0.007612568837058633 | 47 | 47 | NULL | NULL | NULL | NULL | NULL | NULL |
+------------------+-------+-----------+-------+--------+-------+------------+-------------------+----------------------+-----------------+------------------+----------------+------------------+-------------------+----------------------+---------------------+-----------------+
When I add a filter on cell.value
, I get two records instead of one - but I never flattened so I'm not sure about the behavior here. My expectation is that this would return the same output as the previous COUNT
above. It doesn't:
select count(*) from publicdata:samples.trigrams
where ngram = 'der Griindung im' and cell.value in ('2007', '2008')
+-----+
| f0_ |
+-----+
| 2 |
+-----+
What this means is that while I expect select * from publicdata:samples.trigrams where ngram = 'der Griindung im'
and select * from publicdata:samples.trigrams where ngram = 'der Griindung im' and cell.value in ('2007', '2008')
to return the same output, they don't because one is implicitly flattened and the other is not. While this may not seem like a huge issue, this could matter significantly if it was part of a nested query that expected an intermediate result to be flattened or repeated.
Under what conditions does BigQuery flatten results without an explicit FLATTEN
?