2
votes

The following query:

select
  corpus_date
  ,sum(count(*)) over (partition by corpus_date) as words_year
  ,corpus
  ,count(*) as words

  from [publicdata:samples.shakespeare]

group by
  corpus_date
  ,corpus

order by
  corpus_date

results in the error message SELECT * cannot be combined with selecting other fields or expressions. Now I'm wondering whether I'm missing an obvious mistake or whether BigQuery really doesn't allow combining windowed and aggregate functions. I haven't been able to find the answer in the documention. The error message does not yield any search results and in any case seems inappropriate.

(I've used the Shakespeare database here for replication purposes only.)

EDIT: The desired result can obviously be obtained with a subselect, like so:

select
  corpus_date
  ,sum(words) over (partition by corpus_date) as words_year
  ,corpus
  ,words

  from (select
          corpus_date
          ,corpus
          ,count(*) as words

          from [publicdata:samples.shakespeare]

       group by
          corpus_date
          ,corpus
       ) sub

order by
   corpus_date

So I should clarify that my question is simply whether anyone can confirm this unexpected behaviour.

EDIT2: Apparently it's also not possible to directly (i.e. without appealing to another subselect) use the result of a windowed function in a calculation :-( :

select
  corpus
  ,corpus_date
  ,words
  ,1.*words/sum(words) over (partition by corpus_date) as perc_of_year

  from (select
          corpus_date
          ,corpus
          ,count(*) as words

          from [publicdata:samples.shakespeare]

       group by
          corpus_date
          ,corpus
       ) sub

order by
   corpus_date
   ,corpus
1

1 Answers

1
votes

You need to contain in a subselect as you need to apply a group by for the Aggregation function, hence cannot be combined with a window function.