0
votes

I have a table looking like this in BigQuery:

enter image description here

The first field is id, the second field is url. One line represents someone (an ID) who visited a URL. Let's say my URLs are url1, url2, url3. I would like to know how many unique IDs visited at least once the following tuples of URLs:

(url1, url2)
(url1, url3)
(url2, url3)
(url1, url2, url3)

Basically, get the number of ids linked to all combinations of url from the urls list, but only when the number of urls visited is not 0. I expect it to give a very big output for a big list of urls. Is there a request I can write in BigQuery that would help with this problem ? May be an operator that can generate all combinations for my input ?

Thank you for your help

1
You have to implement some specific logic behind, there is no dedicated clause in Bigquery which can address this expectation. For the reference look at this thread, applying UDF to afford combinations of a value for a single column. Is it something that you can be interested in? - Nick_Kh
You have a low rate. Important on SO - you can mark accepted answer by using the tick on the left of the posted answer, below the voting. See meta.stackexchange.com/questions/5234/… for why it is important! Also important to vote on answer. Vote up answers that are helpful. ... You can check about what to do when someone answers your question - stackoverflow.com/help/someone-answers. Following these simple rules you increase your own reputation score and at the same time you keep us motivated to answer your questions :o) please consider! - Mikhail Berlyant

1 Answers

2
votes

May be an operator that can generate all combinations for my input ?

Yes, you can use ML.NGRAMS function as in below simplified example

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 id, 'url1' url UNION ALL
  SELECT 2, 'url2' UNION ALL
  SELECT 3, 'url2' UNION ALL
  SELECT 2, 'url1' UNION ALL
  SELECT 3, 'url2' UNION ALL
  SELECT 2, 'url3'
)
SELECT tuple FROM (  
  SELECT ARRAY_AGG(DISTINCT url) urls
  FROM `project.dataset.table`
), UNNEST(ML.NGRAMS(urls, [1,3], ',')) tuple

with output

Row tuple    
1   url1     
2   url1,url2    
3   url1,url2,url3   
4   url2     
5   url2,url3    
6   url3     

which you can then use as a input into final calculation