I have a table containing lines with an id ( representing a person ) linked to a url ( fields names are id and url ). An id can be linked to many urls. An id can be linked to a url multiple times. As a result, I would like to have for each domain we can find in the table, the number of ids in common with each other domain in the table. If 2 domains have no id intersection, I don't need the information: I would like to get only the information when the intersection is not null. The output could be:
- 1 line per unique domain found, contaning the domain and an array of tuple (domain, volume) representing an intersection between 2 domain and the volume intersecting for each domain where intersection is not empty
- Just lines with domain1, domain2, intersection volume
Is it possible to do such thing in BigQuery ?
Thank you for your help