I have a table in BigQuery called score_FULL, which gets new data appended daily. I also keep each day's scores in a separate table, e.g. score_20180125 for today's date. Each daily table is appended to score_FULL.

score_FULL contains:

visitorID         score
#Older scores first
1                 0.15
2                 0.78
3                 0.12
6                 0.90
------------------------
2                 0.22
6                 0.65
7                 0.61
10                0.24
------------------------
1                 0.31
2                 0.41
10                0.12
-------------------------
#Newest scores appended

I would like to see how each user's score changes. Each time a user gets a new score, it should be appended horizontally; each time a new user appears, a row should be appended vertically. Assuming a user can only get one score per day, the ideal result is:

visitorID     score1      score2       score3
1             0.15        0.31
2             0.78        0.22         0.41
3             0.12 

I.e., a table that grows horizontally (new scores) and vertically (new users).

I can do something similar using a sequence of LEFT JOINs on the individual daily tables, but this would only give me the visitors present in the first table, the one the left joins start from.

Note: I can add a Date column, which would simply repeat the same date for every row appended that day, if that makes things easier.

Do you have the date for each row in the full table? - Pentium10
No, I don't, but I do have the date from the score_DATE table, which gets appended to the FULL table. If there is an easy solution, I can add the date to the table too, so the same date would be stored for all rows appended that day. - GRS
Have you ever worked with window functions or table wildcard queries? How many days back does your data go? - Pentium10
@Pentium10 No, I haven't. It only goes back 6 days at the moment, but this will grow daily. I could probably make it so that only the last n days are reported. - GRS
Do you use table partitioning? If not, why not? - Pentium10

1 Answer

Instead of adding columns dynamically (which is quite a challenge here), I would recommend aggregating each visitor's scores into a single column, either as an array or as a string of the individual scores.

Below is for BigQuery Standard SQL

#standardSQL
WITH `project.dataset.score_FULL` AS ( 
  SELECT 1 visitorID, 0.15 score UNION ALL
  SELECT 2, 0.78 UNION ALL
  SELECT 3, 0.12 UNION ALL
  SELECT 6, 0.90 UNION ALL
  SELECT 2, 0.22 UNION ALL
  SELECT 6, 0.65 UNION ALL
  SELECT 7, 0.61 UNION ALL
  SELECT 10, 0.24 UNION ALL
  SELECT 1, 0.31 UNION ALL
  SELECT 2, 0.41 UNION ALL
  SELECT 10, 0.12 
)
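SELECT
  visitorID,
  -- All of a visitor's scores as a repeated field (one output row per visitor).
  -- Without ORDER BY inside the aggregate, element order is not guaranteed.
  ARRAY_AGG(score) AS scores_as_array,
  -- The same scores as a comma-separated string.
  STRING_AGG(CAST(score AS STRING)) AS scores_as_list
FROM `project.dataset.score_FULL`
GROUP BY visitorID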

with the output:

Row visitorID   scores_as_array scores_as_list   
1   1           0.15            0.15,0.31    
                0.31         
2   2           0.78            0.78,0.22,0.41   
                0.22         
                0.41         
3   3           0.12            0.12     
4   6           0.9             0.9,0.65     
                0.65         
5   7           0.61            0.61     
6   10          0.24            0.24,0.12    
                0.12
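
If the order of the scores matters (the question wants score1, score2, score3 in chronological order), ARRAY_AGG needs an ORDER BY, and that in turn needs a date to sort on. Below is a minimal sketch, assuming a date column is added as the question's note suggests. The CTE name scores_with_date, the column name score_date, and the three dates are made up for illustration, and the fixed set of three score columns is an assumption too:

```sql
#standardSQL
-- Hypothetical sample data: the question's rows plus an assumed score_date column.
WITH scores_with_date AS (
  SELECT 1 visitorID, 0.15 score, DATE '2018-01-23' score_date UNION ALL
  SELECT 2, 0.78, DATE '2018-01-23' UNION ALL
  SELECT 3, 0.12, DATE '2018-01-23' UNION ALL
  SELECT 6, 0.90, DATE '2018-01-23' UNION ALL
  SELECT 2, 0.22, DATE '2018-01-24' UNION ALL
  SELECT 6, 0.65, DATE '2018-01-24' UNION ALL
  SELECT 7, 0.61, DATE '2018-01-24' UNION ALL
  SELECT 10, 0.24, DATE '2018-01-24' UNION ALL
  SELECT 1, 0.31, DATE '2018-01-25' UNION ALL
  SELECT 2, 0.41, DATE '2018-01-25' UNION ALL
  SELECT 10, 0.12, DATE '2018-01-25'
),
per_visitor AS (
  -- One row per visitor, scores sorted oldest first.
  SELECT
    visitorID,
    ARRAY_AGG(score ORDER BY score_date) AS scores
  FROM scores_with_date
  GROUP BY visitorID
)
SELECT
  visitorID,
  -- SAFE_OFFSET yields NULL instead of an error when the array is shorter.
  scores[SAFE_OFFSET(0)] AS score1,
  scores[SAFE_OFFSET(1)] AS score2,
  scores[SAFE_OFFSET(2)] AS score3
FROM per_visitor
ORDER BY visitorID
```

With this sample data, visitor 1 gets 0.15, 0.31, NULL and visitor 2 gets 0.78, 0.22, 0.41, which matches the layout asked for in the question. The number of score columns is still fixed in the query text, so this only helps when reporting the last n scores. Without a date column, the same ordering could probably be derived from the daily tables using a wildcard query and _TABLE_SUFFIX.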