How to aggregate a different number of columns for each row

Question

I have a table which has data is below format:

Corresponding to an ID there is a year variable which represents the first time the reading reached a steady state (in the actual data its a month day combination), there will also be data for time period before the stabilization period. The a columns represent a measure taken in that give year. I want to create a field which would sum only those fields relevant for a given id i.e. for Id=B I want to sum columns a2015-a2019. I want this to be done dynamically. But am not sure how to do it without using case when statements since the actual data is even more granular.

I intially tried to pivot the table at id, year and value level so i can first find the steady state time period for each id and simply sum/aggregate only those value post that but transpose feature is not available in standard-sql.

In your sample data, you can just sum all the columns. I don't understand what the challenge is. — Gordon Linoff
@GordonLinoff Edited the question with why i couldn't just sum all the columns — Raj
@maSTAShuFu I tried to transpose and then aggregate which seemed like the simpler option but i couldn't transpose the data. — Raj
If the non-relevant fields are blank, wouldn't sum just ignore those fields? I don't think I fully understand why summing a2013 to a2019 for all id's wouldn't work, the nulls will just be ignored. — samredai

Mikhail Berlyant Mikhail Berlyant · Accepted Answer · 2019-08-27T20:48:20

Below is for BigQuery Standard SQL and quite generic to hopefully address your requirement ("how to do it without using case when statements since the actual data is even more granular.")

#standardSQL
SELECT t.*, 
  (
    SELECT SUM(CAST(SPLIT(kv, '":')[SAFE_OFFSET(1)] AS INT64))
    FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"a(\d{4}":\d*)')) kv
    WHERE CAST(SPLIT(kv, '":')[OFFSET(0)] AS INT64) >= year
  ) total,
  ARRAY (
    SELECT AS STRUCT SPLIT(kv, '":')[OFFSET(0)] AS key, SPLIT(kv, '":')[SAFE_OFFSET(1)] AS value
    FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"a(\d{4}":\d*)')) kv
    WHERE CAST(SPLIT(kv, '":')[OFFSET(0)] AS INT64) >= year
  ) details
FROM `project.dataset.table` t

if to apply to sample data from your question - result is

Row id  year    a2013   a2014   a2015   a2016   a2017   a2018   a2019   total   details.key details.value    
1   a   2014    0       342     2432    34234   645     123     65      37841   2014        342  
                                                                                2015        2432     
                                                                                2016        34234    
                                                                                2017        645  
                                                                                2018        123  
                                                                                2019        65   
2   b   2015    0       0       54      234     34      5656    3       5981    2015        54   
                                                                                2016        234  
                                                                                2017        34   
                                                                                2018        5656     
                                                                                2019        3    
3   c   2016    0       0       0       765     34654   2345    654     38418   2016        765  
                                                                                2017        34654    
                                                                                2018        2345     
                                                                                2019        654

As you can see - I included here extra column Details so you can see which exactly entries are getting summed - this is just for troubleshooting. you can remove this from the query as it is not really needed

How to aggregate a different number of columns for each row

3 Answers