1
votes

I have a table which has data is below format:

enter image description here

Corresponding to an ID there is a year variable which represents the first time the reading reached a steady state (in the actual data its a month day combination), there will also be data for time period before the stabilization period. The a columns represent a measure taken in that give year. I want to create a field which would sum only those fields relevant for a given id i.e. for Id=B I want to sum columns a2015-a2019. I want this to be done dynamically. But am not sure how to do it without using case when statements since the actual data is even more granular.

I intially tried to pivot the table at id, year and value level so i can first find the steady state time period for each id and simply sum/aggregate only those value post that but transpose feature is not available in standard-sql.

3
In your sample data, you can just sum all the columns. I don't understand what the challenge is.Gordon Linoff
what have you tried so far?RoMEoMusTDiE
@GordonLinoff Edited the question with why i couldn't just sum all the columnsRaj
@maSTAShuFu I tried to transpose and then aggregate which seemed like the simpler option but i couldn't transpose the data.Raj
If the non-relevant fields are blank, wouldn't sum just ignore those fields? I don't think I fully understand why summing a2013 to a2019 for all id's wouldn't work, the nulls will just be ignored.samredai

3 Answers

1
votes

Below is for BigQuery Standard SQL and quite generic to hopefully address your requirement ("how to do it without using case when statements since the actual data is even more granular.")

#standardSQL
SELECT t.*, 
  (
    SELECT SUM(CAST(SPLIT(kv, '":')[SAFE_OFFSET(1)] AS INT64))
    FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"a(\d{4}":\d*)')) kv
    WHERE CAST(SPLIT(kv, '":')[OFFSET(0)] AS INT64) >= year
  ) total,
  ARRAY (
    SELECT AS STRUCT SPLIT(kv, '":')[OFFSET(0)] AS key, SPLIT(kv, '":')[SAFE_OFFSET(1)] AS value
    FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"a(\d{4}":\d*)')) kv
    WHERE CAST(SPLIT(kv, '":')[OFFSET(0)] AS INT64) >= year
  ) details
FROM `project.dataset.table` t

if to apply to sample data from your question - result is

Row id  year    a2013   a2014   a2015   a2016   a2017   a2018   a2019   total   details.key details.value    
1   a   2014    0       342     2432    34234   645     123     65      37841   2014        342  
                                                                                2015        2432     
                                                                                2016        34234    
                                                                                2017        645  
                                                                                2018        123  
                                                                                2019        65   
2   b   2015    0       0       54      234     34      5656    3       5981    2015        54   
                                                                                2016        234  
                                                                                2017        34   
                                                                                2018        5656     
                                                                                2019        3    
3   c   2016    0       0       0       765     34654   2345    654     38418   2016        765  
                                                                                2017        34654    
                                                                                2018        2345     
                                                                                2019        654    

As you can see - I included here extra column Details so you can see which exactly entries are getting summed - this is just for troubleshooting. you can remove this from the query as it is not really needed

1
votes

Following is a standard logic for your purpose. Syntax may different for BigQuery, but you can adjust accordingly and use the given logic below-

SELECT A.id,A.year,SUM(A.Val)
FROM
(
    SELECT id,year,[a2013] Val,2013 Yr FROM your_table
    UNION ALL
    SELECT id,year,[a2014],2014 FROM your_table
    UNION ALL
    SELECT id,year,[a2015],2015 FROM your_table
    UNION ALL
    SELECT id,year,[a2016],2016 FROM your_table
    UNION ALL
    SELECT id,year,[a2017],2017 FROM your_table
    UNION ALL
    SELECT id,year,[a2018],2018 FROM your_table
    UNION ALL
    SELECT id,year,[a2019],2019 FROM your_table
)A
WHERE A.year <= A.Yr
GROUP BY A.id,A.year
ORDER BY 1
0
votes

You may try using this.

select Id, year, (isnull(a2013,0) + isnull(a2014,0) + isnull(a2015,0) + isnull(a2016,0) 
                + isnull(a2017,0) + isnull(a2018,0) + isnull(a2019,0) ) as Total
from table