1
votes

I'm migrating some calculations from Google Sheets to BigQuery and need to write a SUMPRODUCT script that offsets each row.

Here is the data in Google Sheets form. Column_1, Column_2 are given, SUMPRODUCT is a calc:

Column_1 Column_2 SUMPRODUCT
   0         1        0
   5         0        10
   0         1        0
   5         0        5

Column 1 starts in cell A1.

SUMPRODUCT Formulas for each row are as follows:

=SUMPRODUCT(A2:A5,$B$2:$B$5)
=SUMPRODUCT(A3:A6,$B$2:$B$5)
=SUMPRODUCT(A4:A7,$B$2:$B$5)
=SUMPRODUCT(A5:A8,$B$2:$B$5)

In Biquery I can create the first SUMPRODUCT row using SELECT SUM (column_1 * column_2) AS SUMPRODUCT FROM Table_1

After this, Offsetting the column 1 array each row is challenging. My final data set will 500+ rows I need to do this for. I've also tried to OFFSET function in SQL but ran into errors.

2

2 Answers

1
votes

Below is for BigQuery Standard SQL

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 0 pos, 0 Column_1, 1 Column_2 UNION ALL
  SELECT 1, 5, 0 UNION ALL
  SELECT 2, 0, 1 UNION ALL
  SELECT 3, 5, 0 
), b AS (
  SELECT pos, Column_2 FROM `project.dataset.table`
)
SELECT 
  pos, Column_1, Column_2,
  (SELECT SUM(Column_1 * Column_2)
    FROM UNNEST(a) WITH OFFSET AS pos
    JOIN b USING(pos)  
  ) SUMPRODUCT
FROM (
  SELECT *, 
    ARRAY_AGG(Column_1) OVER(ORDER BY pos ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING) a
  FROM `project.dataset.table` t
)

with result

Row pos Column_1    Column_2    SUMPRODUCT   
1   0   0           1           0    
2   1   5           0           10   
3   2   0           1           0    
4   3   5           0           5    

As you can see from above - you must have some field which will play role of row number in sheet - in my answer i used pos column for this as an example

0
votes

As a slightly different approach from @Mikhail's answer, you can use JS based external UDFs provided in BigQuery to simplify these tasks:

CREATE TEMP FUNCTION sumproduct(column_1 array<int64>, column_2 array<int64>)
RETURNS int64
LANGUAGE js AS """
  output = 0;
  for (var i=0; i<column_1.length; i++){
     output += (column_1[i]*column_2[i]);
  }
  return output;
""";

with sumproducts as (
  select 
     row_number() over () as idx, 
     sumproduct(aggs, b) as sp from (
        select 
           array_agg(column_1) over (order by idx rows between current row and 3 following) aggs, 
           array_agg(column_2) over () b 
        from (
           select row_number() over () as idx, column_1, column_2 from `dataset.table`
        )
     )
),
items as (
   select row_number() over() as idx, column_1, column_2 from `dataset.table`
)

select r.column_1, r.column_2, spd.sp from items r join sumproducts spd on r.idx = spd.idx

Hope it helps.