0
votes

there are similar questions on here but either I cannot figure out how to convert to my situation (Likely) or they are not that similar but read close to what i want to do (BigQuery: How to calculate the running count of distinct visitors for each day and category?)

Anyway...

I have an order table in bigquery that has many column headers which i will need to use all of them but i will just list a few of them here

orderID, customerID, transactionDate, Revenue

(I will need to get all of the fields)

I want to work out the instance of a customer ID in the table as a new column, so if i had place 3 orders, and my customer ID was 1234, the 1st instance in the data table would be 1 in a new column, the 2nd would be 2 and the 3rd would be 3

for example say my data looks like this

> OrderID     ||    CustomerID    ||    TransactionDate    ||    Revenue
> 1           ||    1             ||     01/01/15          ||     £20 
> 2           ||    2             ||     01/01/15          ||     £20 
> 3           ||    3             ||     01/01/15          ||     £20 
> 4           ||    1             ||     01/01/15          ||     £20 
> 5           ||    1             ||     01/01/15          ||     £20 
> 6           ||    2             ||     01/01/15          ||     £20 
> 7           ||    4             ||     01/01/15          ||     £20

I would like to run a query against it the adds in a new column stating the instance if the CustomerID record so it would like like

> OrderID     ||    CustomerID    ||    TransactionDate    ||    Revenue ||Instance
> 1           ||    1             ||     01/01/15          ||     £20    ||1 
> 2           ||    2             ||     01/01/15          ||     £20    ||1
> 3           ||    3             ||     01/01/15          ||     £20    ||1 
> 4           ||    1             ||     01/01/15          ||     £20    ||2
> 5           ||    1             ||     01/01/15          ||     £20    ||3 
> 6           ||    2             ||     01/01/15          ||     £20    ||2 
> 7           ||    4             ||     01/01/15          ||     £20    ||1

Each time an already seen customerID happens the Instance increments 1

Also I need to run this against an ever growing table that is currently at 1.6 million rows.

I hope some one can help me out.

Cheers

John

2

2 Answers

2
votes

Window Functions are helping you here:

Window functions enable calculations on a specific partition, or "window", of a result set. Each window function expects an OVER clause that specifies the partition, in the following syntax:

OVER (
      [PARTITION BY <expr>]
      [ORDER BY <expr>]
      [ROWS <expr> | RANGE <expr>]
     )

PARTITION BY is always optional. ORDER BY is optional in some cases, but certain window functions, such as rank() or dense_rank(), require the clause.

JOIN EACH and GROUP EACH BY clauses can't be used on the output of window functions. To generate large query results when using window functions, you must use PARTITION BY.

select *,
row_number() over (partition by CustomerID order by TransactionDate) as Instance
from  (select 1 as OrderID, 1 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue), 
 (select 2 as OrderID, 2 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue), 
 (select 3 as OrderID, 3 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue), 
 (select 4 as OrderID, 1 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue), 
 (select 5 as OrderID, 1 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue), 
 (select 6 as OrderID, 2 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue), 
 (select 7 as OrderID, 4 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue)
 order by OrderID

Returns:

+-----+---------+------------+-----------------+---------+----------+---+
| Row | OrderID | CustomerID | TransactionDate | Revenue | Instance |   |
+-----+---------+------------+-----------------+---------+----------+---+
|   1 |       1 |          1 | 01/01/15        | £20     |        1 |   |
|   2 |       2 |          2 | 01/01/15        | £20     |        1 |   |
|   3 |       3 |          3 | 01/01/15        | £20     |        1 |   |
|   4 |       4 |          1 | 01/01/15        | £20     |        2 |   |
|   5 |       5 |          1 | 01/01/15        | £20     |        3 |   |
|   6 |       6 |          2 | 01/01/15        | £20     |        2 |   |
|   7 |       7 |          4 | 01/01/15        | £20     |        1 |   |
+-----+---------+------------+-----------------+---------+----------+---+
1
votes

You should use window function like row_number OVER (Partition by yourgroupbyfields order by transaction_date)