Create a deduplicated view of a BigQuery table based on a key

Question

TL;DR:

Say I have a table receipt with columns (ReceiptID, ItemName, Spend).
How can I create a view that keeps at most one row per ReceiptID?

Details of what I tried and its drawbacks

-- Creating the view `view_of_unique_receipts`
WITH ordered_receipts AS (
    SELECT *, 
           ROW_NUMBER() OVER ( PARTITION BY ReceiptName) AS rn
    FROM `receipt_*`
)
SELECT * EXCEPT (rn) 
FROM ordered_receipts
WHERE rn = 1

Running SELECT * FROM receipt costs e.g. 50 GB.
Running SELECT ReceiptName, Spend FROM receipt costs e.g. 10 GB.
Running SELECT ReceiptName, Spend FROM view_of_unique_receipts costs 50 GB.
- How can I make this cost only 10 GB?
- It costs 50 GB because the ROW_NUMBER partition always reads all the columns, even though the final query does not ask for them.

Mikhail Berlyant · Accepted Answer · 2020-12-31T22:48:35

WITH ordered_receipts AS (
    SELECT *, 
           ROW_NUMBER() OVER ( PARTITION BY ReceiptName) AS rn
    FROM `receipt_*`
)
SELECT ReceiptName, Spend 
FROM ordered_receipts
WHERE rn = 1
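
A hedged variant of the query above, in case the `SELECT *` inside the CTE is what keeps the scan at 50 GB: BigQuery's on-demand billing depends on the columns a query reads, so listing only the needed columns inside the CTE may bring the scan closer to the 10 GB figure from the question. This is a sketch under that assumption, not a measured result. Note also that `ROW_NUMBER()` here has no `ORDER BY`, so which of the duplicate rows survives is arbitrary.

-- Sketch: read only the key and the columns the consumers need,
-- instead of SELECT * inside the CTE.
WITH ordered_receipts AS (
    SELECT ReceiptName,
           Spend,
           -- No ORDER BY: the row kept per ReceiptName is arbitrary.
           ROW_NUMBER() OVER (PARTITION BY ReceiptName) AS rn
    FROM `receipt_*`
)
SELECT ReceiptName, Spend
FROM ordered_receipts
WHERE rn = 1

The trade-off is that a view defined this way exposes only ReceiptName and Spend. The bytes estimate shown by the query validator (or a dry run) can confirm whether the scan actually drops before relying on it.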
