In my work I discovered a potential shortcoming of the use of RANK() over the (possibly more recent? https://cloud.google.com/bigquery/docs/reference/standard-sql/numbering_functions) alternative numbering function ROW_NUMBER().
with minimal_reproducible as (
select '[email protected]' as user_email, 'Joe' as user_first_name, 'John' as user_last_name, 123456790 as time, 1 is_deleted
union all
select '[email protected]', 'Joe', 'John', 123456789, 0
union all
select '[email protected]', 'Jill', 'John', 123456789, 0
)
select user_email, user_first_name, user_last_name, time, is_deleted from (
select *,
rank() over (partition by user_email order by time desc) as rank
from minimal_reproducible) inner_table
where rank = 1
The accepted answer does provide the desired solution, except in the event in a tie in the order by clause, were duplicate records are again returned:
with minimal_reproducible as (
select '[email protected]' as user_email, 'Joe' as user_first_name, 'John' as user_last_name, 123456789 as time, 1 is_deleted
union all
select '[email protected]', 'Joe', 'John', 123456789, 0
union all
select '[email protected]', 'Jill', 'John', 123456789, 0
)
select user_email, user_first_name, user_last_name, time, is_deleted from (
select *,
rank() over (partition by user_email order by time desc) as rank
from minimal_reproducible) inner_table
where rank = 1;
A better solution therefore is to use ROW_NUMBER() in the place of RANK() to ensure (albeit arbitrarily) unique user_email come what may:
with minimal_reproducible as (
select '[email protected]' as user_email, 'Joe' as user_first_name, 'John' as user_last_name, 123456789 as time, 1 is_deleted
union all
select '[email protected]', 'Joe', 'John', 123456789, 0
union all
select '[email protected]', 'Jill', 'John', 123456789, 0
)
select user_email, user_first_name, user_last_name, time, is_deleted from (
select *,
row_number() over (partition by user_email order by time desc) as row_number
from minimal_reproducible) inner_table
where row_number = 1;
I hope this is helpful to anyone using this approach to de-dupe their tables.