0
votes

I have a data like as shown below

Subject_id T1  T2  T3  T4  T5
1234
1234       21  22  23  24  25
3456       34  31    
3456       34  31  36  37  39
5678       65  64  62  61  67
5678       65  64  62      67
9876       12  13  14  15  16
4790       47  87  52  13  16

As you can see above, subject_ids 1234,3456 and 5678 are repeating.

I would like to remove those repeating subjects when they have null/empty/blank value in any of the columns like T1,T2,T3,T4,T5.

Now the problem is in real time, I have more than 250 columns and not sure whether I can put 250 where clause checking for null value. So, I was trying with row_number(), rank(). Not sure which one is better. The below is what I was trying

SELECT *,ROW_NUMBER() OVER(PARTITION BY subject_id,T1,T2,T3,T4,T5) NULLS FIRST
from table A;

But it throws syntax error Syntax error: Unexpected keyword NULLS at [1:62]

I expect my output to be like below

Subject_id T1  T2  T3  T4  T5
1234       21  22  23  24  25   
3456       34  31  36  37  39
5678       65  64  62  61  67
9876       12  13  14  15  16
4790       47  87  52  13  16

As you can see, the output doesn't contain rows which had at least 1 null/empty/blank value in T1,T2,T3,T4,T5 columns.

Can help please?

2

2 Answers

2
votes

Below is for BigQuery Standard SQL

#standardSQL
SELECT *
FROM `project.dataset.table` t
WHERE NOT REGEXP_CONTAINS(FORMAT('%t', t), r'NULL')  

If to apply to sample data from your question - output is

Row Subject_id  t1  t2  t3  t4  t5   
1   1234        21  22  23  24  25   
2   3456        34  31  36  37  39   
3   5678        65  64  62  61  67   
4   9876        12  13  14  15  16   
5   4790        47  87  52  13  16   
0
votes

I think you want:

SELECT *,
       ROW_NUMBER() OVER (PARTITION BY subject_id
                          ORDER BY (T1 IS NULL OR T2 IS NULL OR T3 IS NULL OR T4 IS NULL OR T5 IS NULL) DESC
                         )
FROM table A;

I might approach this problem differently, but this appears to be what you are trying to write.