3
votes

My Table data looks like

 Col1  | Col2 | Col3
    1  |   2  | NULL
    1  |   2  | 3
    1  | NULL | NULL
    1  |   5  | NULL
    2  | NULL | NULL

I want to write a query, so that I get only the most specific entries. ie. in the above example row1 is more specific row3 as Value of "Col1" is same in both but Value in "Col2" is more specific( not null) in row1, similarly row2 is more specific than row1.

For the above dataset the result should look like:

Col1 | Col2 | Col3
  1  |  2   |  3
  1  |  5   | NULL
  2  | NULL | NULL

NOTE: Datatype of column can be anything.

2
Do you need to do this in pure SQL, or is there a programming language you are using to execute the query in which you can process the data? - RacerNerd
@RacerNerd: I need to do it in pure SQL. - Kumar
Did you try anything? - Jafar Kofahi
I don't understand what "more specific" means. If you had "1, <NULL>, 3" in the table, how would that affect the results? How would "2, <NULL>, 2"? - Gordon Linoff
@GordonLinoff: By more specific, I mean another row with exactly the same data but some value for NULL field. ie. for "1, <NULL>,3" - "1,2,3" is more specific. - Kumar

2 Answers

4
votes

I am assuming that the columns are "ordered" as they are in your query, so you don't have a case where col2 is null and col3 is not null:

select col1, col2, col3
from table t
where (col3 is not null) or
      (col3 is null and col2 is not null and
       not exists (select 1
                   from table t2
                   where t2.col1 = t.col1 and t2.col2 = t.col2 and t2.col3 is not null
                  )
      ) or
      (col2 is null and col1 is not null and
       not exists (select 1
                   from table t2
                   where t2.col1 = t.col1 and t2.col2 is not null
                  )
      );

The logic behind this is:

  1. Take all rows where col3 is not null.
  2. Take all rows where col2 is not null and there are no similar rows with a value in col3.
  3. Take all rows where col1 is not null and there are no similar rows with a value in col2.

EDIT:

In Oracle, you can do this more simply:

select col1, col2, col3
from (select t.*,
             max(col3) over (partition by col1, col2) as maxcol3,
             max(col2) over (partition by col1) as maxcol2
      from table t
     ) t
where (col3 is not null) or
      (col2 is not null and maxcol3 is null) or
      (col1 is not null and maxcol2 is null);

EDIT II: (With a clarified definition of "more specific".)

I think this is the extrapolation of the logic. It requires looking at all combinations:

select col1, col2, col3
from (select t.*,
             max(col3) over (partition by col1, col2) as maxcol3_12,
             max(col2) over (partition by col1, col3) as maxcol2_13,
             max(col1) over (partition by col2, col3) as maxcol1_23,
             max(col1) over (partition by col1) as maxcol1_2,
             max(col1) over (partition by col2) as maxcol1_3,
             max(col2) over (partition by col1) as maxcol2_1,
             max(col2) over (partition by col3) as maxcol2_3,
             max(col3) over (partition by col2) as maxcol3_1,
             max(col3) over (partition by col2) as maxcol3_2,
      from table t
     ) t
where (col1 is not null and col2 is not null and col3 is not null) or
      (col1 is not null and col2 is not null and maxcol3 is null) or
      (col1 is not null and col3 is not null and maxcol2 is null) or
      (col2 is not null and col1 is not null and maxcol3 is null) or
      (col2 is not null and col3 is not null and maxcol1 is null) or
      (col3 is not null and col1 is not null and maxcol2 is null) or
      (col3 is not null and col2 is not null and maxcol1 is null) or
      (col1 is not null and maxcol2 is null and maxcol3 is null) or
      (col2 is not null and maxcol1 is null and maxcol3 is null) or
      (col3 is not null and maxcol1 is null and maxcol2 is null);

The first combination says "keep this row if all values are not null". The second says: "keep this row if col1 and col2 are not null and col3 never has a value". And so on to the last one that says: "keep this row is col3 is not null and col1 and col2 never have values".

This might simplify to:

where not ((col1 is null and maxcol1 is not null) or
           (col2 is null and maxcol2 is not null) or
           (col3 is null and maxcol3 is not null)
          );
0
votes

Divide n Conquer kind of Approach!

Demo : SQL Fiddle

SELECT col1,col2,MAX(col3)
FROM test
WHERE col1 is NOT NULL AND col2 is NOT NULL
GROUP BY col1,col2
UNION
SELECT col1,MAX(col2),col3
FROM test
WHERE col1 is NOT NULL AND col3 is NOT NULL
GROUP BY col1,col3
UNION
SELECT MAX(col1),col2,col3
FROM test
WHERE col2 is NOT NULL AND col3 is NOT NULL
GROUP BY col2,col3
UNION
SELECT col1,NULL,NULL
FROM test
GROUP BY COL1
HAVING COUNT(COL2) = 0 AND COUNT(COL3) = 0