Remove duplicate rows from unpivoted table

Question

I have a flattened crosstab in Power Query that contains duplicate rows, with values for column 1 & 2 reversed. I'd like to remove the duplicates within Power Query so that I'm left with unique rows only.

I can achieve this within Excel using COUNTIF, but I was hoping to find a Power Query oriented solution. So far I've tried creating two new columns in Power Query (newCol1 = Col1+2, newCol2 = Col2+1), and matching them. However this only gives me the rows where Col1=Col2.

I feel like this should be straightforward but I'm just not getting it despite finding similar posts here and on other places online. Any help is greatly appreciated!

Seems to me your fixing a symptom and not the problem. Does the source data truly contain both entries? or were both entries caused by a processing problem in the data extract? I've seen this happen often when a = is used on a table join in SQL when a > or < should be used. — xQbert
The source data is a very large crosstab. I was looking for a good way of flattening a crosstab that was imported to Excel from another program, and using Power Query: "From Table-->Unpivot Other Columns" was suggested. — duncan.idahopotato

Alexis Olson Alexis Olson · Accepted Answer · 2018-02-15T17:11:14

I think @xQbert has a good point. You probably want to try to fix this upstream.

However, if you do want to fix it at the point, then I suggest the following:

Create a custom column that puts these in alphabetical order. You can use a formula like this:

if [Col 1] < [Col 2] then [Col 1] & [Col 2] else [Col 2] & [Col 1]

Select the newly created column (only) and click Remove Rows > Remove Duplicates.

Delete the new column if desired.

Remove duplicate rows from unpivoted table

1 Answers