Power BI - Finding and Filtering Duplicates Across Columns

Question

I've got a set of data where four individual parts and part numbers (all text) are compiled to become combinations. They can combine in 2 different ways, either:

Column A then B then C then D, OR
Column A then C then B then D

I've got the combinations each in two separate columns.

Unfortunately, this leads to some duplicates being created, because A+B+C+D is functionally equivalent (for my purposes) to A+C+B+D. When the duplicates are in a single column, removing them is simple enough. Removing duplicates from different columns is a little bit trickier, and that's where I'm looking for your help.

Example

I've mocked up an example of how the items are concatenated. As you can see, there are no duplicates within the same column, but there are duplicates of the same combinations (colour-coded for simplicity) appearing in both columns, and often in a different row of data so it's not as simple as looking across the row. I'd love your help to identify these and filter (or remove) the duplicates.

Also, note that the relationship between Concat 1 and Concat 2 is pretty much meaningless (for these purposes) other than that I'd like to filter/remove any duplicate values at all, whether those duplicates are within the same column or across the two columns.

What output are you expecting to get? The table as above with the colored cells removed? — Alexis Olson
Ideally, the table above with only a single instance of the coloured cells remaining and the other(s) removed so all that's left is unique combinations. A single column might be okay, but I'm hesitant to simply join them because it's an awful lot of data to join so I'm a bit concerned about speed. — Code Newb

horseyride horseyride · Accepted Answer · 2018-10-12T13:04:13

Assuming you start with Table1 having six columns without a header row, where columns 5/6 have the two combinations, this removes duplicates across rows and columns. It does not remove duplicates on its own row (ABBA/ABBA)

let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Column1", "Column2", "Column3", "Column4"}, "Attribute", "Value"),
#"Removed Duplicates" = Table.Distinct(#"Unpivoted Other Columns", {"Value"}),
#"Pivoted Column" = Table.Pivot(#"Removed Duplicates", List.Distinct(#"Removed Duplicates"[Attribute]), "Attribute", "Value")
in #"Pivoted Column"

if you also need to remove duplicates on its own row use this instead:

let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Column1", "Column2", "Column3", "Column4"}, "Attribute", "Value"),
#"Removed Duplicates" = Table.Distinct(#"Unpivoted Other Columns", {"Value"}),
#"Pivoted Column" = Table.Pivot(#"Removed Duplicates", List.Distinct(#"Removed Duplicates"[Attribute]), "Attribute", "Value"),
#"Filtered Rows" = Table.SelectRows(#"Pivoted Column", each ([Column6] <> null) and ([Column5] <> null))
in #"Filtered Rows"

Power BI - Finding and Filtering Duplicates Across Columns

1 Answers