2
votes

I have this (example) data in Excel in a table named Table1:

| PC Item  | Priority |
|----------|----------|
| AN123169 | P3       |
| AN123169 | P1       |
| AN123169 | P1       |
| AN123169 | P1       |
| AN123169 | P3       |  

After sorting into Priority order and removing duplicates based on PC Item I would expect a P1 record to remain. Instead I'm getting P3.

The help file states:

Removes all rows from a Power Query table, in the Query Editor, where the values in the selected columns duplicate earlier values.

My understanding of this is it keeps the first record and removes subsequent records?

The M code I'm using is:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"PC Item", type text}, {"Priority", type text}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"PC Item", Order.Ascending}, {"Priority", Order.Ascending}}),
    #"Removed Duplicates" = Table.Distinct(#"Sorted Rows", {"PC Item"})
in
    #"Removed Duplicates"

After #"Sorted Rows" I have this table, which is sorted correctly:

| PC Item  | Priority |
|----------|----------|
| AN123169 | P1       |
| AN123169 | P1       |
| AN123169 | P1       |
| AN123169 | P3       |
| AN123169 | P3       |  

After #"Removed Duplicates" I have:

| PC Item  | Priority |
|----------|----------|
| AN123169 | P3       |   

Surely I should have a P1 record?


My first thought was to reverse the sort, but with this original data I get P3 returned as the unique value:

| PC Item | Priority |
|---------|----------|
| AN310C4 | P3       |
| AN310C4 | P1       |
| AN310C4 | P1       |

and with this original data I get P1 returned:

| PC Item | Priority |
|---------|----------|
| AN310C4 | P1       |
| AN310C4 | P1       |
| AN310C4 | P3       |

The question:

So I guess my question is - how does sort work seeing as the subsequent M code appears to ignore the fact that I've sorted the data?

1

1 Answers

2
votes

Use Table.Buffer to cache intermediate query results and avoid query folding when deleting duplicates.


Sample data:

enter image description here


Updated M:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"PC Item", type text}, {"Priority", type text}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"PC Item", Order.Ascending}, {"Priority", Order.Ascending}}),
    #"Buffer Table" = Table.Buffer(#"Sorted Rows"),
    #"Removed Duplicates" = Table.Distinct(#"Buffer Table", {"PC Item"})
in
    #"Removed Duplicates"

Result:

enter image description here


Documentation (very little) found here. Informative video found here.

I found other ways to "break" query folding using remove error lines in the table or adding an index-column (and removing it). However, a buffer seems neat to me.