0
votes

I have a number of columns in my table which have text values that fall into categories - e.g. column "ABC" has 9000 rows but every row must have a value in the set {"A","B","C"}. Other columns like Gender have "M"/"F"/null

For each column, I'd like to convert it into an integer list in-place - so A:1, B:2, C:3 etc.

I've been trying out using List.Distinct to extract the values to a temp table, adding an index column to that and using a join to transform the initial column based on that mapping in the temp table. However this seems slow and I'm not sure how to run this over all columns in my table (or at least Table.ColumnsOfType(Source, {type nullable text}) to select the categorical columns...).

Any suggestions?

Before

Gender Fruit [...]
F Cat
F Dog
M Lemon
M Dog
M Lemon
null Cat
M Dog

After

Gender Fruit [...]
1 1
1 2
2 3
2 2
2 3
null 1
2 2
1

1 Answers

2
votes

In PowerQuery, this seems to work for any number of columns

Replace all nulls with something else, here +=+

Add Index

Unpivot

Remove duplicates

Group, add index to each group

enter image description here

Merge back into original and expand

Repivot

Remove extra columns

Before and After:

enter image description here

Full code:

let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Replaced Value" = Table.ReplaceValue(Source,null,"+=+",Replacer.ReplaceValue,Table.ColumnNames(Source)),
#"Added Index" = Table.AddIndexColumn(#"Replaced Value", "Index", 0, 1),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index"}, "Attribute", "Value"),

// derive a table of replacements
#"Removed Duplicates" = Table.Distinct(#"Unpivoted Other Columns", {"Attribute", "Value"}),
#"Grouped Rows" = Table.Group(#"Removed Duplicates", {"Attribute"}, {{"GRP", each Table.AddIndexColumn(_, "Index2", 1, 1), type table}}),
#"Expanded GRP" = Table.ExpandTableColumn(#"Grouped Rows", "GRP", {"Value", "Index2"}, {"Value", "Index2"}),

//replace originals
#"Merged Queries" = Table.NestedJoin(#"Unpivoted Other Columns",{"Attribute", "Value"},#"Expanded GRP",{"Attribute", "Value"},"EG",JoinKind.LeftOuter),
#"Expanded Table1" = Table.ExpandTableColumn(#"Merged Queries", "EG", {"Index2"}, {"Index2"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Table1",{"Value"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[Attribute]), "Attribute", "Index2", List.Sum),
#"Removed Columns1" = Table.RemoveColumns(#"Pivoted Column",{"Index"})
in #"Removed Columns1"