1
votes

Background

I have a dataset with 10,000+ variables as column headers, which I want to reduce to the amount needed. I know how to select a sample of columns by listing columns that contain manually specified strings, say "glu" and "pep", that the columns must contain in order to be selected. This is the M code used to select the sample columns:

let
    Source = Excel.CurrentWorkbook(){[Name="data"]}[Content],
    ColumnsToSelect = List.Select(Table.ColumnNames(Source), each Text.Contains(_, "glu") or Text.Contains(_, "pep")),
    SelectColumns = Table.SelectColumns(Source, ColumnsToSelect)    
in
    SelectColumns

This Power Query produces a table that i call "Data". Since I want to select columns based on multiple strings they must contain, I have made a dynamic list of strings that I have called "Outcomes". I want my Power Query to utilize this list of strings when choosing what columns to select.

Question

Is it possible to get my Power Query to utilize this dynamic list in the List.Select() or Table.SelectColumns() function or any other function, that will make my Power Query select only the columns that contain the strings on the list?

1

1 Answers

1
votes

Use with this lines:

let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],

    Source2 = Excel.CurrentWorkbook(){[Name="Outcomes"]}[Content],
    Outcomes = Source2[Outcomes],

    UnpivotedColumns = Table.UnpivotOtherColumns(Source, {}, "ColumnNames", "Filters"),

    FilteredRows = Table.SelectRows(UnpivotedColumns, each List.AnyTrue(List.Transform(Outcomes, (substring) => Text.Contains([Filters], substring)))),

    ColumnNames = List.Sort(List.Distinct(FilteredRows[ColumnNames]),Order.Ascending),
    SelectColumns = Table.SelectColumns(Source,ColumnNames)
in
    SelectColumns

the magic is in this line:

FilteredRows = Table.SelectRows(UnpivotedColumns, each List.AnyTrue(List.Transform(Outcomes, (substring) => Text.Contains([Filters], substring)))),