Way to filter multiple conditions in Power Query from folder containing CSV files

Question

I need a help from you with correction/suggestion of query I am using to get a data from folder in CSV format. Warning upfront: I don't know, how to write this shortly.

Few informations first:

Tools are limited for Power Query, Excel, VBA
Data query will run once in a month, so bigger loading time is not a BIG issue, although lower time is ofc preferable
I have chosen Power Query approach, because the source data have to be used in another Excel file, but with different set of rules (and this is part of my current issue).
Basic issue with my code is that it runs for really long time, there are big amount of conditions that need to be met and I have to use similar approach for another reason/tool/file. And I want the people to just press Refresh to get the information needed.

Description:

I have source of data in CSV files in a folder. Naming convention doesn't exist, because multiple people do the export of the data from system. Because of that I've used folder option in PQ.

The size of the data is currently around 400-600 MB. Name of the columns might be changing, for which are the first line in M-code to get around.

My main struggle is:

There are several conditions, that need to be implemented. I didn't want to write multiple if statements, because the code would get really ugly, and the number of conditions is in tenths and across multiple columns. For that reason I've implemented (let's call it TT) translation table where I have all columns where filtering could be used and last column of that TT is concatenation of all columns. If in the condition I don't care about one of the columns, I fill it with wildcard "*".

So the TT might be looking like:

| PC | CLIENT | FN  | TC | STRING      |

|----|--------|-----|----|-------------|

| 11 | *      | NEW | AC | 11*NEWAC    |

| 47 | 000001 | NEW | *  | 47000001NEW*|

etc...

PC is PoC, FN is FUNCTION, TC is Transaction code (in code below).

Then in the code I am replacing the wildcard with appropriate column's value from PQ and check, if the concatenated string from same columns in PQ is contained in TT (last column is made into a list). Code below works for the easier solution, but it's pretty hardcoded, because I've wanted to know if it's even possible.

After data update I run VBA macro to append the data into "database" table (ofc check for existing values is there) so the data load can be minimized. For that reason the first part code is used. Basically the code I could split into three parts:

Basic transformation: Loading from folder, getting rid of unconventional names and checking with other folder if it contains the same named files to minimize load.
Filtering data: Consists of merging the PQ table with TT table, replacing the wildcards with correct column and then creating filtering string to check if the text in concatenated PQ table contains at least one value from the TT list.
Final transformation of used data to get the information I need (It's mainly about late settlements from market)

Whole M-Code with comments

let
    /*Here starts basic data transformation to limit errors in CSV files due to 
    different conventions */
    Source = Folder.Files(source),
    #"Uppercased Text1" = Table.TransformColumns(Source,{{"Name", Text.Upper, type text}}),
    #"Merged Queries2" = Table.NestedJoin(#"Uppercased Text1", {"Name"}, q_Archive, {"Name"}, "q_Archive", JoinKind.LeftAnti),
    #"Added Custom" = Table.AddColumn(#"Merged Queries2", "Data", each Csv.Document(File.Contents([Folder Path] & "\" & [Name]),[Delimiter=";", Encoding = 1252, QuoteStyle = QuoteStyle.None])),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Data"}),
    #"Added Custom1" = Table.AddColumn(#"Removed Other Columns", "Table", each Table.PromoteHeaders([Data])),
    #"Removed Other Columns1" = Table.SelectColumns(#"Added Custom1",{"Table"}),
    #"Added Custom2" = Table.AddColumn(#"Removed Other Columns1", "Upper", each Table.TransformColumnNames([Table],Text.Upper)),
    #"Removed Other Columns2" = Table.SelectColumns(#"Added Custom2",{"Upper"}),
    #"Expanded Upper" = Table.ExpandTableColumn(#"Removed Other Columns2", "Upper", {"19A AMOUNT", "19A CURRENCY CODE", "35B ISIN", "CLIENT", "EXP.SETTL.DATE", "FUNCTION", "INSTR.ID", "MESSAGE FUNCTION", "POC", "RECEPTION DATE", "SETTL.AMOUNT", "SETTL.CUR.", "TRANSACTION CODE"}, {"19A AMOUNT", "19A CURRENCY CODE", "35B ISIN", "CLIENT", "EXP.SETTL.DATE", "FUNCTION", "INSTR.ID", "MESSAGE FUNCTION", "POC", "RECEPTION DATE", "SETTL.AMOUNT", "SETTL.CUR.", "TRANSACTION CODE"}),
    #"Renamed Columns1" = Table.RenameColumns(#"Expanded Upper",{{"SETTL.AMOUNT", "SETTL.AMOUNT2"}, {"SETTL.CUR.", "SETTL.CUR.2"}, {"19A CURRENCY CODE", "19A CURRENCY CODE2"}, {"19A AMOUNT", "19A AMOUNT2"}}),
    #"Added Custom10" = Table.AddColumn(#"Renamed Columns1", "19A AMOUNT", each if[SETTL.AMOUNT2]=null then [19A AMOUNT2] else [SETTL.AMOUNT2]),
    #"Added Custom11" = Table.AddColumn(#"Added Custom10", "19A CURRENCY CODE", each if [SETTL.CUR.2] = null then [19A CURRENCY CODE2] else [SETTL.CUR.2]),
    #"Renamed Columns" = Table.RenameColumns(#"Added Custom11",{{"FUNCTION", "FUNCTION2"}}),
    #"Added Custom8" = Table.AddColumn(#"Renamed Columns", "FUNCTION", each if[FUNCTION2]=null then [MESSAGE FUNCTION] else[FUNCTION2]),
    #"Removed Other Columns3" = Table.SelectColumns(#"Added Custom8",{"35B ISIN", "CLIENT", "EXP.SETTL.DATE", "INSTR.ID", "POC", "RECEPTION DATE", "TRANSACTION CODE", "19A AMOUNT", "19A CURRENCY CODE", "FUNCTION"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Other Columns3",{"POC", "CLIENT", "FUNCTION", "TRANSACTION CODE", "EXP.SETTL.DATE", "RECEPTION DATE", "19A AMOUNT", "19A CURRENCY CODE"}),
    #"Replaced Value" = Table.ReplaceValue(#"Reordered Columns","""","",Replacer.ReplaceText,{"POC", "CLIENT", "INSTR.ID", "35B ISIN"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","=","",Replacer.ReplaceText,{"POC", "CLIENT", "INSTR.ID", "35B ISIN"}),
    #"Uppercased Text" = Table.TransformColumns(#"Replaced Value1",{{"POC", Text.Upper, type text}, {"CLIENT", Text.Upper, type text}, {"FUNCTION", Text.Upper, type text}, {"TRANSACTION CODE", Text.Upper, type text}}),
    #"Filtered Rows" = Table.SelectRows(#"Uppercased Text", each ([FUNCTION] = "NEWM")),
    #"Merged Queries" = Table.NestedJoin(#"Filtered Rows", {"POC"}, tbl_setup_pocList, {"PocList"}, "tbl_setup_pocList", JoinKind.Inner),
        #"Removed Columns" = Table.RemoveColumns(#"Merged Queries",{"tbl_setup_pocList"}),


/* Here ends the data transformation part
   and the part for list transformations start*/
        #"Added condition" = Table.AddColumn(#"Removed Columns","COND", each (
            ((Table.FromRecords({
                [PC = List.ReplaceValue(Table.Column(tbl_filtering_string, "POC"),"*",[POC], Replacer.ReplaceText),
                CL = List.ReplaceValue(Table.Column(tbl_filtering_string, "CLIENT"),"*",[CLIENT], Replacer.ReplaceText),
                FN = List.ReplaceValue(Table.Column(tbl_filtering_string, "FUNCTION"),"*",[FUNCTION], Replacer.ReplaceText),
                TC = List.ReplaceValue(Table.Column(tbl_filtering_string, "TRANSACTION CODE"),"*",[TRANSACTION CODE], Replacer.ReplaceText)]}
            ))))),
        #"Expanded COND" = Table.ExpandTableColumn(#"Added condition", "COND", {"PC", "CL", "FN", "TC"}, {"PC", "CL", "FN", "TC"}),
        #"Added Custom3" = Table.AddColumn(#"Expanded COND", "Test",  each (List.Combine(
            {
                {_[PC]},{_[CL]},{_[FN]},{_[TC]}
            }
        ))),
        #"Expanded Test" = Table.AddColumn(#"Added Custom3", "Test2", each (Table.FromColumns(_[Test],null))),
        #"Removed Columns2" = Table.RemoveColumns(#"Expanded Test",{"PC", "CL", "FN", "TC", "Test"}),
        #"Added Custom4" = Table.AddColumn(#"Removed Columns2", "String", each Table.ToList([Test2],Combiner.CombineTextByDelimiter(""))),
        #"Removed Columns3" = Table.RemoveColumns(#"Added Custom4",{"Test2"}),
        #"Added Custom6" = Table.AddColumn(#"Removed Columns3", "CONTAIN_STR", each [POC]&[CLIENT]&[FUNCTION]&[TRANSACTION CODE]),
        #"Added Custom5" = Table.AddColumn(#"Added Custom6", "Cond", each List.Contains(_[String],[CONTAIN_STR])),
        #"Filtered Rows1" = Table.SelectRows(#"Added Custom5", each ([Cond] = false)),

        /*Here the code for filtering ends and final transformations occur */

        #"Removed Columns4" = Table.RemoveColumns(#"Filtered Rows1",{"String", "CONTAIN_STR", "Cond"}),
        #"Merged Queries1" = Table.NestedJoin(#"Removed Columns4", {"POC"}, tbl_setup_exotics, {"Exotic_PoC"}, "tbl_setup_exotics", JoinKind.LeftOuter),
        #"Expanded tbl_setup_exotics" = Table.ExpandTableColumn(#"Merged Queries1", "tbl_setup_exotics", {"Exotic_PoC"}, {"Exotic_PoC"}),
        #"Replaced Value2" = Table.ReplaceValue(#"Expanded tbl_setup_exotics",null, "Non Exotic",Replacer.ReplaceValue,{"Exotic_PoC"}),
        #"Removed Errors" = Table.RemoveRowsWithErrors(#"Replaced Value2", {"EXP.SETTL.DATE", "RECEPTION DATE"}),
        #"Changed Type" = Table.TransformColumnTypes(#"Removed Errors",{{"EXP.SETTL.DATE", type date}, {"RECEPTION DATE", type date}}),
        #"Added Custom7" = Table.AddColumn(#"Changed Type", "RD", each (if [Exotic_PoC] <> "Non Exotic" then Date.AddDays([RECEPTION DATE],1)else [RECEPTION DATE])),
        #"Filtered Rows2" = Table.AddColumn(#"Added Custom7", "LB" , each if [RD]>=[EXP.SETTL.DATE] then "Late" else "Not"),
        #"Added Custom9" = Table.AddColumn(#"Filtered Rows2", "DAYS_LATE", each [RD]-[EXP.SETTL.DATE]),
        #"Inserted Year" = Table.AddColumn(#"Added Custom9", "Year", each Date.Year([EXP.SETTL.DATE]), Int64.Type),
        #"Inserted Month" = Table.AddColumn(#"Inserted Year", "Month", each Date.Month([EXP.SETTL.DATE]), Int64.Type),
        #"Changed Type1" = Table.TransformColumnTypes(#"Inserted Month",{{"19A AMOUNT", type number}}),
        #"Grouped Rows" = Table.Group(#"Changed Type1", {"Year", "Month", "POC", "19A CURRENCY CODE", "DAYS_LATE", "LB"}, {{"Count", each Table.RowCount(_), type number}, {"Countervalue", each List.Sum([19A AMOUNT]), type text}, {"ISIN", each Text.Combine([35B ISIN],";"), type text}, {"INSTR.ID", each Text.Combine([INSTR.ID], ";"), type text}}),
        #"Merged Queries3" = Table.NestedJoin(#"Grouped Rows", {"Year", "Month", "19A CURRENCY CODE"}, q_Xrates, {"Year", "Month", "Currency"}, "q_Xrates", JoinKind.LeftOuter),
        #"Expanded q_Xrates" = Table.ExpandTableColumn(#"Merged Queries3", "q_Xrates", {"Rate"}, {"Rate"}),
        #"Replaced Value3" = Table.ReplaceValue(#"Expanded q_Xrates",null,1,Replacer.ReplaceValue,{"Rate"}),
        #"Added Col" = Table.AddColumn(#"Replaced Value3", "CV", each [Countervalue]/[Rate]),
        #"Remove Countervalue" = Table.RemoveColumns(#"Added Col", {"Countervalue"})
    in
        #"Remove Countervalue"

Questions

I know this approach sounds over-complicated, but it makes it work (unfortunately it takes a long time to refresh). But is it really good? Aren't there other options, considering limited tool usage mentioned in the beginning?
How can I make this code better? I believe it could be partially re-made into function, but since I am quite a beginner in PQ, I cannot imagine how.
How can I use same approach, for same source data, but with bigger complexity? You can understand that as more columns to add to the filtering string.
Do you have other suggestions?

End comments

I am now pretty desperate and my written text might be confusing sometimes.
I don't have any issue providing some kind of Visio chart to show my logic in more graphical way (I am more familiar with that) and also with relationship overview.
I also don't have issue provide anonymized data (since it might be partially confidential). If you'd need that one, please refer to preferred service.
I don't mind working on my code, if I am pushed in correct direction. For that Q. #1 is priority. So basically is this good approach and can it be easily adjustable for another same, but more complicated purpose?

I really appreciate your time.

*/ MK */

Kosuke Sakai Kosuke Sakai · Accepted Answer · 2019-11-18T10:04:07

If I were to do this, I would write a function that compiles the filter condition table into a function, then apply it with Table.SelectRows.

// Compile the condition table into a function that can be applied in row filtering.
filterCondition = compileFilterConditionTable(tbl_filtering_string),

#"Filtered Rows" = Table.SelectRows(#"Table after Preceding Steps", filterCondition)

Isn't this looking much easier to trace the steps?

Below is an example code of a function that compiles condition table into a logical function. I'm not sure this works correctly for your case, because I'm not completely understanding the requirement.

compileFilterConditionTable =

    let compileFilterConditionTable = (filterConditionTable as table) as function =>
            let recordConditions = List.Transform(
                    Table.ToRecords(filterConditionTable),
                    compileFilterConditionRecord)
            in applyCombine(recordConditions, List.AnyTrue),

        compileFilterConditionRecord = (cond as record) as function =>
            let fieldNameValues = List.Transform(
                    Record.FieldNames(cond),
                    each [Name = _, Value = Record.Field(cond, Name)]
                ),
                fieldConditions = List.Transform(fieldNameValues, compileFieldCondition)
            in applyCombine(fieldConditions, List.AllTrue),

        compileFieldCondition = (fieldNameValue as record) as function =>
            let name = fieldNameValue[Name],
                value = fieldNameValue[Value]
            in
                if value = "*" then (record as record) as logical => true
                else (record as record) as logical => Record.Field(record, name) = value,

        applyCombine = (functions as list, combiner as function) as function =>
            (value) => combiner(List.Transform(functions, (f) => f(value)))

    in compileFilterConditionTable

Anyway, M is a functional programming language, so it would help to think and code it in functional way. Break down the entire logic into small parts, so that each small parts will be easy enough to understand. Write your code as reusable small functions, and combine them to build the whole.

Way to filter multiple conditions in Power Query from folder containing CSV files

1 Answers