0
votes

I have a small (1.5 MB) Excel file that contains multiple worksheets. I need to transform each worksheet (two significant transformations are created as separate functions) and then expand the results dynamically (i.e. each worksheet can have a different number of columns, so I needed to extract the list of all distinct column names beforehand).

It's all working fine and the output is meeting my expectations, however, I noticed that when refreshing Power Query is loading over 10 MB of data. After it is done, the Load window resets and >10 MB of data is being loaded again.

enter image description here

Here's the M-code that I am using. I have tested each section and it seems like Expanded step might be the slowest one.

let
    Source = Excel.Workbook(File.Contents("xxx.xlsx"), null, true),
    Split = Table.SplitColumn(Source, "Name", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Name.1", "Name.2", "Name.3"}),
    TidyUp = Table.RenameColumns(
        Table.RemoveColumns(Split,{"Item", "Kind", "Hidden", "Name.3"}), 
        {{"Name.1", "PORTFOLIO"}, {"Name.2", "DATE"}}),
    GetCurrency = Table.AddColumn(TidyUp, "CURRENCY", each GetCurrencyFromSpreadsheet([Data])),
    GetStresses = Table.AddColumn(GetCurrency, "fx", each GetStresses([Data])),
    ColNames = Table.ColumnNames(Table.Combine(GetStresses[fx])),
    Expanded = Table.ExpandTableColumn(GetStresses, "fx", ColNames),
    RemoveData = Table.RemoveColumns(Expanded,{"Data"})
in
    RemoveData

As a result, it takes about 5 minutes to process a single small Excel file. As we expect to receive more similar files in the future, I would like to check with you if you have any ideas what can I do to improve the code? Thanks.

1

1 Answers

1
votes

I would rebuild this using the Power Query Editor UI. That should lead to cleaner code with less redundant steps like your use of Table.Combine with a single input table.

The Table.AddColumn steps would probably be rebuilt as separate queries that are combined using Merge Queries. Set-based logic like that will usually outperform row-by-row function calls.