0
votes

I'm working on optimizing an excel report I have that takes in data from three different sources (two of which are large). I am using power query to do all my sumifs and countifs ahead of time to minimize the calculations in excel.

The problem is, it seems to be running slower in power query. Here is a countifs I did:

Custom Column:

List.Count(
Table.SelectRows(
    damages,
    (InnerTable) => InnerTable[locator_id] = [Trainee]
    and 
    InnerTable[defect date] <= [Date of the Training]
)[workday_id]
)

This column is being added to a table that has 1,200 rows. The table I am counting from (damages) is two appended tables totaling 80,000 rows. It's taking over 10 minutes to run this query (with just that calculation, haven't added the others yet). There's a counter in the bottom that totals the size that's loaded, and it gets over 8GB before being loaded (despite all my csvs being 100 mb total).

One thing I noticed, while building up the query, it was very fast until I added List.Count. It appears that final aspect of the query is significantly slowing it down.

I've tried playing around with doing a Table.RowCount instead of converting to list and counting that. I've tried various settings for the damages table, both loading into data model and not. Fast Load doesn't appear to make a difference either.

I just timed it with fast load. Took 32 seconds to load the first 25 rows, and that's just with one calculation.

Edit: I've also taken out the append for the damages table. No change in performance.

Edit #2: per the request below, attached are sample data and the queries.
It's simplified and sanitized, but it has the same performance issues. Both preview and load are agonizingly slow.

I don't see how to attach files, but what is needed is in the dropbox link. https://www.dropbox.com/sh/xa9rja1soj2aysi/AADnw_yC7huumERlr9VQpp2ma?dl=0

Query #1:

let
    Source = Csv.Document(File.Contents("C:\Users\TristenHannah\Dropbox (USICLLC)\Training\quality first\training reporting\slow query\damagetable1.csv"),[Delimiter=",", Columns=52, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"numcol1", Int64.Type}, {"textcol1", type text}, {"numcol2", Int64.Type}, {"textcol2", type text}, {"textcol3", type text}, {"dateforcalc1", type date}, {"textcol4", type text}, {"datecol1", type datetime}, {"textcol5", type text}, {"textcol6", type text}, {"textcol7", type text}, {"textcol8", type text}, {"textcol9", type text}, {"textcol10", type text}, {"textcol11", type text}, {"textcol12", type text}, {"textcol13", type text}, {"datecol2", type datetime}, {"datecol3", type datetime}, {"numcol3", Int64.Type}, {"numcol4", Int64.Type}, {"textcol14", type text}, {"textcol15", type text}, {"textcol16", type text}, {"textcol17", type text}, {"textcol18", type text}, {"textcol19", type text}, {"textcol20", type text}, {"textcol21", type text}, {"numcol5", Int64.Type}, {"numcol6", Int64.Type}, {"numcol7", Int64.Type}, {"numcol8", Int64.Type}, {"datecol4", type date}, {"textcol22", type text}, {"textcol23", type text}, {"textcol24", type text}, {"textcol25", type text}, {"textcol26", type text}, {"datecol5", type datetime}, {"dateforcalc2", type datetime}, {"textcol27", type text}, {"textcol28", type text}, {"textcol29", type text}, {"textcol30", type text}, {"employeeid", Int64.Type}, {"textcol31", type text}, {"textcol32", type text}, {"numcol9", Int64.Type}, {"numcol10", Int64.Type}, {"textcol33", type text}, {"textcol34", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"numcol1", "textcol2", "textcol3", "textcol4", "datecol1", "textcol5", "textcol6", "textcol7", "textcol8", "textcol9", "textcol10", "textcol11", "textcol12", "textcol13", "datecol2", "datecol3", "numcol3", "numcol4", "textcol14", "textcol15", "textcol16", "textcol17", "textcol18", "textcol19", "textcol20", "textcol21", "numcol5", "numcol6", "numcol7", "numcol8", "datecol4", "textcol22", "textcol23", "textcol24", "textcol25", "textcol26", "datecol5", "textcol27", "textcol28", "textcol29", "textcol30", "textcol31", "textcol32", "numcol9", "numcol10", "textcol33", "textcol34"}),
    #"Added Custom" = Table.AddColumn(#"Removed Columns", "dateforcountifs", each if [dateforcalc2] = null then [dateforcalc1] else [dateforcalc2]),
    #"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"dateforcountifs", type date}})
in
    #"Changed Type1"

Query #2

let
    Source = Csv.Document(File.Contents("C:\Users\TristenHannah\Dropbox (USICLLC)\Training\quality first\training reporting\slow query\maintable.csv"),[Delimiter=",", Columns=17, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"ID", Int64.Type}, {"Start time", type datetime}, {"Completion time", type datetime}, {"Email", type text}, {"Name", type text}, {"EmployeeID", Int64.Type}, {"Number", Int64.Type}, {"JoinDate", type date}, {"Number2", type number}, {"Number3", Int64.Type}, {"Text1", type text}, {"Number4", Int64.Type}, {"Number5", Int64.Type}, {"Number6", Int64.Type}, {"Number7", Int64.Type}, {"Text2", type text}, {"Text3", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each List.Count(
    Table.SelectRows(
        damagetable1,
        (InnerTable) => InnerTable[employeeid] = [EmployeeID] and InnerTable[dateforcountifs] <= [JoinDate]
    )[employeeid]
))
in
    #"Added Custom"
1
Please edit your question, and add all your code and some sample data - Ricardo Diaz
@RicardoDiaz okay done! - T.Hannah
What's the expected result. I see that you're joining the two tables by employee id - Ricardo Diaz
The end result is expected -- joining based on employee and prior to a specific date (specified by row). The issue isn't the data. The issue is the performance. It is slower than just doing a countifs in excel. - T.Hannah
So you join by employee id, then you compare if [dateforcountifs] < [JoinDate] and then you count the result where that condition is true, right? - Ricardo Diaz

1 Answers

1
votes

Based on the files you provided.

Took around 20 secs on a Excel 365 64 bit Win 10

M Code:

let
    Source = Csv.Document(File.Contents("C:\Users\RicardoDiaz\Descargas\slow query\maintable.csv"),[Delimiter=",", Columns=17, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"ID", Int64.Type}, {"Start time", type datetime}, {"Completion time", type datetime}, {"Email", type text}, {"Name", type text}, {"EmployeeID", Int64.Type}, {"Number", Int64.Type}, {"JoinDate", type date}, {"Number2", type number}, {"Number3", Int64.Type}, {"Text1", type text}, {"Number4", Int64.Type}, {"Number5", Int64.Type}, {"Number6", Int64.Type}, {"Number7", Int64.Type}, {"Text2", type text}, {"Text3", type text}}),
    Merge = Table.NestedJoin(#"Changed Type", {"EmployeeID"}, damagetable1, {"employeeid"}, "damagetable1", JoinKind.LeftOuter),
    Expand = Table.ExpandTableColumn(Merge, "damagetable1", {"dateforcountifs"}, {"dateforcountifs"}),
    RemoveOther = Table.SelectColumns(Expand,{"EmployeeID", "JoinDate", "dateforcountifs"}),
    Order = Table.Sort(RemoveOther,{{"EmployeeID", Order.Ascending}, {"JoinDate", Order.Ascending}, {"dateforcountifs", Order.Ascending}}),
    AddConditions = Table.AddColumn(Order, "less than", each if [dateforcountifs] <= [JoinDate] then true else null),
    Group = Table.Group(AddConditions, {"EmployeeID", "JoinDate", "less than"}, {{"Recuento", each Table.RowCount(_), Int64.Type}}),
    Filter = Table.SelectRows(Group, each ([less than] = true)),
    RemoveOther2 = Table.SelectColumns(Filter,{"EmployeeID", "JoinDate", "Recuento"})
in
    RemoveOther2