Sum multiple rows based on duplicate column data without formula

0

votes

Based on data available in columns A to D (can be any 100's of columns), I want to sum up all the rows for column E to K (can be any 100's of columns)

The rows should sum up based on duplicate data from rows A to D, the result required as below

This is easily possible to do, with sumif, but would like to know if possible natively in excel or power query without creating unique id for each column or using sumif function or formula of any sort

excelpowerquery

Pretty similar: stackoverflow.com/questions/53186122/…. What did you try? – BigBen

0

votes

In powerquery .. unpivot, group, pivot, done.

More detail:

Click select first 4 columns, right click, unpivot other columns

Click select first 4 columns and the new Attribute column, right click, group by

Use Operation:Sum on Column:Value name:count and hit OK

Click select Attribute column and transform .. pivot column... , for value column choose count

File Close and load

Full sample code:

let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Code1", "Code2", "Code3", "Code4"}, "Attribute", "Value"),
#"Grouped Rows" = Table.Group(#"Unpivoted Other Columns", {"Code1", "Code2", "Code3", "Code4", "Attribute"}, {{"Count", each List.Sum([Value]), type number}}),
#"Pivoted Column" = Table.Pivot(#"Grouped Rows", List.Distinct(#"Grouped Rows"[Attribute]), "Attribute", "Count", List.Sum)
in  #"Pivoted Column"

0

votes

To solve a problem like this, I first do a concrete example and then generalize it. I made a small table in Excel like so:

Code1	Code2	2-Jul-20	3-Jul-20	4-Jul-20	5-Jul-20	6-Jul-20
ERT	EXC	10		6	15	2
ERT	EXC	2	3		23	1
CON	HOR			3
CON	HOR	6		2	356	3

Then I clicked within the table and created a Power Query referencing it. After opening the Power Query Editor, there is a Group By function on the Home tab. It's pretty straightforward to choose the columns you want and the Sum function in a toy example like this.

Then, I opened the Advanced Editor to see what code was auto-generated. It looked something like this:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    
    #"Grouped Rows orig" = Table.Group(Source, {"Code1", "Code2"}, {{"2-Jul-20", each List.Sum([#"2-Jul-20"]), type nullable number}, {"3-Jul-20", each List.Sum([#"3-Jul-20"]), type nullable number}, {"4-Jul-20", each List.Sum([#"4-Jul-20"]), type nullable number}, {"5-Jul-20", each List.Sum([#"5-Jul-20"]), type nullable number}, {"6-Jul-20", each List.Sum([#"6-Jul-20"]), type nullable number}})
in
    #"Grouped Rows orig"

Typically, a Power Query expression is a series of transformations applied to a table, where each one operates on the table as returned from the previous. Here, we start with the original table as "Source" and then do the grouping. The parameters are a little messy, but what we have is: (1) the input table, (2) a list of the column names to group by, and (3) a list of 3-item lists, each of which describe an aggregated column. The sublists have the output column name, the function that does the aggregation, and the data type.

In Power Query, "each" is syntactic sugar for a single parameter function whose parameter is just an underscore. But also, when you have a record or row, you can just use [column] instead of _[column].

So how to generalize the operation you want to do? My first thought is that a convenient grouping function should have two parameters, based on your description. The first is the table to group, and the second is the number of columns starting from the left to group by. If you don't have them arranged contiguously, of course, you could do something else.

    sumFromColumn = (t, n) => let
                                    cList = Table.ColumnNames(t),
                                    toGroup = List.FirstN(cList, n),
                                    toSum = List.RemoveFirstN(cList, n),
                                    sumFunc = (cName) => {cName, each List.Sum(Record.Field(_, cName)), type nullable number}
                                in Table.Group(t, toGroup, List.Transform(toSum, each sumFunc(_))),

    #"Grouped Rows" = sumFromColumn(Source, 2), // Group by the first 2 columns and sum the rest

Here is the generalized function I made, which appears to match the original Table.Group operation that was generated by the interface.

The let statement arranges things for readability but does not imply a particular sequence that they happen in. Power Query figures out the dependencies and executes the statements in whatever order is needed.

The list of column names of the table is defined as cList, and split into toGroup and toSum. Then, sumFunc is defined as a function taking a column name and returning the 3-item list needed to define an aggregation operation. In Power Query, functions can return other functions any which way. So here we are defining a function that returns a list, with a function in it. Then we can use List.Transform to take the list of aggregated columns and turn it into the appropriate parameters for Table.Group.

Finally, the actual group by is done with a call like sumFromColumn(Source, 2), which is equivalent to the original statement that hard-codes the column names.

Code1	Code2	2-Jul-20	3-Jul-20	4-Jul-20	5-Jul-20	6-Jul-20
ERT	EXC	12	3	6	38	3
CON	HOR	6		5	356	3

This can easily be changed to sumFromColumn(Source, 1), in which case it will reduce to two rows, but then the second column being non-numeric, will become error values.

Or, you can use sumFromColumn(Source, 3), which will not add things up because the group by columns taken together are distinct.

This way you can easily aggregate any number of columns without caring about their names. I recommend both the Power Query M documentation on microsoft.com and reading about functional programming in general.

Sum multiple rows based on duplicate column data without formula

2 Answers