0
votes

I have a data table that records cost savings data and I have 1 row per project. This has overall project type data such as annual spend, annual savings, etc. but also has the months the savings fall into. To pivot on this data, I converted it to a table with PowerQuery but some columns repeat such as annual spend for each month where there are savings so I might get 10 rows for savings which is correct, but the annual spend is duplicated 10 times. Can I remove duplicates in just those columns retaining the other data.

I have searched and tried various solutions but haven't found one that works. I am not set on data table format, so am open to anything.

Below is a sample of the data

Sample of Data Entry

Sample of PowerQuery

PowerQuery Results

As you will see, Baseline Spend, Negotiated Spend, Savings Amount are all shown for each row and I need to use these in a pivot/slicer.

Any help would be appreciated.

Regards,

Keith

1
Hard to experiment with just a screenshot, or an idea of what you want the results to look like, but maybe you could aggregate the savings and delete the months columns.Ron Rosenfeld

1 Answers

0
votes

I think one solution might be to "only keep the first1 annual spend per project". More abstractly, "only keep the first value in column(s) X per column(s)Y".

Below is some mock/dummy data. I only want to keep the highlighted values in my annual spend column (as the highlighted values are the first "annual spend" figures per "project").

Example data

This is the M code I'm using to achieve this. (To try it, open the Query Editor > Advanced Editor (near top right) > copy-paste code below to there > OK).

let
    OnlyKeepFirstValueInColumn = (someTable as table, columnsToNullify as list) as table =>
        let
            firstRow = Table.FirstN(someTable, 1), // This assumes first row contains a non-blank value.
            remainingRows = Table.Skip(someTable, 1),
            loopAndNullify = List.Accumulate(columnsToNullify, remainingRows, (tableState, currentHeader) => Table.TransformColumns(tableState, {{currentHeader, each null}})),
            combined = firstRow & loopAndNullify
        in combined,
    FirstValueOfColumnsPerGroup = (someTable as table, groupByColumns as list, columnsToNullify as list) =>
        let
            group = Table.Group(someTable, groupByColumns, {{"toCombine", each OnlyKeepFirstValueInColumn(_, columnsToNullify), type table}}),
            combined = Table.Combine(group[toCombine])
        in combined,
    aggregatedTable = Table.FromColumns({Text.ToList("aaabbbccccdddeeefg"), List.Repeat({1000}, Text.Length("aaabbbccccdddeeefg"))}, type table [project=text, annual spend=number]),
    transformed = FirstValueOfColumnsPerGroup(aggregatedTable, {"project"}, {"annual spend"})
in
    transformed

The important bit to understand is this line:

transformed = FirstValueOfColumnsPerGroup(aggregatedTable, {"project"}, {"annual spend"})

in which you should replace:

  • aggregatedTable with whatever variable/expression contains your table
  • {"project"} with the name of your "project" column (keep the curly braces {} though as they let you pass in several columns if needed)
  • {"annual spend"} with the names of whichever column(s) you want to keep only the first value in (keep the curly braces {})

This is what I get (which I think is similar to what you want):

Output result


1To keep things simple, we'll say "first" here means the value in the first row. It could have meant "first non-null value" or "first value satisfying some particular condition or logic", but your data suggests the simpler definition will work okay.