Power Query: how to add one to a column when a specific values appear in an other column

Question

I have an ID column and I am looking for ways to increment my IDs each time a specific item appears in my Geography column (ItalyZ, ItalyM, UKY or UKM) is found.

The ID of ItalyZ starts at 0 and ends at 4000.

The ID of ItalyB starts at 4000 and ends at 8000.

The ID of UKY starts at 0 and ends at 4000.

The ID of UKM starts at 4000 and ends at 8000.

However, I am refreshing my file, and I will thus have from time to time new arrivals of "geographies" without the origins or first IDs. These boundaries/ranges are only known beginning and ends.

Here is a sample of my data:

  |---------------------|------------------|    
  |       ID            |   Geography      |
  |---------------------|------------------|
  |    AB0000           |      ItalyZ      |
  |---------------------|------------------|
  |    AB4041           |      ItalyB      |
  |---------------------|------------------|
  |    BC0000           |      UKY         |
  |---------------------|------------------|
  |    BC4001           |      UKM         |
  |---------------------|------------------|
  |    NULL             |      ItalyZ      |
  |---------------------|------------------|
  |    NULL             |      ItalyZ      |
  |---------------------|------------------|
  |    NULL             |      UKY         |
  |---------------------|------------------|
  |    NULL             |      UKM         |
  |---------------------|------------------|

Here is my expected output :

  |---------------------|------------------|    
  |       ID            |   Geography      |
  |---------------------|------------------|
  |    AB0000           |      ItalyZ      |
  |---------------------|------------------|
  |    AB4041           |      ItalyB      |
  |---------------------|------------------|
  |    BC0000           |      UKY         |
  |---------------------|------------------|
  |    BC4001           |      UKM         |
  |---------------------|------------------|
  |    AB0001           |      ItalyZ      |
  |---------------------|------------------|
  |    AB0001           |      ItalyZ      |
  |---------------------|------------------|
  |    AB4042           |      UKY         |
  |---------------------|------------------|
  |    BC0001           |      UKM         |
  |---------------------|------------------|

I have been trying many various ways and trying to adapt running total solutions. I have also been trying to break apart my file in four different ones in order not to have an If function alternating between cases, and thus making it simpler, like this in my power query:

 #"Added Custom2" = Table.AddColumn(#"Reordered Columns", "Sum", each if [Geography] = "UKM" then [Number AB range below 4000] + 1 
else if [Geography] = "UKY" then [Number AB range above 4000] + 1 
else if [Geography] = "ItalyB" then [Number BC range above 5000]
else [Number BC range below 5000] + 1)

But absolutely nothing works. This maddening.

Constantly posting then deleting variants of the same question over and over is not a nice thing to do — horseyride
I am sorry. I realised that it was poorly formulated. I will update and return to the original afterwards. But I am indeed trying to finally completely and absolutely understand conditional count/sums in power query. I would like to be able to absolutely understand it. Is there a way to re-update a question in a thread? For example if the strategy is to formulate and look exhaustively all potential solutions to an issue? — Tomas Michel

Alexis Olson Alexis Olson · Accepted Answer · 2020-04-28T23:39:20

I will answer a further simplified problem as I don't want to address the ID letter prefixes.

Let's say we have the following table (I've included :

ID,  Group
-----------
0,     A
1,     A
300,   B
525,   C
null,  A
null,  B
null,  B
null,  C

And want to generate a new column NewID that will replace ID.

ID,  Group, NewID
------------------
0,     A,   0
1,     A,   1
300,   B,   300
525,   C,   525
null,  A,   2
null,  B,   301
null,  B,   302
null,  C,   526

Here's a method that uses Table.AddIndexColumn:

let
    Source = <First Table Above>,
    #"Grouped Rows" = Table.Group(Source, {"Group"}, {{"ID", each List.Max([ID]), type number}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", (C) => Table.AddIndexColumn(Table.SelectRows(Source, each _[Group] = C[Group]),"NewID",C[ID],1)),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"NewID"}, {"NewID"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Custom",{"ID"})
in
    #"Removed Columns"

First, we group by Group to find the maximal ID per Group:

Then we add a new column where each row in the column is a table defined by filtering the original table down to just the current group and then adding an index column starting at the maximal ID we just found. This is the most complex step.

From here, we expand the Custom table column (choosing the column we don't already have) and remove the old ID column. We're now where we need to be short of any sorting or column typing we choose to do.

Edit: I've made a mistake in the above. Note that the NewID for Group A is 1,2,3 instead of 0,1,2 I was trying for.

To fix this for this simple example, you can use List.Min instead of List.Max in the group-by step.

For a more complex example, you may need to add an index column to the source table so you can merge back to it after expanding and only use the new NewID for previously null ID values since we can't guarantee they are contiguous.

Here's the code:

let
    Source = <First Table Above>,
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1),
    #"Grouped Rows" = Table.Group(#"Added Index", {"Group"}, {{"ID", each List.Max([ID]), type number}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", (C) => Table.AddIndexColumn(Table.SelectRows(Table.Sort(#"Added Index",{"ID"}), each _[Group] = C[Group]),"NewID",C[ID]+1,1)),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Index", "NewID"}, {"Index", "NewID"}),
    #"Merged Queries" = Table.NestedJoin(#"Added Index", {"Index"}, #"Expanded Custom", {"Index"}, "Expanded Custom", JoinKind.LeftOuter),
    #"Expanded Expanded Custom" = Table.ExpandTableColumn(#"Merged Queries", "Expanded Custom", {"NewID"}, {"NewID"}),
    #"Added Custom1" = Table.AddColumn(#"Expanded Expanded Custom", "ReplaceID", each if [ID] = null then [NewID] else [ID]),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"ID", "NewID"})
in
    #"Removed Columns"

The complex step changes just a bit:

(C) => Table.AddIndexColumn(
           Table.SelectRows(
               Table.Sort(#"Added Index", {"ID"}),
               each _[Group] = C[Group]
           ),
           "NewID", C[ID] + 1, 1
       )

The difference is that we need to add a sort so the nulls come after all the already assigned ID values and start indexing the nulls at C[ID] + 1 instead of just C[ID].

Here's a version that has fewer steps (no group by, expand, or merge) but a bit more complicated function:

let
    Source = <First Table Above>,    
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1),
    #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", (C) => Table.SelectRows(#"Added Index", each _[Group] = C[Group])),
    #"Added NewID" = Table.AddColumn(#"Added Custom", "NewID", (C) => if C[ID] = null then Table.SelectRows(Table.AddIndexColumn(Table.SelectRows(C[Custom], each _[ID] = null), "NewID", List.Max(C[Custom][ID])+1,1), each _[Index] = C[Index]){0}[NewID] else C[ID]),
    #"Removed Columns" = Table.RemoveColumns(#"Added NewID",{"Custom"})
in
    #"Removed Columns"

The first added Custom column is just the indexed source table filtered to the current Group. Then we add the NewID column defined as:

(Read from the inside out.)

(C) =>
  if C[ID] = null
  then Table.SelectRows(
           Table.AddIndexColumn(
               Table.SelectRows(C[Custom], each _[ID] = null),
               "NewID", List.Max(C[Custom][ID]) + 1, 1
           ),
           each _[Index] = C[Index]
       ){0}[NewID]
  else C[ID]

Similar to before, we take the group subtable Custom, just pick the null ID rows and index them starting from the max non-null ID plus one. This still leaves us with a table, so we just want the row in this subtable that corresponds to the Index from the whole table. We use {0}[NewID] to extract the value from the cell in the first (only) row in the table in the [NewID] column. For non-null ID values, the else clause just leaves them as they were.

Power Query: how to add one to a column when a specific values appear in an other column

2 Answers