1
votes

I would like to ask for your help with the power query-related problem I cannot solve on my own:

One of the columns [address] contains data like this :

    Science Park 404, 1098 XH Amsterdam, Netherlands
    Laan van Malkenschoten 20, 7333 NP Apeldoorn, NL
    plac Europejski 1, 00-844 Warszawa, Poland
    Adrianastraat 6, 3014 XL Rotterdam, Netherlands
    C. Co Brava, 109, 17411 Vidreres, Girona, Spain

I also have a list of keywords, that contain: Amsterdam, Apeldoorn, Rotterdam

What I need is that power query checks if [address] column contains any of the keywords, and if yes, in a new custom column display ‘MATCH’:

    Science Park 404, 1098 XH Amsterdam, Netherlands MATCH
    Laan van Malkenschoten 20, 7333 NP Apeldoorn, NL MATCH
    plac Europejski 1, 00-844 Warszawa, Poland
    Adrianastraat 6, 3014 XL Rotterdam, Netherlands  MATCH
    C. Co Brava, 109, 17411 Vidreres, Girona, Spain

The list of keywords is stored in the same Excel file, in a one-column table called ‘CITIES_NL’

In theory, I can create a conditional column and hard-code the keywords, however I cannot do it for two reasons:

  • the list needs to be dynamic, it is likely that new keywords will be added on a daily basis
  • in production, the list of keywords would consist of 75k items, it is impossible to enter this to PQ

To achieve the same in Excel, without PQ, I would use this formula:

    =IF(--SUMPRODUCT(--ISNUMBER(SEARCH(CITIES_NL,address)))>0,"MATCH","")

however, this has to be done by PQ, as this is only one of the steps for preparing my data

I would appreciate your help!

Michal

2

2 Answers

1
votes

You can do this dynamically by adding a custom column that for each address checks each keyword to see if it is a substring and counts the number of keywords that are a substring of the address. If the count > 0, then return "MATCH".

Here's one way to do this:

= Table.AddColumn(#"Previous Step Reference", "Custom Column Name",
      (C) => if List.Count(
                    List.Select(CITIES_NL[keyword], each Text.Contains(C[address], _))
                ) > 0
             then "MATCH"
             else null,
      type text)

The syntax is a bit different but the logic is very similar to your Excel formula.

Note that I defined a context C so that I could reference the address from that row context within the context of the CITIES_NL[keyword] list.

1
votes

Firstly, add both your ADDRESS and CITIES_NL table to power query editor, convert CITIES_NL into a List, then add a custom column to the ADDRESS table using the following formula:

=if List.Count(Splitter.SplitTextByAnyDelimiter(CITIES_NL)([Column1]))>1 then "MATCH" else null

The logic is to split ADDRESS by any words contained in the CITIES_NL list, and count how many sub-strings are there. If the result is greater than 1 which means at least one key word was found, then use IF to return MATCH otherwise (null).

Replace [Column1] with the actual column name in your case.

Here is the full Power Query M code for your reference:

let
    Source = Excel.CurrentWorkbook(){[Name="ADDRESS"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each if List.Count(Splitter.SplitTextByAnyDelimiter(CITIES_NL)([Column1]))>1 then "MATCH" else "")
in
    #"Added Custom"

Please note the match will be case sensitive. If you want a case-insensitive match, you can add an index column to the ADDRESS table, make a duplicate of the ADDRESS query, convert both ADDRESS and CITIES_NL to lowercase then do the match, and then merge the matched query with the original ADDRESS query by the index number.

Let me know if you have any questions. Cheers :)