0
votes

As Power Query does not provide support for RegEx how can I extract a number of 8 digits from a string.

The number can appear anywhere in the string.

I have tried to use M but didn't find any complete solution. I also tried the option "Column From Examples" but the complexity of the cases does not let me have a reliable result either.

I googled it but it seems there are no examples suitable for PowerQuery/M

abc123456 -->null

abc12345678 -->12345678

abc 12345678 -->12345678

abc 12345678aaaa -->12345678

abc 1111 ddd12345678aaaa 3467-->12345678

abc 1111 ddd123456aaaa 3467-->null

etc...

EDIT

I have added the following code as suggested by Alexis Olson

= Table.AddColumn(
    Sheet1_Sheet,
    "PatternMatched",
    (r) =>
        List.First(
            List.Select(
                List.Transform(
                    {0..Text.Length(r[String]) - 8},
                    each Text.Range(r[String], _, 8)
                ),
                each _ = Text.Select(_, {"0".."9"})
            )
        )
)

the code differs only for Sheet1_Sheet instead of #"Previous Step" because this is the field holding the data but I get the following error message "Expression.Error: The field 'String' of the record wasn't found. Details: Column1=abc123456" What am I doing wrong ?

EDIT2

I have found the error in my formula. Here the correct one

Table.AddColumn(
    #"Renamed Columns",
    "PatternMatched",
    (r) =>
        List.First(
            List.Select(
                List.Transform(
                    {0..Text.Length(r[Column]) - 8},
                    each Text.Range(r[Column], _, 8)
                ),
                each _ = Text.Select(_, {"0".."9"})
            )
        )
)

I had simply to replace from Alexis Olson's answer:

  • #"Previous Step" with #"Renamed Columns" because that is the previous step name in the "Applied Steps" section of the "Query Settings" in PQ
  • r[String] with r[Column] because "Column" is the name of the column containing the data I want to find the pattern in my PQ
1

1 Answers

3
votes

It's not simple, but you could add a custom column that does this:

Table.AddColumn(
    #"Previous Step",
    "PatternMatched",
    (r) =>
        List.First(
            List.Select(
                List.Transform(
                    {0..Text.Length(r[String]) - 8},
                    each Text.Range(r[String], _, 8)
                ),
                each _ = Text.Select(_, {"0".."9"})
            )
        )
)

Let's take a look at what's happening here with the example r[String] = abc12345678.

  1. Since Text.Length(r[String]) = 11, we can look at substring ranges of length 8 starting at index 0 through index 11 - 8 = 3.
  2. So List.Transform({0,1,2,3}, each Text.Range("abc12345678", _, 8)) transforms the list {0,1,2,3} into a list of 8-long substring ranges: {"abc12345", "bc123456", "c1234567", "12345678"}.
  3. Now for each of these substrings, check if it consists of only digits by comparing each string to a version of itself containing only digits using Text.Select(_, {"0".."9"}) to strip all but the digit characters. Then List.Select each substring where this condition is true.
  4. The result is a list of substrings that are a length of 8 and contain only digits (empty if none exist). Use List.First to return the first string from this list.

Result Screenshot