DataWeave 2.0 Match all occurrences of regex

Question

I would like to capture all occurrences within a string that match a specific regular expression. I'm using DataWeave 2.0 (which means Mule Runtime 4.3 and, in my case, Anypoint Studio 7.5).

I've tried to use scan() and match() from the DataWeave core library, but I can't quite get the result I want.

Here are some of the things I've tried:

%dw 2.0
output application/json

// sample input with hashtag keywords
var microList = 'Someone is giving away millions. See @realmcsrooge at #downtownmalls now!
#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls'
---
{
    withscan: microList scan /(#[^\s]*).*/,
    sanitized: microList replace /\n/ 
        with ' ',
    sani_match: microList replace /\n/ 
        with ' ' match /.*(#[^\s]*).*/, // gives full string and last match
    sani_scan: microList replace /\n/ 
        with ' ' scan /.*(#[^\s]*).*/   // gives array of arrays, string and last match
}

Here are the respective results:

{
  "withscan": [
    [
      "#downtownmalls now!",
      "#downtownmalls"
    ],
    [
      "#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
      "#shoplocal"
    ]
  ],
  "sanitized": "Someone is giving away millions. See @realmcsrooge at #downtownmalls now! #shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
  "sani_match": [
    "Someone is giving away millions. See @realmcsrooge at #downtownmalls now! #shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
    "#downtowndancehalls"
  ],
  "sani_scan": [
    [
      "Someone is giving away millions. See @realmcsrooge at #downtownmalls now! #shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
      "#downtowndancehalls"
    ]
  ]
}

In the first example (withscan), the parser appears to process the input line by line, so the result array has one element per line. Each element consists of the full matched portion and the captured (tagged) portion, using the first occurrence of the pattern on that line.

After stripping newlines, the third example (sani_match) gave me an array with the fully matched portion and the tagged portion, this time the last occurrence of the pattern on the line.

The final pattern (sani_scan) gives similar results; the only difference is that the result is embedded as an element in an array of arrays.

What I want is simply an array with all occurrences of a specified pattern.

Jorge Garcia · Accepted Answer · 2020-05-19T05:01:39

If you want to capture all occurrences within a string that match a specific regular expression, I found that the magic words are "Overlapping Matches".

If what you really want is just the hashtags from the string, use Valdi_Bo's solution.

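DataWeave uses Java regular expressions, so to enable the single-line (DOTALL) flag, which lets . match line breaks as well, you need to add (?s) at the beginning of the pattern.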
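As a minimal sketch of what the flag changes, the script below runs the same pattern with and without (?s). The variable twoLines and the field names are hypothetical, chosen only for illustration.

```dataweave
%dw 2.0
output application/json

// Hypothetical two-line input; "\n" is a line break inside the string
var twoLines = "a #one\nb #two"
---
{
    // Without (?s), "." stops at the line break, so each line yields its own match
    perLine: twoLines scan /#.*/,
    // With (?s), "." also matches the line break, so the first match runs to the end of the string
    dotAll: twoLines scan /(?s)#.*/
}
```

Under these assumptions, perLine should contain two matches ("#one" and "#two"), while dotAll should contain a single match that spans both lines. This is the same effect the question observed in the withscan result, where each line produced its own element.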

script:

%dw 2.0
output application/json

var str = 'Someone is giving away millions. See @realmcsrooge at #downtownmalls now!
#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls'
---
{
    // (?s) is the single-line modifier
    // (?=(X)). captures X inside a lookahead, then consumes one character, which enables overlapping matches
    matchUntilEnd: str scan(/(?s)(?=(#([^\s]*).*))./) map $[1],
    justTags: str scan(/(?s)#([^\s]*)/) map $[1],
    Valdi_BoSolutionWithGroups: str scan(/#([\S]+)/) map $[1]
}

output:

{
  "matchUntilEnd": [
    "#downtownmalls now!\n#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
    "#shoplocal and tell them #giveaway @barry sent you. #downtowndancehalls",
    "#giveaway @barry sent you. #downtowndancehalls",
    "#downtowndancehalls"
  ],
  "justTags": [
    "downtownmalls",
    "shoplocal",
    "giveaway",
    "downtowndancehalls"
  ],
  "Valdi_BoSolutionWithGroups": [
    "downtownmalls",
    "shoplocal",
    "giveaway",
    "downtowndancehalls"
  ]
}
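
If the goal is a flat array of the occurrences themselves, with the leading # kept, a sketch along these lines may be enough. The shortened input and the field names are hypothetical; scan, map, and flatten are DataWeave core functions.

```dataweave
%dw 2.0
output application/json

// Hypothetical sample input, shortened from the text in the question
var str = 'See @realmcsrooge at #downtownmalls now!
#shoplocal and tell them #giveaway'
---
{
    // scan returns one array per match; index 0 holds the full match
    tagsWithHash: (str scan /#\S+/) map $[0],
    // with no capture groups each inner array has one element, so flatten gives the same list
    tagsFlattened: flatten(str scan /#\S+/)
}
```

Both fields should evaluate to ["#downtownmalls", "#shoplocal", "#giveaway"]. No (?s) flag is needed here because \S+ never has to cross a line break.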
