
I want to capture a text like this:

{ul}
  {li}Item 1{/li} 
  {li}Item 2{/li} 
  {li}Item 3{/li} 
{/ul}  
{img}this_is_an_image{/img} 
{p}paragraph text {/p} {h2}Heading{/h2}

And turn it into an ArrayList of hashmaps like so:

[
  { "ul" : ["Item 1", "Item 2", "Item 3"] },
  {"img" : "this_is_an_image"}, 
  {"p" : "paragraph text"}, 
  {"h2" : "Heading"}
]

Currently I have a while loop that's able to fetch "base" level items from the string (i.e. not nested items).

ArrayList<Object> list = new ArrayList<>();
// Note: the back-reference must be written \\1 in a Java string; a bare \1 is an octal escape
Pattern pattern = Pattern.compile("\\{(\\w+)}(?:\\()?([^\\{\\)]+)(?:\\{\\/\\1})?");
Matcher matches = pattern.matcher(s);
while (matches.find()) {
    Map<String, String> match = new HashMap<>();
    match.put(matches.group(1), matches.group(2));
    list.add(match);
}
return list;

I would like to modify this so that group 1 captures the tag name and group 2 captures everything between the opening and closing tag. Then I want to check whether group 2 contains nested tags and, if it does, put their contents into an array.

So the code would become something like this:

ArrayList<Object> list = new ArrayList<>();
Pattern pattern = Pattern.compile("New pattern");
// Compile the nested pattern once, outside the loop
Pattern patt = Pattern.compile("only capture text within brackets pattern");
Matcher matches = pattern.matcher(s);
while (matches.find()) {
    Map<String, Object> match = new HashMap<>();
    Matcher nestedMatches = patt.matcher(matches.group(2));
    ArrayList<String> sublist = new ArrayList<>();
    while (nestedMatches.find()) {
        sublist.add(nestedMatches.group(2));
    }
    // Store the sublist only if nested tags were found (check sublist, not list)
    if (!sublist.isEmpty()) {
        match.put(matches.group(1), sublist);
    } else {
        match.put(matches.group(1), matches.group(2));
    }
    list.add(match);
}
return list;

I have created this regex: \{(\w+)\}(.*)(?:\{\1\})? (not Java-escaped here), but it does not stop at the closing tag {/group1}; instead it just keeps capturing everything.

I am new to these more complex regex patterns, so if anyone could help me out here it would be greatly appreciated – it feels like I am close to solving this one.

Here is a Regex 101 showing my issues

You may use: (?s)\{(\w+)}(.*?)\{/\1} – anubhava

1 Answer


You are not far off; you may use this regex:

(?s)\{(\w+)}(.*?)\{/\1}

Updated RegEx Demo

In Java use:

final String regex = "(?s)\\{(\\w+)\\}(.*?)\\{/\\1\\}";
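
To see why the lazy quantifier matters, here is a minimal sketch comparing this pattern with a greedy variant. The class name TagRegexDemo, the helper firstBody, and the sample input are made up for illustration and are not from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagRegexDemo {
    // Pattern from the answer: DOTALL, lazy body, back-reference to the tag name.
    static final Pattern LAZY = Pattern.compile("(?s)\\{(\\w+)\\}(.*?)\\{/\\1\\}");

    // Same pattern with a greedy body, for comparison only.
    static final Pattern GREEDY = Pattern.compile("(?s)\\{(\\w+)\\}(.*)\\{/\\1\\}");

    // Returns group 2 (the tag body) of the first match, or null if nothing matches.
    static String firstBody(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String s = "{p}one{/p} {p}two{/p}";
        System.out.println(firstBody(LAZY, s));   // prints: one
        System.out.println(firstBody(GREEDY, s)); // prints: one{/p} {p}two
    }
}
```

The greedy version runs on to the last {/p} in the input, which is the "continues capturing everything" behaviour described in the question; the lazy version stops at the first closing tag whose name matches group 1.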

RegEx Details:

  • (?s): Enable DOTALL mode, so that . also matches line breaks
  • \{(\w+)}: Match the opening tag as {tag} and capture the tag name in capture group #1
  • (.*?): Match 0 or more characters (non-greedy) and capture them in group #2
  • \{/\1}: Match closing tag as {/tag} by using back-reference of group #1
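
Putting it together, one possible way to build the list from the question is to run the same pattern a second time on group #2 to pick up the nested tags. This is only a sketch: the class name TagParser, the method parse, the trimming of whitespace, and the use of LinkedHashMap are my own choices, and it assumes only one level of nesting, with no tag nested inside a tag of the same name:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagParser {
    // Group 1: tag name, group 2: everything up to the matching closing tag.
    private static final Pattern TAG = Pattern.compile("(?s)\\{(\\w+)\\}(.*?)\\{/\\1\\}");

    static List<Map<String, Object>> parse(String s) {
        List<Map<String, Object>> list = new ArrayList<>();
        Matcher outer = TAG.matcher(s);
        while (outer.find()) {
            String body = outer.group(2);

            // Re-apply the same pattern to the body to collect nested tags.
            List<String> children = new ArrayList<>();
            Matcher inner = TAG.matcher(body);
            while (inner.find()) {
                children.add(inner.group(2).trim());
            }

            Map<String, Object> entry = new LinkedHashMap<>();
            if (children.isEmpty()) {
                entry.put(outer.group(1), body.trim()); // plain text content
            } else {
                entry.put(outer.group(1), children);    // list of nested contents
            }
            list.add(entry);
        }
        return list;
    }

    public static void main(String[] args) {
        String s = "{ul}\n  {li}Item 1{/li}\n  {li}Item 2{/li}\n  {li}Item 3{/li}\n{/ul}\n"
                 + "{img}this_is_an_image{/img}\n{p}paragraph text {/p} {h2}Heading{/h2}";
        // prints: [{ul=[Item 1, Item 2, Item 3]}, {img=this_is_an_image}, {p=paragraph text}, {h2=Heading}]
        System.out.println(parse(s));
    }
}
```

Note that the names of the nested tags (li here) are discarded, as in the desired output from the question. For deeper or same-name nesting a regex alone is a poor fit and a small recursive parser would be a safer choice.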