
I've been trying to figure out how to write a recursive regular expression in Perl 6. As a toy example, take a balanced-parentheses matcher, which should match ((())()) inside (((((())()).

  • PCRE example: /\((?R)*\)/

  • Onigmo example: (?<paren>\(\g<paren>*\))

I thought this would do it:

my regex paren {
  '(' ~ ')' <paren>*
}

or the simpler version:

my regex paren {
  '(' <paren>* ')'
}

but that fails with

No such method 'paren' for invocant of type 'Match'
in regex paren at ...
@HåkonHægland: Thanks, that link in particular was a nice find. However, I was deliberately avoiding grammars, since I want to find all matching spans rather than parse a string from the start, and I don't think grammars support that. That said, I am a noob at P6, so I'm sure I'm missing something. – Amadan
@HåkonHægland I guess I could write a grammar with a nonparen rule for the stuff I don't want, plus an action class that collects the paren matches... but that gets complicated fast. It's just very hard to believe P6 regular expressions dropped support for something Perl basically pioneered. – Amadan

2 Answers

15
votes

You need to make it explicit that you're calling a my-scoped (lexical) regex:

my regex paren {
    '(' ~ ')' <&paren>*
}

Notice the & that has been added. With that:

say "(()())" ~~ /^<&paren>$/    # 「(()())」
say "(()()" ~~ /^<&paren>$/     # Nil

It's true that you can sometimes get away without writing the & explicitly, and indeed you can when calling the regex from outside its own body:

say "(()())" ~~ /^<paren>$/    # 「(()())」
say "(()()" ~~ /^<paren>$/     # Nil

This only works because the compiler sees that a regex named paren is declared in the lexical scope, and so compiles the <paren> syntax into a call to it. In the recursive case, the declaration isn't installed until after the regex body has been parsed, so <paren> falls back to a method call on the Match object (hence the "No such method 'paren'" error), and one needs to be explicit.
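The comments above ask about finding every balanced span rather than matching a whole, anchored string. A minimal sketch of that, assuming Rakudo's Str.match with the :g (global) adverb, reuses the lexical regex in the plain '(' <&paren>* ')' form to keep things simple; the input string is a made-up example:

```raku
# Lexical recursive regex; the & makes the self-call resolve to this regex.
my regex paren { '(' <&paren>* ')' }

# Hypothetical input containing several balanced groups (illustration only).
my $input = '(()) x () ((()';

# :g asks for every non-overlapping match, scanning left to right.
for $input.match(/<&paren>/, :g) -> $m {
    say "{$m.from}..{$m.to}: $m";
}
# prints:
# 0..4: (())
# 7..9: ()
# 12..14: ()
```

Because :g resumes scanning where the previous match ended, nested groups are reported once, as part of their outermost balanced span, and the unbalanced "((" prefix at the end is simply skipped.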

2
votes

You can use <~~> in the regex meta-syntax to make a recursive call back into the current pattern (or, in principle, just part of it). For example, you can match balanced parentheses with this simple regex:

say "(()())" ~~ /'(' <~~>* ')'/;    # 「(()())」
say "(()()"  ~~ /'(' <~~>* ')'/;    # 「()」

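Since <~~> re-enters the whole pattern, it can also collect every balanced span without declaring a named regex. A minimal sketch under the same assumptions (Rakudo's Str.match with :g; the input string is hypothetical):

```raku
# Hypothetical input with several balanced groups (illustration only).
my $input = '(()) x () ((()';

# <~~> recurses into this whole anonymous pattern, so no named regex is needed.
# :g collects every non-overlapping match, scanning left to right.
my @spans = $input.match(/'(' <~~>* ')'/, :g).map(~*);

say @spans.join(' | ');   # (()) | () | ()

# The toy input from the question: the first balanced span found.
say ~'(((((())())'.match(/'(' <~~>* ')'/);   # ((())())
```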

Unfortunately, recursing into just a captured part of the pattern (like <~~0>) is not yet implemented.