11
votes

I'm trying to write a regex that can extract a command. Here's what I've got so far, using a negative lookbehind assertion:

\b(?<![@#\/])\w.*

So with the input:

/msg @nickname #channel foo bar baz
/foo #channel @nickname foo bar baz 
foo bar baz

foo bar baz is extracted every time. See working example https://regex101.com/r/lF9aG7/3

In Go, however, this doesn't compile: http://play.golang.org/p/gkkVZgScS_

It throws:

panic: regexp: Compile(`\b(?<![@#\/])\w.*`): error parsing regexp: invalid or unsupported Perl syntax: `(?<`

I did a bit of research and realized that Go's regexp package does not support negative lookbehinds, so that it can guarantee O(n) matching time.

How can I rewrite this regex so that it does the same without negative lookbehind?

2
How about (?:^|[^@#/])\b(\w.*) ? - Mariano
Can you give an example which you want as an output? - Mayur Koshti
What should it do in the case "foo \msg some stuff @name other stuff #chan blah blah"? - Amit Kumar Gupta
@MayurKoshti the playground link shows the expected output, given various inputs. - Amit Kumar Gupta
I've updated my reply below to use string filtering instead of a regex. Of course, this will only work if you're looking to filter out all words beginning with a character from [#@/]. - hjpotter92

2 Answers

3
votes

Since your negative lookbehind only contains a simple character set, you can replace it with a negated character set:

\b[^@#/]\w.*

If the words are allowed at the start of the string, then use the ^ anchor:

(?:^|[^@#\/])\b\w.*
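As a rough sketch of how the anchored pattern could be used from Go: the non-capturing group consumes the character before the word, so the full match may begin with that character (a space, for instance). This version therefore wraps the wanted part in a capture group, as in Mariano's comment on the question. The names trailingRe and trailingText are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// trailingRe: the non-capturing group matches the start of the string or any
// character other than @, # or /; group 1 holds the text we actually want.
var trailingRe = regexp.MustCompile(`(?:^|[^@#/])\b(\w.*)`)

// trailingText (hypothetical helper) returns group 1 of the leftmost match,
// or "" if the pattern does not match at all.
func trailingText(s string) string {
	m := trailingRe.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	inputs := []string{
		"/msg @nickname #channel foo bar baz",
		"/foo #channel @nickname foo bar baz",
		"foo bar baz",
	}
	for _, s := range inputs {
		fmt.Printf("%q\n", trailingText(s))
	}
}
```

Note that .* is greedy, so any trailing whitespace in the input ends up in the captured text; strings.TrimSpace could be applied if that matters.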

Based on the samples in Go playground link in your question, I think you're looking to filter out all words beginning with a character from [#@/]. You can use a filter function:

func Filter(vs []string, f func(string) bool) []string {
    vsf := make([]string, 0)
    for _, v := range vs {
        if f(v) {
            vsf = append(vsf, v)
        }
    }
    return vsf
}

and a Process function, which makes use of the filter above:

func Process(inp string) string {
    t := strings.Split(inp, " ")
    // Keep only the words that do not start with #, @ or /.
    t = Filter(t, func(x string) bool {
        return !strings.HasPrefix(x, "#") &&
            !strings.HasPrefix(x, "@") &&
            !strings.HasPrefix(x, "/")
    })
    return strings.Join(t, " ")
}

It can be seen in action on playground at http://play.golang.org/p/ntJRNxJTxo
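For comparison, here is a more compact sketch of the same filtering idea in a single function. It is a variation rather than the code from the playground link: it assumes whitespace-separated tokens and uses strings.Fields, which also drops empty tokens caused by leading, trailing or repeated spaces. The name stripPrefixed is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// stripPrefixed (hypothetical helper) drops every whitespace-separated token
// whose first byte is #, @ or / and joins the remaining tokens with spaces.
func stripPrefixed(s string) string {
	var kept []string
	for _, w := range strings.Fields(s) {
		// strings.Fields never returns empty tokens, so w[0] is safe.
		if strings.IndexByte("#@/", w[0]) < 0 {
			kept = append(kept, w)
		}
	}
	return strings.Join(kept, " ")
}

func main() {
	fmt.Printf("%q\n", stripPrefixed("/msg @nickname #channel foo bar baz"))
	fmt.Printf("%q\n", stripPrefixed("/foo #channel @nickname foo bar baz "))
	fmt.Printf("%q\n", stripPrefixed("foo bar baz"))
}
```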

2
votes

You can actually match the preceding character (or the beginning of line) and use a group to get the desired text in a subexpression.

Regex

(?:^|[^@#/])\b(\w+)
  • (?:^|[^@#/]) Matches either ^ the beginning of line or [^@#/] any character except @#/
  • \b A word boundary to assert the beginning of a word
  • (\w+) Generates a subexpression
    • and matches \w+ one or more word characters

Code

cmds := []string{
    `/msg @nickname #channel foo bar baz`,
    `#channel @nickname foo bar baz /foo`,
    `foo bar baz @nickname #channel`,
    `foo bar baz#channel`}

regex := regexp.MustCompile(`(?:^|[^@#/])\b(\w+)`)


// Loop over all cmds
for _, cmd := range cmds {
    // Find all matches and subexpressions
    matches := regex.FindAllStringSubmatch(cmd, -1)

    fmt.Printf("`%v` \t==>\n", cmd)

    // Loop all matches
    for n, match := range matches {
        // match[1] holds the text matched by the first subexpression (1st set of parentheses)
        fmt.Printf("\t%v. `%v`\n", n, match[1])
    }
}

Output

`/msg @nickname #channel foo bar baz`   ==>
    0. `foo`
    1. `bar`
    2. `baz`
`#channel @nickname foo bar baz /foo`   ==>
    0. `foo`
    1. `bar`
    2. `baz`
`foo bar baz @nickname #channel`    ==>
    0. `foo`
    1. `bar`
    2. `baz`
`foo bar baz#channel`   ==>
    0. `foo`
    1. `bar`
    2. `baz`

Playground
http://play.golang.org/p/AaX9Cg-7Vx
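
If a single string is wanted rather than a list of words, the captured words can be joined back together. The following is a minimal sketch under that assumption; the names wordRe and plainWords are made up for illustration and are not part of the playground code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wordRe is the same pattern as above: group 1 captures a word that is not
// directly preceded by @, # or /.
var wordRe = regexp.MustCompile(`(?:^|[^@#/])\b(\w+)`)

// plainWords (hypothetical helper) collects group 1 of every match and joins
// the words with single spaces.
func plainWords(cmd string) string {
	var words []string
	for _, m := range wordRe.FindAllStringSubmatch(cmd, -1) {
		words = append(words, m[1])
	}
	return strings.Join(words, " ")
}

func main() {
	fmt.Printf("%q\n", plainWords("/msg @nickname #channel foo bar baz"))
	fmt.Printf("%q\n", plainWords("foo bar baz#channel"))
}
```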