
We have a similar mapping to this one:

PUT my_index
{
    "mappings": {
        "_doc": {
            "properties": {
                "tags": {
                    "type":  "keyword"
                }
            }
        }
    }
}

And documents like these:

1) TERM1-TERM2-TERM4-TERM3
2) TERM1-TERM2-TERM5-TERM3

Using an expression like this

GET /my_index/_doc/_search
{
    "query": {
        "regexp": {
            "tags": "TERM1.*TERM3"
        }
    }
}

This matches both documents, because the regex is applied to the whole keyword. But what I really need is something like TERM2-*-TERM3, where * matches exactly one word, not several words. Is it possible to achieve this? Another expression I would like to write is TERM1-*-*-TERM3, which should also match both documents.

Thanks
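You can see why `TERM1.*TERM3` matches both documents by emulating the anchored matching in Python. This is only a sketch. `re.fullmatch` approximates Elasticsearch's implicitly anchored `regexp` matching, but the two engines are not identical in general. The constructs used here (literal text, `.` and `*`) should behave the same for these inputs. `lucene_like_match` is a hypothetical helper name.

```python
import re

# The two example documents from the question.
docs = ["TERM1-TERM2-TERM4-TERM3", "TERM1-TERM2-TERM5-TERM3"]

def lucene_like_match(pattern: str, term: str) -> bool:
    """Hypothetical helper: the pattern must cover the whole term,
    approximating Elasticsearch's anchored regexp matching."""
    return re.fullmatch(pattern, term) is not None

# ".*" can consume any number of hyphen-separated words, so both documents match.
print([lucene_like_match(r"TERM1.*TERM3", d) for d in docs])  # [True, True]

# A pattern that covers only part of the term does not match.
print(lucene_like_match(r"TERM1", docs[0]))  # False
```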

Try TERM1-(.*[^A-Za-z0-9_])?TERM2([^A-Za-z0-9_].*)?-TERM3 for the first one. The second one should be TERM1-[^-]*-[^-]*-TERM3 - Wiktor Stribiżew

1 Answer


To match a document that starts with TERM1, ends with TERM3 and contains WORD as a whole word anywhere in between, you may use

TERM1-(.*[^A-Za-z0-9_])?WORD([^A-Za-z0-9_].*)?-TERM3

See the regex demo.

Details

  • TERM1- - TERM1- at the start of the string
  • (.*[^A-Za-z0-9_])? - an optional sequence of zero or more characters other than a newline, as many as possible, followed by a non-word character (anything but an ASCII letter, digit or _)
  • WORD - a literal WORD
  • ([^A-Za-z0-9_].*)? - an optional sequence of a non-word character followed by zero or more characters other than a newline, as many as possible
  • -TERM3 - -TERM3 at the end of the string.
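A short Python sketch can check the pattern against a few sample terms. It makes these assumptions:

- `re.fullmatch` is a fair stand-in for Elasticsearch's whole-term matching for this pattern.
- TERM2 plays the role of WORD, as in the comment above.
- `whole_word_pattern` and `lucene_like_match` are hypothetical helpers.
- The extra sample terms are made up for illustration.

```python
import re

# Bracket expression for a "non-word" character, as used in the answer.
NON_WORD = "[^A-Za-z0-9_]"

def whole_word_pattern(word: str) -> str:
    """Hypothetical helper building TERM1-...WORD...-TERM3 with WORD as a whole word.
    Assumes `word` contains only letters, digits or underscores (no regex metacharacters)."""
    return f"TERM1-(.*{NON_WORD})?{word}({NON_WORD}.*)?-TERM3"

def lucene_like_match(pattern: str, term: str) -> bool:
    # Elasticsearch anchors regexp patterns at both ends; re.fullmatch mimics that.
    return re.fullmatch(pattern, term) is not None

pattern = whole_word_pattern("TERM2")
for term in ["TERM1-TERM2-TERM4-TERM3",   # TERM2 is a whole word: match
             "TERM1-TERM2-TERM3",         # both optional groups empty: match
             "TERM1-XTERM2-TERM4-TERM3",  # TERM2 glued to X on the left: no match
             "TERM1-TERM2X-TERM3"]:       # TERM2 glued to X on the right: no match
    print(term, lucene_like_match(pattern, term))
```

The two optional groups let WORD sit directly after TERM1- or directly before -TERM3. The non-word characters around WORD stop it from matching inside a longer word.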

To solve the second issue, you may just use negated bracket expressions:

TERM1-[^-]*-[^-]*-TERM3

where [^-]* matches any 0+ chars other than -. See another regex demo.
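Another Python sketch checks the second pattern, again assuming that `re.fullmatch` approximates Elasticsearch's anchored matching. It shows that the pattern accepts both documents. It also rejects a made-up term with three segments between TERM1 and TERM3.

```python
import re

SECOND = r"TERM1-[^-]*-[^-]*-TERM3"

def lucene_like_match(pattern: str, term: str) -> bool:
    # Approximates Elasticsearch's anchored matching: the whole term must match.
    return re.fullmatch(pattern, term) is not None

docs = ["TERM1-TERM2-TERM4-TERM3", "TERM1-TERM2-TERM5-TERM3"]
print([lucene_like_match(SECOND, d) for d in docs])    # [True, True]

# [^-]* cannot cross a hyphen, so three middle segments do not fit.
print(lucene_like_match(SECOND, "TERM1-A-B-C-TERM3"))  # False

# [^-]* also accepts an empty segment, so this term matches as well.
print(lucene_like_match(SECOND, "TERM1---TERM3"))      # True
```

If every segment must be non-empty, `[^-]+` can be used instead of `[^-]*`.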

NOTE: in the demos, I am using ^ and $ (with the m modifier) to make the patterns match whole lines. Do not use them in Elasticsearch: regexp patterns there are always anchored implicitly, so the pattern has to match the entire term.
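This Python sketch approximates the difference between the demo setup and Elasticsearch; it does not run Elasticsearch itself. In the demo, several test strings share one text, so `^`, `$` and the `re.M` flag are needed to anchor the pattern to each line. Elasticsearch checks each term separately, which `re.fullmatch` mimics without any anchors.

```python
import re

pattern = r"TERM1-[^-]*-[^-]*-TERM3"
demo_text = "TERM1-TERM2-TERM4-TERM3\nTERM1-TERM2-TERM5-TERM3"

# Demo style: one multi-line text, so ^/$ plus re.M anchor the pattern to each line.
demo_matches = re.findall(rf"^{pattern}$", demo_text, flags=re.M)
print(demo_matches)  # ['TERM1-TERM2-TERM4-TERM3', 'TERM1-TERM2-TERM5-TERM3']

# Elasticsearch style: every term is tested on its own and anchored implicitly,
# approximated here with re.fullmatch and no ^/$ in the pattern.
term_matches = [bool(re.fullmatch(pattern, term)) for term in demo_text.splitlines()]
print(term_matches)  # [True, True]
```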