I'm trying to run what I consider to be a simple regexp query in Elasticsearch, but for the life of me I can't get it to work. Here is the query:

{
  "query": {
    "regexp":{
      "request_url": "/somepage/[0-9]+/[a-z]+"
    }
  }
}

Where this query seems to break is when I include a forward slash / character. Without the forward slashes, I get results for each of the individual parts.

I've tried a bare slash /, and I've tried escaping it as \/ and \\/. I also reduced the query to /somepage and \\/somepage, and still got nothing.

I've been Googling heavily on this, and I can't seem to find a consistent answer. I think maybe Elasticsearch just doesn't support / in regexp queries...? I know from looking at the docs that it isn't a reserved character.

Could it be something to do with the datatype of this field?

"request_url": {
  "type": "text",
  "norms": false,
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
},

Looking at the docs, the text type is explicitly analysed and tokenised, so I suspect it may be dropping the / characters. I also tried the query against the keyword field, and that didn't work either.
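A rough way to reason about the two fields is sketched below. It rests on two assumptions that should be checked against the real index (for example with the _analyze API): that the text field uses the default standard analyzer, approximated here by lowercasing and splitting on non-alphanumeric characters, and that Python's `re.fullmatch` is a fair stand-in for the way a regexp query has to match a whole term rather than a substring. The sample URLs and helper names are hypothetical.

```python
import re


def approx_standard_tokens(value: str) -> list[str]:
    # Crude approximation of the standard analyzer (an assumption):
    # lowercase the value and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", value.lower())


def regexp_matches_term(pattern: str, term: str) -> bool:
    # A regexp query pattern is anchored: it must match the entire term.
    return re.fullmatch(pattern, term) is not None


pattern = "/somepage/[0-9]+/[a-z]+"
short_url = "/somepage/123/abc"                      # hypothetical value
long_url = "http://example.com/somepage/123/abc?x=1"  # hypothetical value

# text field: the slash never reaches the index, so no single token matches.
tokens = approx_standard_tokens(short_url)
text_hits = [t for t in tokens if regexp_matches_term(pattern, t)]

# keyword field: the whole stored value is one term.
keyword_exact = regexp_matches_term(pattern, short_url)
keyword_partial = regexp_matches_term(pattern, long_url)
keyword_wrapped = regexp_matches_term(".*" + pattern + ".*", long_url)
```

If the stored values carry a host or query string, the pattern would only match request_url.keyword once it is wrapped in `.*` on both sides. Values longer than 256 characters are not indexed in that subfield at all because of `ignore_above`.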

So, any suggestions? Can Elasticsearch actually do what I need? Thanks.

1 Answer

My suggestions:

  1. Try \\\\/. If the query is first handled as a string literal, that becomes \\/ in the JSON, which in turn becomes \/ by the time it reaches the regex.
  2. Try \x2f and its escaped variants: \\x2f or \\\\x2f.
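The layering described in the first suggestion can be traced with a small sketch. It assumes the query body is built inside a programming-language string literal (Python here) before being sent as JSON, and it uses Python's `re` as a stand-in for the regex engine behind the regexp query, which is an approximation. The field name `request_url.keyword` comes from the mapping in the question.

```python
import json
import re

# Layer 1: a non-raw Python string literal. Each "\\" in the source
# collapses to a single backslash, so four backslashes become two.
json_text = '{"regexp": {"request_url.keyword": "\\\\/somepage"}}'

# Layer 2: JSON decoding. The two remaining backslashes are the JSON
# escape for one literal backslash, leaving \/somepage as the pattern.
pattern = json.loads(json_text)["regexp"]["request_url.keyword"]

# Layer 3: the regex engine reads \/ as an escaped, literal slash.
escaped_matches = re.fullmatch(pattern, "/somepage") is not None

# For comparison, an unescaped slash in the JSON decodes to the same
# literal character and matches the same value.
plain_pattern = json.loads('"/somepage"')
plain_matches = re.fullmatch(plain_pattern, "/somepage") is not None
```

In this sketch both the escaped and the unescaped slash end up matching the same value, so it is worth checking the field's analysis and the whole-term matching as well as the escaping.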