I have been trying to figure out the best way to use actual regex patterns in an Elasticsearch 5.4 query. After reading about the standard analyzer and how it tokenizes each string field, I started using the not-analyzed field defined in my mappings (the standard .raw property). I have tried two variants of the same query, and neither has been successful.
Query String filter:
GET /test-*/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "URL.raw:/^(http|https)\\:\/\/.+(wp-content|wp-admin)/"
          }
        }
      ]
    }
  },
  "sort": {
    "@timestamp": {
      "order": "desc"
    }
  }
}
Regexp filter:
GET /test-*/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "regexp": {
            "URL.raw": {
              "value": "/^(http|https)\\:\/\/.+(wp-content|wp-admin)/"
            }
          }
        }
      ]
    }
  },
  "sort": {
    "@timestamp": {
      "order": "desc"
    }
  }
}
Both yield either no results or parse exceptions:
{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "parse_exception: Encountered \" \"^\" \"^ \"\" at line 1, column 8.\nWas expecting one of:\n <BAREOPER> ...\n \"(\" ...\n \"*\" ...\n <QUOTED> ...\n <TERM> ...\n <PREFIXTERM> ...\n <WILDTERM> ...\n <REGEXPTERM> ...\n \"[\" ...\n \"{\" ...\n <NUMBER> ...\n "
      },
Does Lucene require special escaping, or are some characters blacklisted? Any help or pointers would be much appreciated. Thanks!
^ and $ are not special there. You do not need / regex delimiters and you do not need to escape /. Try the regexp filter with "https?://.*wp-(content|admin).*"
- Wiktor Stribiżew
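
To make the suggestion in the comment above concrete, here is a minimal Python sketch that builds the regexp query body with the simplified pattern and prints it as JSON for use in the console. Lucene regular expressions always match against the whole term, which is why the ^ and $ anchors are unnecessary. The sketch approximates that behaviour with re.fullmatch, assuming this particular pattern means the same thing in both engines. The helper names and the sample URLs are hypothetical and only serve as illustration.

```python
import json
import re

# Pattern from the comment: no / delimiters, no ^ or $, no escaped slashes.
PATTERN = "https?://.*wp-(content|admin).*"


def build_regexp_query(field, pattern):
    """Build a search body that runs a regexp query on a not-analyzed field."""
    return {
        "query": {"bool": {"must": [{"regexp": {field: {"value": pattern}}}]}},
        "sort": {"@timestamp": {"order": "desc"}},
    }


def matches_whole_term(pattern, term):
    """Approximate Lucene's implicit anchoring: the pattern must cover the whole term.

    Assumption: the pattern only uses syntax shared by Lucene and Python's re module.
    """
    return re.fullmatch(pattern, term) is not None


body = build_regexp_query("URL.raw", PATTERN)

# Request body to send to GET /test-*/_search
print(json.dumps(body, indent=2))

# Hypothetical sample values for the URL.raw field
for url in ("http://example.com/wp-content/x.php", "ftp://example.com/wp-admin/"):
    print(url, matches_whole_term(PATTERN, url))
```

Because the whole term has to match, the leading and trailing .* are what allow extra text around wp-content or wp-admin. Dropping the trailing .* would restrict matches to URLs that end exactly at that segment.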