Given a phrase match query like this:

    {
        'match_phrase': {
            'text.english': {
                'query': "The fox jumped over the wall",
                'slop': 4,
            }
        }
    }

Is there a way I can group results by the exact match?
So if I have 1 document whose `text.english` contains "The quick fox jumps over the small wall" and 3 documents containing "The lazy fox jumped over the big wall", I want to end up with those two groups of results.
I'm OK with running multiple queries and doing some processing outside of ES, but I need a solution that performs reasonably over a large set of documents. Ideally I'm hoping there's a way to do this using aggregations that I've missed.
The best solution I've come up with is to run the query above with highlights, parse out all of the highlights from all of the results, and group the results by highlight content. This is fine for very small result sets; however, over a result set of 1000+ documents it is prohibitively slow.
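A minimal sketch of the highlight-based grouping described above, written in the same Python dict style as the query. The helper names (`build_highlight_body`, `matched_span`, `group_by_highlight`) and the sample response are hypothetical; the sketch only relies on Elasticsearch's default `<em>`/`</em>` highlight tags and the `hits.hits[].highlight` layout of a search response. It treats everything from the first highlighted term to the last one in a fragment as the "exact match", so two separate matches inside one fragment would be merged into a single key.

```python
import re
from collections import defaultdict

# Elasticsearch's default highlight tags are <em> and </em>. The greedy pattern
# spans from the first <em> to the last </em> in a fragment.
EM_SPAN = re.compile(r"<em>.*</em>", re.DOTALL)
EM_TAG = re.compile(r"</?em>")


def build_highlight_body(phrase, slop=4, field="text.english"):
    """The question's phrase query (slop goes under "slop") plus highlighting."""
    return {
        "query": {"match_phrase": {field: {"query": phrase, "slop": slop}}},
        "highlight": {"fields": {field: {}}},
    }


def matched_span(fragment):
    """Text between the first and last highlighted term, tags stripped."""
    m = EM_SPAN.search(fragment)
    return EM_TAG.sub("", m.group(0)) if m else None


def group_by_highlight(response, field="text.english"):
    """Map each matched span to the ids of the hits whose highlights contain it."""
    groups = defaultdict(list)
    for hit in response["hits"]["hits"]:
        for fragment in hit.get("highlight", {}).get(field, []):
            span = matched_span(fragment)
            if span is not None and hit["_id"] not in groups[span]:
                groups[span].append(hit["_id"])
    return dict(groups)


# Hand-written sample response (not captured from a real cluster).
sample = {"hits": {"hits": [
    {"_id": "1", "highlight": {"text.english": [
        "The quick <em>fox</em> <em>jumps</em> <em>over</em> the small <em>wall</em>. Blah"]}},
    {"_id": "2", "highlight": {"text.english": [
        "The lazy <em>fox</em> <em>jumped</em> <em>over</em> the big <em>wall</em>. Blah"]}},
    {"_id": "3", "highlight": {"text.english": [
        "The lazy <em>fox</em> <em>jumped</em> <em>over</em> the big <em>wall</em>. Blah"]}},
]}}

print(group_by_highlight(sample))
```

In this sample the keys start at "fox" because that is the first highlighted term. Keys are compared as exact strings, so any difference in case or in which words get highlighted produces a separate group.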
EDIT: Maybe I can make this a bit clearer. If I have sample documents with the following values:
- "The quick fox jumps over the small wall. Blah blah blah many pages of unrelated text."
- "The lazy fox jumped over the big wall. Blah blah blah many pages of unrelated text."
- "The lazy fox jumped over the big wall. Blah blah blah many pages of unrelated text."
- "The lazy fox jumped over the big wall. Blah blah blah many pages of unrelated text."
I want to be able to group my results as follows with query text "The fox jumped over the wall":
- "The quick fox jumps over the small wall" - Document 1
- "The lazy fox jumped over the big wall" - Documents 2, 3, 4
`text.english.raw` should do it (where `.raw` is a `not_analyzed` subfield). – Andrei Stefan

"The lazy fox jumped over the big wall": this is the text that was indexed initially. Do you want to group based on this text or on something else? What if your text has 5 lines, do you want to group on this entire text? – Andrei Stefan
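The `text.english.raw` suggestion from the first comment might look roughly like this as a request body: a `terms` aggregation on the `not_analyzed` subfield, with a `top_hits` sub-aggregation listing some document ids per bucket. This is a sketch that assumes `text.english.raw` already exists in the mapping (on Elasticsearch 5 and later, the equivalent of a `not_analyzed` string is a `keyword` field); the helper names, the `by_text`/`docs` aggregation names, and the sample response are hypothetical. Each bucket key is the complete indexed value, which is what the second comment asks about: documents are grouped only when their whole text is identical, not when they merely share the matched phrase, and a many-page value may also exceed Lucene's 32766-byte limit for a single term.

```python
# Hypothetical helpers illustrating the comment's suggestion. Assumes the
# mapping already has a not_analyzed (ES 5+: keyword) subfield "text.english.raw".


def build_terms_agg_body(phrase, slop=4, field="text.english",
                         raw_field="text.english.raw",
                         max_groups=100, ids_per_group=10):
    """Phrase query plus one bucket per distinct raw (whole) field value."""
    return {
        "size": 0,  # skip the regular hits; only the buckets are needed
        "query": {"match_phrase": {field: {"query": phrase, "slop": slop}}},
        "aggs": {
            "by_text": {
                "terms": {"field": raw_field, "size": max_groups},
                "aggs": {
                    # up to ids_per_group matching documents per bucket
                    "docs": {"top_hits": {"size": ids_per_group, "_source": False}}
                },
            }
        },
    }


def groups_from_aggs(response):
    """Map each bucket key (the entire raw value) to the ids listed under it."""
    return {
        bucket["key"]: [hit["_id"] for hit in bucket["docs"]["hits"]["hits"]]
        for bucket in response["aggregations"]["by_text"]["buckets"]
    }


# Hand-written sample response; keys are abbreviated here, but in practice
# each key is the full field value.
sample = {"aggregations": {"by_text": {"buckets": [
    {"key": "The lazy fox jumped over the big wall", "doc_count": 3,
     "docs": {"hits": {"hits": [{"_id": "2"}, {"_id": "3"}, {"_id": "4"}]}}},
    {"key": "The quick fox jumps over the small wall", "doc_count": 1,
     "docs": {"hits": {"hits": [{"_id": "1"}]}}},
]}}}

print(groups_from_aggs(sample))
```

`top_hits` caps how many ids are listed per bucket, while `doc_count` still reports the full number of matching documents in that bucket.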