Now I have run into another problem: how can I select only the values of a field that match a fuzzy query? Say the field education contains different university names, e.g. education: [MIT, Stanford University, Michigan University], and I want to select only Stanford University. I can run an aggregation for each fuzzy query, but it returns the counts and names of ALL universities in the education field. What I need is aggregations over only the exact values that match the fuzzy query. For example, if I run a fuzzy query for Stanford University and the education field holds [MIT, Stanfordddd University, Michigan University], I would like the query to return only the value 'Stanfordddd University', not all three. Thanks!
For this, your education field must be of type nested, and you can use the inner_hits feature to retrieve only the matching value.
Below is a sample mapping showing how the education field would be defined in this case:
Mapping:
PUT my_index
{
  "mappings": {
    "mydocs": {
      "properties": {
        "education": {
          "type": "nested"
        }
      }
    }
  }
}
Sample Documents:
POST my_index/mydocs/1
{
  "education": [
    {
      "value": "Stanford University"
    },
    {
      "value": "Harvard University"
    }
  ]
}
POST my_index/mydocs/2
{
  "education": [
    {
      "value": "Stanford University"
    },
    {
      "value": "Princeton University"
    }
  ]
}
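Fuzzy Query on Nested Field:
The nested path must be education and the fields must be addressed as education.value. Fuzzy queries are not analyzed, so the terms are given in lowercase to match the tokens produced by the default standard analyzer; an uppercase letter would otherwise use up one of the allowed edits.
POST my_index/_search
{
  "query": {
    "nested": {
      "path": "education",
      "query": {
        "bool": {
          "must": [
            {
              "span_near": {
                "clauses": [
                  {
                    "span_multi": {
                      "match": {
                        "fuzzy": {
                          "education.value": {
                            "value": "stanford",
                            "fuzziness": 2
                          }
                        }
                      }
                    }
                  },
                  {
                    "span_multi": {
                      "match": {
                        "fuzzy": {
                          "education.value": {
                            "value": "university",
                            "fuzziness": 2
                          }
                        }
                      }
                    }
                  }
                ],
                "slop": 0,
                "in_order": false
              }
            }
          ]
        }
      },
      "inner_hits": {}
    }
  }
}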
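The same request body can be generated for any multi-word name. The Python sketch below uses a hypothetical helper, build_fuzzy_phrase_query, which is not part of any Elasticsearch client: it splits the phrase on whitespace, lowercases each word, and wraps one fuzzy span_multi clause per word in a span_near query inside the nested query. It drops the single-clause bool/must wrapper, which does not change which documents match.

```python
import json


def build_fuzzy_phrase_query(path, field, phrase, fuzziness=2, slop=0, in_order=False):
    """Build a nested span_near query where every word of `phrase` is matched fuzzily.

    Hypothetical helper that mirrors the hand-written request body above.
    Words are lowercased because fuzzy queries are not analyzed, while the
    standard analyzer lowercases the indexed tokens.
    """
    # One fuzzy clause per word of the phrase.
    clauses = [
        {
            "span_multi": {
                "match": {
                    "fuzzy": {field: {"value": word.lower(), "fuzziness": fuzziness}}
                }
            }
        }
        for word in phrase.split()
    ]
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "span_near": {
                        "clauses": clauses,
                        "slop": slop,
                        "in_order": in_order,
                    }
                },
                # An empty inner_hits object returns the matching nested
                # objects under a key named after the path.
                "inner_hits": {},
            }
        }
    }


if __name__ == "__main__":
    body = build_fuzzy_phrase_query("education", "education.value", "Stanford University")
    print(json.dumps(body, indent=2))
```

The returned dict can be serialized with json.dumps and sent as the body of POST my_index/_search.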
Sample Response:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "my_index",
        "_type": "mydocs",
        "_id": "2",
        "_score": 0.6931472,
        "_source": {
          "education": [
            {
              "value": "Stanford University"
            },
            {
              "value": "Princeton University"
            }
          ]
        },
        "inner_hits": {
          "education": {
            "hits": {
              "total": 1,
              "max_score": 0.6931472,
              "hits": [
                {
                  "_index": "my_index",
                  "_type": "mydocs",
                  "_id": "2",
                  "_nested": {
                    "field": "education",
                    "offset": 0
                  },
                  "_score": 0.6931472,
                  "_source": {
                    "value": "Stanford University"
                  }
                }
              ]
            }
          }
        }
      },
      {
        "_index": "my_index",
        "_type": "mydocs",
        "_id": "1",
        "_score": 0.6931472,
        "_source": {
          "education": [
            {
              "value": "Stanford University"
            },
            {
              "value": "Harvard University"
            }
          ]
        },
        "inner_hits": {
          "education": {
            "hits": {
              "total": 1,
              "max_score": 0.6931472,
              "hits": [
                {
                  "_index": "my_index",
                  "_type": "mydocs",
                  "_id": "1",
                  "_nested": {
                    "field": "education",
                    "offset": 0
                  },
                  "_score": 0.6931472,
                  "_source": {
                    "value": "Stanford University"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}
Notice the inner_hits section: it contains only the nested object that matched the query (here, the Stanford University entry), not the whole education array.
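Since the matched values come back inside inner_hits, they can be collected on the client side. The Python sketch below assumes the response has already been parsed into a dict with the shape shown above; the function names are hypothetical. count_matched_values is a client-side substitute for the aggregation asked about in the question, not an Elasticsearch aggregation: it only sees the hits that were actually returned (the search size defaults to 10 hits, and inner_hits returns at most 3 nested objects per document unless its size is raised).

```python
from collections import Counter


def matched_nested_values(response, inner_hits_name="education", field="value"):
    """Collect the nested values that actually matched, from a parsed search response.

    Assumes the shape shown above: each hit carries
    inner_hits[<name>]["hits"]["hits"][*]["_source"][<field>].
    """
    values = []
    for hit in response.get("hits", {}).get("hits", []):
        inner = hit.get("inner_hits", {}).get(inner_hits_name, {})
        for nested_hit in inner.get("hits", {}).get("hits", []):
            values.append(nested_hit["_source"][field])
    return values


def count_matched_values(response, inner_hits_name="education", field="value"):
    """Count the matched values; only covers the hits present in `response`."""
    return Counter(matched_nested_values(response, inner_hits_name, field))
```

Applied to the sample response above, this yields two occurrences of "Stanford University" and ignores "Harvard University" and "Princeton University".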
By default, Elasticsearch returns the entire document in the response. You can restrict which fields are returned using _source filtering, but that does not let you filter individual values within a field.
Hope this helps!