I need partial search on my website, which uses django-haystack. Initially I used EdgeNgramField directly, but it didn't work as expected, so I switched to a custom search engine with custom analyzers:
'settings': {
    "analysis": {
        "analyzer": {
            "ngram_analyzer": {
                "type": "custom",
                "tokenizer": "lowercase",
                "filter": ["haystack_ngram"]
            },
            "edgengram_analyzer": {
                "type": "custom",
                "tokenizer": "lowercase",
                "filter": ["haystack_edgengram"]
            },
            "suggest_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": [
                    "standard",
                    "lowercase",
                    "asciifolding"
                ]
            },
        },
        "tokenizer": {
            "haystack_ngram_tokenizer": {
                "type": "nGram",
                "min_gram": 3,
                "max_gram": 15,
            },
            "haystack_edgengram_tokenizer": {
                "type": "edgeNGram",
                "min_gram": 2,
                "max_gram": 15,
                "side": "front"
            }
        },
        "filter": {
            "haystack_ngram": {
                "type": "nGram",
                "min_gram": 3,
                "max_gram": 15
            },
            "haystack_edgengram": {
                "type": "edgeNGram",
                "min_gram": 2,
                "max_gram": 15
            }
        }
    }
}
I used edgengram_analyzer for indexing and suggest_analyzer for searching. This works to some extent, but it doesn't work for numbers: for example, when 30 is entered, it doesn't find 303. It also fails for words that combine letters and digits. So I searched various sites for a solution.
They suggested using the standard or whitespace tokenizer together with the haystack_edgengram filter. But that didn't work at all: number search aside, partial search stopped working even for alphabetic text. The settings after applying the suggestion:
'settings': {
    "analysis": {
        "analyzer": {
            "ngram_analyzer": {
                "type": "custom",
                "tokenizer": "lowercase",
                "filter": ["haystack_ngram"]
            },
            "edgengram_analyzer": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["haystack_edgengram"]
            },
            "suggest_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": [
                    "standard",
                    "lowercase",
                    "asciifolding"
                ]
            },
        },
        "filter": {
            "haystack_ngram": {
                "type": "nGram",
                "min_gram": 3,
                "max_gram": 15
            },
            "haystack_edgengram": {
                "type": "edgeNGram",
                "min_gram": 2,
                "max_gram": 15
            }
        }
    }
}
Does anything other than the lowercase tokenizer work with django-haystack, or is the haystack_edgengram filter not working for me? As far as I know, it should work like this. Taking 2 Lazy Dog as the supplied text, the whitespace tokenizer should produce the tokens [2,Lazy,Dog], and applying the haystack_edgengram filter should then generate the tokens [2,la,laz,lazy,do,dog]. It's not working like this. Did I do something wrong?
My requirement is that, for example, for the text 2 Lazy Dog, a search should match when someone types 2 Laz.
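To make the expectation concrete, here is a small pure-Python sketch of a whitespace tokenizer followed by a front edge n-gram filter. It is only a simplified model, not Elasticsearch itself: the function names are hypothetical, the lowercase step stands in for a lowercase token filter that the settings above do not include, and the model assumes a token shorter than min_gram produces no grams. Under those assumptions, 2 yields nothing with min_gram 2, so it may be worth checking the real output with Elasticsearch's _analyze API.

```python
def whitespace_tokenize(text):
    """Split on runs of whitespace, keeping digits and case intact."""
    return text.split()


def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the front-anchored prefixes of token, lengths min_gram..max_gram."""
    upper = min(len(token), max_gram)
    # Assumption: a token shorter than min_gram yields no grams at all.
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text, min_gram=2, max_gram=15):
    """Model of: whitespace tokenizer -> lowercase (assumed) -> edge n-grams."""
    grams = []
    for token in whitespace_tokenize(text):
        grams.extend(edge_ngrams(token.lower(), min_gram, max_gram))
    return grams


print(analyze("2 Lazy Dog"))              # ['la', 'laz', 'lazy', 'do', 'dog']
print(analyze("2 Lazy Dog", min_gram=1))  # ['2', 'l', 'la', 'laz', 'lazy', 'd', 'do', 'dog']
```

In this model the single-character token 2 only survives when min_gram is 1, which suggests the min_gram value of haystack_edgengram may matter for the 2 Laz case.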
Edited:
My assumption is that the lowercase tokenizer works properly, but for the text above it omits 2 and creates the tokens [lazy,dog]. Why can't the standard or whitespace tokenizer work?
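The behaviour described in this edit can be reproduced with a rough pure-Python approximation of the lowercase tokenizer, which splits text wherever a character is not a letter and lowercases what remains. The helper name below is hypothetical and the regular expression is only an approximation of how Elasticsearch decides what counts as a letter.

```python
import re


def lowercase_tokenize(text):
    """Approximate the lowercase tokenizer: keep runs of letters, lowercased."""
    # [^\W\d_]+ matches word characters that are neither digits nor underscores,
    # so digits act as separators and never appear in a token.
    return [run.lower() for run in re.findall(r"[^\W\d_]+", text)]


print(lowercase_tokenize("2 Lazy Dog"))  # ['lazy', 'dog']
print(lowercase_tokenize("303"))         # []
print(lowercase_tokenize("abc123def"))   # ['abc', 'def']
```

If this approximation holds, 303 is never indexed at all with the lowercase tokenizer, so no query for 30 could match it, and mixed tokens such as abc123def are split around the digits.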