I need partial search on my website. Initially I used EdgeNgramField directly, but it didn't work as expected, so I switched to a custom search engine backend with custom analyzers. I am using django-haystack.
'settings': {
    "analysis": {
        "analyzer": {
            "ngram_analyzer": {
                "type": "custom",
                "tokenizer": "lowercase",
                "filter": ["haystack_ngram"]
            },
            "edgengram_analyzer": {
                "type": "custom",
                "tokenizer": "lowercase",
                "filter": ["haystack_edgengram"]
            },
            "suggest_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": [
                    "standard",
                    "lowercase",
                    "asciifolding"
                ]
            },
        },
        "tokenizer": {
            "haystack_ngram_tokenizer": {
                "type": "nGram",
                "min_gram": 3,
                "max_gram": 15,
            },
            "haystack_edgengram_tokenizer": {
                "type": "edgeNGram",
                "min_gram": 2,
                "max_gram": 15,
                "side": "front"
            }
        },
        "filter": {
            "haystack_ngram": {
                "type": "nGram",
                "min_gram": 3,
                "max_gram": 15
            },
            "haystack_edgengram": {
                "type": "edgeNGram",
                "min_gram": 2,
                "max_gram": 15
            }
        }
    }
}
I used edgengram_analyzer for indexing and suggest_analyzer for search. This worked to some extent, but it doesn't work for numbers: for example, when 30 is entered, it doesn't find 303. It also fails for words that combine letters and digits. So I looked through various sites.
They suggested using the standard or whitespace tokenizer together with the haystack_edgengram filter. But that didn't work at all: leaving numbers aside, partial search stopped working even for alphabetic text. The settings after the suggestion:
'settings': {
    "analysis": {
        "analyzer": {
            "ngram_analyzer": {
                "type": "custom",
                "tokenizer": "lowercase",
                "filter": ["haystack_ngram"]
            },
            "edgengram_analyzer": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["haystack_edgengram"]
            },
            "suggest_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": [
                    "standard",
                    "lowercase",
                    "asciifolding"
                ]
            },
        },
        "filter": {
            "haystack_ngram": {
                "type": "nGram",
                "min_gram": 3,
                "max_gram": 15
            },
            "haystack_edgengram": {
                "type": "edgeNGram",
                "min_gram": 2,
                "max_gram": 15
            }
        }
    }
}
Does anything other than the lowercase tokenizer work with django-haystack? Or is the haystack_edgengram filter not working for me? As far as I know, it should work like this: given the text 2 Lazy Dog, the whitespace tokenizer should produce the tokens [2, Lazy, Dog], and applying the haystack_edgengram filter should then generate the tokens [2, la, laz, lazy, do, dog]. It is not working like this. Did I do something wrong?
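To make that expectation concrete, here is a plain-Python sketch of the token stream I have in mind. It is only an illustration of what I expect, not Elasticsearch code: the helper names are made up, and two behaviours are assumptions on my part, namely that tokens get lowercased somewhere in the chain (the edgengram_analyzer above has no lowercase filter) and that a token shorter than min_gram, such as 2, is kept rather than dropped.

```python
def whitespace_tokenize(text):
    """Split on runs of whitespace, as I understand the whitespace tokenizer to do."""
    return text.split()


def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return front-anchored prefixes of `token`, min_gram to max_gram characters long."""
    if len(token) < min_gram:
        # Assumption: short tokens such as "2" are kept unchanged.
        return [token]
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def expected_index_tokens(text):
    """Tokens I expect to be indexed for `text`."""
    tokens = []
    for token in whitespace_tokenize(text):
        # Assumption: tokens are lowercased before the edge n-gram step.
        tokens.extend(edge_ngrams(token.lower()))
    return tokens


print(expected_index_tokens("2 Lazy Dog"))
# ['2', 'la', 'laz', 'lazy', 'do', 'dog']
```

If either assumption does not hold in the real analyzer, the indexed tokens would differ from this list, which might be part of what I am seeing.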
My requirement: for the text 2 Lazy Dog, for example, a search should match when someone types 2 Laz.
Edited:
I had assumed the lowercase tokenizer worked properly. But for the text above it omits 2 and creates the tokens [lazy, dog]. Why can't the standard or whitespace tokenizer work?
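As far as I understand the Elasticsearch documentation, the lowercase tokenizer breaks text at every character that is not a letter and lowercases what remains, which would explain why digits never reach the index. The sketch below approximates that behaviour in plain Python. The regular expression and the function name are my own, and the pattern is only an approximation of what Lucene treats as a letter.

```python
import re

# Approximation: a maximal run of letters (word characters minus digits and underscore).
LETTER_RUN = re.compile(r"[^\W\d_]+")


def lowercase_tokenize(text):
    """Emulate the lowercase tokenizer: split at non-letters, then lowercase."""
    return [run.lower() for run in LETTER_RUN.findall(text)]


for sample in ("2 Lazy Dog", "303", "abc123def"):
    print(repr(sample), "->", lowercase_tokenize(sample))
# '2 Lazy Dog' -> ['lazy', 'dog']
# '303' -> []
# 'abc123def' -> ['abc', 'def']
```

Under this reading, a value like 303 produces no tokens at all, so a query for 30 has nothing to match, and a mixed word is cut into its letter-only pieces.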