I'm building an in-site search engine with Elasticsearch, and I'm new to Elasticsearch. The sites that will use this engine are in Turkish and English.
Turkish has letters such as 'ğ', 'ü', 'ş', 'ı', 'ö', 'ç'. When we search, however, we usually type 'g', 'u', 's', 'i', 'o', 'c' instead. This is not a rule, just a common habit.
I have a document type called "product", which has several string properties, some of them on nested objects. For example:
public class Product {
    public string ProductName { get; set; }
    public Category Category { get; set; }
    //...
}
public class Category {
    public string CategoryName { get; set; }
    //...
}
My goal is this:
- ProductName or Category.CategoryName may contain Turkish letters ("Eşarp"), or may have been mistyped with English letters ("Esarp")
- The query string may contain Turkish letters ("eşarp") or not ("esarp")
- The query string may contain multiple words
- The query string should be searched against every indexed string field (full-text search)
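To make the goal above concrete, here is a minimal, Elasticsearch-free sketch of the matching behaviour I'm after. The names TurkishFold, Normalize and Matches are made up for illustration, the mapping covers only the six letters listed above, and requiring every query word to be present is my own assumption rather than something Elasticsearch does by default.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class TurkishFold
{
    // Only the six lowercase Turkish letters from the question.
    // A real ASCII-folding filter covers far more characters.
    private static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        { 'ğ', 'g' }, { 'ü', 'u' }, { 'ş', 's' },
        { 'ı', 'i' }, { 'ö', 'o' }, { 'ç', 'c' }
    };

    // Lowercase first, then replace each mapped letter with its ASCII counterpart.
    public static string Normalize(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char ch in text.ToLowerInvariant())
        {
            sb.Append(Map.TryGetValue(ch, out var folded) ? folded : ch);
        }
        return sb.ToString();
    }

    // Split on spaces after normalizing, so both sides produce comparable terms.
    private static IEnumerable<string> Tokens(string text)
    {
        return Normalize(text).Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Assumption: a field matches when every query word appears among its terms.
    public static bool Matches(string fieldValue, string query)
    {
        var terms = new HashSet<string>(Tokens(fieldValue));
        return Tokens(query).All(t => terms.Contains(t));
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(TurkishFold.Normalize("Eşarp"));                    // esarp
        Console.WriteLine(TurkishFold.Matches("Bordo Eşarp", "esarp"));       // True
        Console.WriteLine(TurkishFold.Matches("Bordo Esarp", "eşarp bordo")); // True
        Console.WriteLine(TurkishFold.Matches("Bordo Eşarp", "mavi"));        // False
    }
}
```

The point is that the same normalization is applied to the stored value and to the query string, which is what I expect the analyzer to do at index time and at search time.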
Now, what I did:
- While creating the index, I also configured the mappings and defined a custom analyzer called "sanalyze". It uses the standard tokenizer with the "lowercase" and "asciifolding" filters, and I use it in place of the standard analyzer.
- I used that custom analyzer in the mappings of the string fields.
Example code for mapping:
// some more mappings which use the same mapping for all string fields.
.Map<Yaziylabir.Extensions.TagManagement.Models.TagModel>(m => m.AutoMap().Properties(p => p
    .String(s => s
        .Name(n => n.Tag).Analyzer("sanalyze")))))
.Settings(s => s
    .Analysis(ans => ans
        .Analyzers(anl => anl
            .Custom("sanalyze", c => c
                .Tokenizer("standard")
                .Filters("lowercase", "asciifolding")))));
- I deleted and recreated the index, then reindexed my documents.
- Now I'm trying to search in that index.
I tried two different queries against the stored documents:
q &= Query<ProductModel>.QueryString(t => t.Query(Keyword).Analyzer("sanalyze"));
q &= Query<ProductModel>.QueryString(t => t.Query(Keyword));
The second query doesn't call the Analyzer method because, according to the Elasticsearch documentation, Elasticsearch uses the analyzer configured on the field. So I assumed there was no need to specify it again at search time.
What I got as results:
- First query (with Analyzer("sanalyze")): searching for "eşarp" or "esarp" returns no results. Searching for "bordo" returns results.
- Second query (without Analyzer("sanalyze")): searching for "eşarp" returns results. Searching for "esarp" returns no results. Searching for "bordo" returns results.
BTW:
The documents contain "Eşarp" as the ProductName value, and when I checked, Elasticsearch had created "esarp" as the field term.
The documents contain "Bordo" as a value, with "bordo" as the field term.
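For anyone who wants to reproduce the term check, this is roughly how the analyzer output can be inspected through the Analyze API. It is only a sketch: it assumes a NEST 2.x client (method names may differ in other versions), a node at http://localhost:9200, and an index called "products", which is a placeholder rather than my real index name.

```csharp
using System;
using Nest;

public static class AnalyzeCheck
{
    public static void Main()
    {
        // Placeholder node address; adjust to your own cluster.
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"));
        var client = new ElasticClient(settings);

        // Run the custom analyzer defined on the index against a sample value.
        // "products" is a placeholder index name.
        var response = client.Analyze(a => a
            .Index("products")
            .Analyzer("sanalyze")
            .Text("Eşarp"));

        // Print each term the analyzer produced for the sample text.
        foreach (var token in response.Tokens)
        {
            Console.WriteLine(token.Token);
        }
    }
}
```

Running the same call with the query text ("eşarp" and "esarp") should show whether the index-time and search-time terms actually line up.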
I couldn't achieve what I want. What am I doing wrong?
- Should I use another filter instead of asciifolding?
- Should I use preserveOriginal with asciifolding? I'd rather not use that option, so as not to distort the scores.
- Is there something different I should do?
Can you please help me?
If you think it is not clear what I'm asking, please tell me, I will try to make it clearer.
Thank you.