
How can we sort tokens before indexing into Elasticsearch? For example, I want to index:

 "a b" => "ab" 
 "b a" => "ab"
 "java language" => "javalanguage"
 "requirement analysis" => "analysisrequirement"

After sorting, we concatenate all tokens for our use case.

How can we achieve this using a custom sort analyzer?
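
To make the desired mapping concrete, here is a minimal plain-Java sketch of the transformation described above (split into tokens, sort them, concatenate without a separator). It is not an Elasticsearch analyzer; the class and method names are hypothetical, and it assumes whitespace tokenization and case-sensitive natural String ordering.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of Elasticsearch or any plugin.
final class SortedTokenConcat {

    // Splits on whitespace (assumption), sorts tokens in natural String
    // order (case-sensitive), and joins them with no separator.
    static String sortAndConcat(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        return Arrays.stream(trimmed.split("\\s+"))
                .sorted()
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(sortAndConcat("b a"));                  // ab
        System.out.println(sortAndConcat("java language"));        // javalanguage
        System.out.println(sortAndConcat("requirement analysis")); // analysisrequirement
    }
}
```

Inside an analysis chain, the same logic would have to run after tokenization and stemming, so that the sort sees the final tokens.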

EDIT: We have applied a couple of custom analyzers in the Elasticsearch mapping for our use case. For example, we have:

token
stemming
custom_words_concatenation

I want to sort the words using an analyzer, like below:

token
stemming
sort
custom_words_concatenation
Are those tokens simple letters or words? - Val
The information you provided is not sufficient to understand your use case. For example, you can have "boy apple" => "appleboy", where you may want to concatenate based on the alphabetical order of each token. Add/explain all use cases. - user3775217
@val Sorting on words, after tokenization. - ranjeet

1 Answer


I created a custom sort analyzer.

GitHub link: https://github.com/ranjeet-floyd/plugin-sortchar.git

It converts the input string to a char[] and sorts it using Arrays.sort.

For example:

requirement analysis => aaeeeiilmnnqrrsstuy
analysis requirement => aaeeeiilmnnqrrsstuy
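
The character-sorting step can be sketched in plain Java as follows. This is only an illustration of the idea, not the plugin's actual code: the class and method names are hypothetical, and stripping whitespace before sorting is an assumption made so that the output matches the examples above.

```java
import java.util.Arrays;

// Hypothetical illustration class; not taken from the plugin source.
final class SortedCharsSketch {

    // Removes whitespace (assumption), then sorts the remaining
    // characters with Arrays.sort and rebuilds the string.
    static String sortChars(String input) {
        char[] chars = input.replaceAll("\\s+", "").toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(sortChars("requirement analysis")); // aaeeeiilmnnqrrsstuy
        System.out.println(sortChars("analysis requirement")); // aaeeeiilmnnqrrsstuy
    }
}
```

Note that sorting characters rather than whole tokens also maps any two inputs made of the same letters to the same value (for example "listen" and "silent"), so it is broader than the token-level sort shown in the question.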