We are using hibernate search orm 5.9.2 and would like to achieve the exact search results like:
If user starts with
John -> all data with John should display
John Murphy -> all data with John murphy should display
John murphy Columbia -> Only data with John murphy Columbia should display
John murphy Columbia SC -> Only data with John murphy Columbia should display
John murphy Columbia SC 29201 -> Only data with John murphy Columbia SC 29201
29201 -> Only data with 29201 as zipcode should be displayed.
and so on...
Basically we are trying to achieve search on exact records from multiple fields on index.
We have entity containing this data in fields like Name, Address1, address2, city, zipcode, state.
We have tried bool()
(with should/must) queries, but as we are not sure what data will user enter first, it could be zipcode, state, city any where in the text search.
Please share your knowledge/logic with regards to analyzers/strategy which we can use to accomplish this with hibernate search/lucene.
Below is the index structure:
> {
> "_index" : "client_master_index_0300",
> "_type" : "com.csc.pt.svc.data.to.Basclt0300TO",
> "_id" : "518,1",
> "_score" : 4.0615783,
> "_source" : {
> "id" : "518,1",
> "cltseqnum" : 518,
> "addrseqnum" : "1",
> "addrln1" : "Dba",
> "addrln2" : "Betsy Evans",
> "city" : "SDA",
> "state" : "SC",
> "zipcode" : "89756-4531",
> "country" : "USA",
> "basclt0100to" : {
> "cltseqnum" : 518,
> "clientname" : "Betsy Evans",
> "longname" : "Betsy Evans",
> "id" : "518"
> },
> "basclt0900to" : {
> "cltseqnum" : 518,
> "id" : "518"
> }
> }
> }
Below is the input
Akash Agrawal 29021
the response contains all records matching akash, agrwal, 29,2, 1, 01 etc...
What we are trying to achieve is the exact search result, with respect to above search input the results should only contain data with Akash Agrawal 29201 and not other data.
We are basically searching on basclt0100to.longname, addrln1, addrln2, city, state, zipcode, country.
The index definition is below
> {
> "client_master_index_0300" : {
> "aliases" : { },
> "mappings" : {
> "com.csc.pt.svc.data.to.Basclt0300TO" : {
> "dynamic" : "strict",
> "properties" : {
> "addrln1" : {
> "type" : "text",
> "store" : true
> },
> "addrln2" : {
> "type" : "text",
> "store" : true
> },
> "addrln3" : {
> "type" : "text",
> "store" : true
> },
> "addrseqnum" : {
> "type" : "text",
> "store" : true
> },
> "basclt0100to" : {
> "properties" : {
> "clientname" : {
> "type" : "text",
> "store" : true
> },
> "cltseqnum" : {
> "type" : "long",
> "store" : true
> },
> "firstname" : {
> "type" : "text",
> "store" : true
> },
> "id" : {
> "type" : "keyword",
> "store" : true,
> "norms" : true
> },
> "longname" : {
> "type" : "text",
> "store" : true
> },
> "midname" : {
> "type" : "text",
> "store" : true
> }
> }
> },
> "basclt0900to" : {
> "properties" : {
> "cltseqnum" : {
> "type" : "long",
> "store" : true
> },
> "email1" : {
> "type" : "text",
> "store" : true
> },
> "id" : {
> "type" : "keyword",
> "store" : true,
> "norms" : true
> }
> }
> },
> "city" : {
> "type" : "text",
> "store" : true
> },
> "cltseqnum" : {
> "type" : "long",
> "store" : true
> },
> "country" : {
> "type" : "text",
> "store" : true
> },
> "id" : {
> "type" : "keyword",
> "store" : true
> },
> "state" : {
> "type" : "text",
> "store" : true
> },
> "zipcode" : {
> "type" : "text",
> "store" : true
> }
> }
> }
> },
> "settings" : {
> "index" : {
> "creation_date" : "1535607176216",
> "number_of_shards" : "5",
> "number_of_replicas" : "1",
> "uuid" : "x4R71LNCTBSyO9Taf8siOw",
> "version" : {
> "created" : "6030299"
> },
> "provided_name" : "client_master_index_0300"
> }
> }
> }
> }
I've till now tried using edgengraanalyzer, standard analyzer of lucene query. I've tried with Bool() query, keyword query, phrase, tried all that is available under documentation.
But I'm sure I'm missing the strategy/logic which we should use.
Below is the current query I'm using and is giving the attached snapshot results
Query finalQuery = queryBuilder.simpleQueryString()
.onFields("basclt0100to.longname", "addrln1", "addrln2"
,"city","state","zipcode", "country")
.withAndAsDefaultOperator()
.matching(lowerCasedSearchTerm)
.createQuery();
FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery(finalQuery, Basclt0300TO.class);
fullTextQuery.setMaxResults(this.data.getPageSize()).setFirstResult(this.data.getPageSize());
List<String> projectedFields = new ArrayList<String>();
for (String fieldName : projections)
projectedFields.add(fieldName);
@SuppressWarnings("unchecked")
List<Cltj001ElasticSearchResponseTO> results = fullTextQuery.
setProjection(projectedFields.toArray(new String[projectedFields.size()]))
.setResultTransformer( new BasicTransformerAdapter() {
@Override
public Cltj001ElasticSearchResponseTO transformTuple(Object[] tuple, String[] aliases) {
return new Cltj001ElasticSearchResponseTO((String) tuple[0], (long) tuple[1],
(String) tuple[2], (String) tuple[3], (String) tuple[4],
(String) tuple[5],(String) tuple[6], (String) tuple[7], (String) tuple[8]);
}
})
.getResultList();
resultsClt0300MasterIndexList = results;
searched for: akash 29201 & searched for : akash 1 main
Here you can see we have all the data containing akash , sh, 29, 292, 29201.
Expected results:
Akash Agrawal - 29201 Akash Agrawal - 1 main street, SC , 29201
Basically only exact data containing/matching the input string.
Analyzers used: Index time
@AnalyzerDef(name = "autocompleteEdgeAnalyzer",
//Split input into tokens according to tokenizer
tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
@TokenFilterDef(factory = LowerCaseFilterFactory.class),
@TokenFilterDef(factory = StopFilterFactory.class),
@TokenFilterDef(
factory = EdgeNGramFilterFactory.class, // Generate prefix tokens
params = {
@Parameter(name = "minGramSize", value = "3"),
@Parameter(name = "maxGramSize", value = "3")
}
)
})
Query time overriding with:
@AnalyzerDef(name = "withoutEdgeAnalyzerFactory",
// Split input into tokens according to tokenizer
tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
@TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
@TokenFilterDef(factory = LowerCaseFilterFactory.class),
}
/*filters = {
// Normalize token text to lowercase, as the user is unlikely to
// care about casing when searching for matches
@TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
@Parameter(name = "pattern", value = "([^a-zA-Z0-9\\.])"),
@Parameter(name = "replacement", value = " "),
@Parameter(name = "replace", value = "all") }),
@TokenFilterDef(factory = LowerCaseFilterFactory.class),
@TokenFilterDef(factory = StopFilterFactory.class) }*/)
Hope these details help.