Hi, I'm stuck on an issue. I have a field splited_data with the field type text_split in my schema.xml:

<field name="splited_data" type="text_split" indexed="true" stored="false" />
<fieldType name="text_split" class="solr.TextField" autoGeneratePhraseQueries="true" omitNorms="true">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="1"
                splitOnCaseChange="1" preserveOriginal="1" splitOnNumerics="1" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.KStemFilterFactory" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true" />
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.KStemFilterFactory" />
    </analyzer>
</fieldType>

I index the field splited_data with the value "Layer Hybrid Case Black iPhone 5C". After indexing, I try different queries (with the simple Lucene parser), and these are the results:

  1. q=splited_data:"iphone 5c" => 1 result is found (desired).
  2. q=splited_data:"black iphone 5c" => no result is found (not desired).

This seems to have something to do with the capital letters in iPhone, but I'm not sure what. Please help. I'm using Lucene 4.3. Let me know if you need any other information.

Update: I found the problem, but I'm not sure how to handle it. The problem is the positions of the tokens generated by WordDelimiterFilterFactory:

black  - position: 4
iphone - position: 5
i      - position: 5
phone  - position: 6
5c     - position: 7

So when I search for "black iphone 5c", it finds black at position 4 and iphone at position 5, but nothing matches at position 6. Ideally, instead of looking at position 6, it should match 5c directly at position 7. Is there any way to specify this in a phrase query?
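
To make the position argument concrete, here is a toy sketch in Python. It is not Lucene's actual matching code; the `index` dictionary and the `phrase_matches` helper are hypothetical names, and the positions are copied from the listing above (the real index may hold more tokens than I listed). It only models "terms must appear in order, with at most `slop` skipped positions in total".

```python
# Toy model of positional phrase matching; NOT Lucene's implementation.
# Token positions are taken from the listing in this question.
index = {
    "black": {4},
    "iphone": {5},
    "i": {5},
    "phone": {6},
    "5c": {7},
}

def phrase_matches(terms, index, slop=0):
    """Return True if the terms occur in order, with at most `slop`
    skipped positions in total between consecutive terms."""
    if not terms:
        return False

    def search(i, prev_pos, slop_left):
        if i == len(terms):
            return True
        for pos in sorted(index.get(terms[i], ())):
            gap = pos - prev_pos - 1  # 0 means directly adjacent
            if 0 <= gap <= slop_left and search(i + 1, pos, slop_left - gap):
                return True
        return False

    return any(search(1, start, slop) for start in index.get(terms[0], ()))

print(phrase_matches(["black", "iphone", "5c"], index))          # False
print(phrase_matches(["black", "iphone", "5c"], index, slop=1))  # True
```

In this model the exact phrase fails because 5c sits two positions after iphone, while allowing one skipped position makes it match. The Lucene query syntax does have a phrase slop operator, so q=splited_data:"black iphone 5c"~1 might be worth trying, though I have not confirmed that it behaves the same way against the real index.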

1 Answer

Your field type "text_split" looks good to me in the Solr admin Analysis tool. Your query is reduced to "black, iphone, 5c". The document is indexed as "layer, layer, hybrid, case, black, i, phone, 5c, 5, c, 5c". The query "black iphone 5c" matches the document. I'm using Solr version 4.7.0_01 on Tomcat 7.