Lately I have been trying to facet on a field whose values can contain multiple words (a phrase). It has been suggested that I use shingles, but I am not sure that would work as expected, because the phrases to facet on should come from a given list.
For example: when I facet on the field, I get separate facets for 'Information' and 'Technology', whereas I want a single facet, 'Information Technology'.
How can I facet on a particular phrase in a particular field?
EDIT: The schema for the required field looks like this:
<fieldType name="text_en_splitting_tight" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
         possible with WordDelimiterFilter in conjunction with stemming. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
The shingle filter doesn't work as I need: it shows three facets for 'Information Technology' (information, technology and information technology).
Why set outputUnigrams="true" if you don't want to use unigrams? – soulcheck
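For illustration, the change this comment hints at might look like the sketch below. It assumes the rest of the analyzer chain stays exactly as posted, and that only the shingle filter line is altered. With unigrams switched off, the filter would still emit every adjacent two-word shingle, so on its own it probably does not restrict facets to phrases from a given list. The outputUnigramsIfNoShingles attribute is included here only as an assumption about what single-word values would need in order to remain visible as facets.

```xml
<!-- Sketch only: replaces the ShingleFilterFactory line in the analyzer above. -->
<!-- outputUnigrams="false" drops the single-word tokens ("information", "technology") -->
<!-- and keeps the two-word shingle ("information technology"). -->
<!-- outputUnigramsIfNoShingles="true" keeps a lone word when no shingle can be formed. -->
<filter class="solr.ShingleFilterFactory"
        maxShingleSize="2"
        outputUnigrams="false"
        outputUnigramsIfNoShingles="true"/>
```

After a change like this the field would have to be reindexed before the facet counts reflect it.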