We are replacing the search and indexing module in an application, moving from DtSearch to Solr, with SolrNet as the .NET Solr client library.

We are relatively new to Solr/Lucene and need some direction to understand the more advanced search options in Solr.

The current application supports the following search options using DtSearch:

1) Word(s) or phrase

2) Exact words or phrases

3) Not these words or phrases

4) One or more of words ("A" OR "B" OR "C")

5) Proximity of a word within n words of another word

6) Numeric range - From - To

7) Options

. Stemming (search* finds searching or searches)

. Synonym (search& finds seek or look)

. Fuzzy within n letters (p%arts finds paris)

. Phonic homonyms (#Smith also finds Smithe and Smythe)

As an example, here is the search query that gets generated and posted to DtSearch for the following use case:

  1. Search Phrase: generic collection

  2. Exact Phrase: linq

  3. Not these words: sql

  4. One or more of these words: ICollection or ArrayList or Hashtable

  5. Proximity: csharp within 4 words of language

  6. Options:

    a. Stemming

    b. Synonym

    c. Fuzzy within 2 letters

    d. Phonic homonyms

    Search Query: generic* collection* generic& collection& #generic #collection g%%eneric c%%ollection "linq" -sql ICollection OR ArrayList OR Hashtable csharp w/4 language

We have been able to do simple searches (a single-term search over file content) with highlighting in Solr. Now we need to replace the options above with their Solr/Lucene equivalents.

Can anybody point us to what we should be looking at, and where?

1 Answer

  1. Word(s) or phrase
    Solr supports querying within a single field and across several fields, with per-field boosts to control relevancy. It also provides a wide variety of query types for matching, such as phrase, wildcard, and prefix queries.

  2. Exact words or phrases
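    A phrase query (the phrase wrapped in double quotes) handles phrase matches. Exact word matches depend on how the field is analyzed, so you can configure a field without stemming or similar filters for that purpose.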

  3. Not these words or phrases
    Negative queries - Solr supports boolean operators, which include negative queries using either - or NOT

  4. One or more of words("A" OR "B" OR "C")
    Boolean Operators - Solr supports boolean operators, including AND, OR, and the + (required) prefix

  5. Proximity of word with n words of another word
    Proximity Search - Solr supports proximity queries with the ~ operator placed after a quoted phrase and followed by the slop (the maximum distance, in words, between the terms)

  6. Numeric range - From - To
    Range Queries - Solr supports range queries for both numbers and dates.

  7. Option

    • Stemming(search* finds searching or searches)
      Stemmer - Solr has built-in stemmers that can be used out of the box. It also lets you define a new stemmer.
      See Language Analysis for details on the support for various languages.

    • Synonym(search& finds seek or look)
      Synonym - Solr supports synonym handling through a file-based approach: the synonym filter reads its mappings from a synonyms file.

    • Fuzzy within n letters(p%arts finds paris)
      Fuzzy search - Solr supports fuzzy searches with the ~ operator appended to a single term

    • Phonic homonyms(#Smith also finds Smithe and Smythe)
      Phonetic search - Solr provides phonetic searches, which allow matches on misspelled words. It has out-of-the-box support for 4 filters, which can be customized.

For the complete list, see AnalyzersTokenizersTokenFilters.
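
To make the mapping concrete, here is a rough sketch of how the use case from the question could be turned into a query string for Solr's standard (Lucene) query parser. It is written in Python only to keep it short and self-contained; the same string could be passed through SolrNet. The function names, the `content` field, and the `page_count` field are hypothetical, and the integer fuzzy distance assumes a Solr version where ~N means edit distance (Solr 4 or later). Stemming, synonyms, and phonetic matching have no query operators here: they come from the analysis chain of the field being searched.

```python
from urllib.parse import urlencode


def quote_phrase(text):
    """Wrap text in double quotes, escaping backslashes and embedded quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def term(text):
    """Quote multi-word input so it is treated as one phrase."""
    return quote_phrase(text) if " " in text else text


def build_query(words=(), exact_phrases=(), not_words=(), any_words=(),
                proximity=None, numeric_range=None,
                fuzzy_terms=(), fuzzy_distance=2):
    """Build a standard-query-parser string (hypothetical helper)."""
    clauses = []
    # 1. Plain words; whether they are ANDed or ORed depends on q.op.
    clauses.extend(term(w) for w in words)
    # 2. Exact phrases are always quoted.
    clauses.extend(quote_phrase(p) for p in exact_phrases)
    # 3. Excluded words or phrases use the - prefix.
    clauses.extend("-" + term(w) for w in not_words)
    # 4. "One or more of" becomes a parenthesised OR group.
    if any_words:
        clauses.append("(" + " OR ".join(term(w) for w in any_words) + ")")
    # 5. Proximity: quoted pair followed by ~slop.
    if proximity:
        first, second, slop = proximity
        clauses.append(quote_phrase(first + " " + second) + "~" + str(slop))
    # 6. Inclusive numeric range on a named field.
    if numeric_range:
        field, low, high = numeric_range
        clauses.append("%s:[%s TO %s]" % (field, low, high))
    # 7c. Fuzzy terms; integer distance assumes Solr 4+ semantics.
    clauses.extend("%s~%d" % (t, fuzzy_distance) for t in fuzzy_terms)
    return " ".join(clauses)


def select_params(query, highlight_field="content"):
    """URL-encode parameters for the select handler, with highlighting on.

    The "content" field name is an assumption about the schema.
    """
    return urlencode({"q": query, "hl": "true", "hl.fl": highlight_field})


example = build_query(
    words=["generic", "collection"],
    exact_phrases=["linq"],
    not_words=["sql"],
    any_words=["ICollection", "ArrayList", "Hashtable"],
    proximity=("csharp", "language", 4),
)
print(example)
# generic collection "linq" -sql (ICollection OR ArrayList OR Hashtable) "csharp language"~4
```

Whether the bare words and the OR group are required or optional depends on the default operator (the q.op parameter), so check the results against what DtSearch returned for the same input.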