0
votes

I indexed my database with Lucene for full-text search. Everything works when I search for keywords without symbols, but whenever I search for keywords containing slashes, decimal points, and so on (e.g. 1/4, 1.234, 1-1/4"), Lucene returns no results. What is the best way to index symbols?

3
What analyzer are you using? Are those symbols in text fields or separate fields? - Thomas
@Thomas is correct: you are likely using StandardAnalyzer, which strips out most punctuation and symbols. You could pass a custom stop-word list or write a custom analyzer to suit your needs. - Mikos
I use StandardAnalyzer. The symbols are in the same field. If StandardAnalyzer strips out symbols, what is the best analyzer to use? - maccramers
I have an idea, but I am not sure it will work: I plan to modify StandardAnalyzer's stop words by disabling all of them except spaces. I tried WhitespaceAnalyzer in my code, but it didn't work. How would I implement this? - maccramers

3 Answers

3
votes

Lucene's query syntax reserves a number of characters. When one of them appears literally in a search term, it must be escaped with a preceding backslash:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

Lucene 4.0 and later also reserve /.
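
As a rough illustration of what escaping means here, the following is a minimal sketch of a hand-written escaper. `QueryEscaper` and its `escape` method are hypothetical names, not part of Lucene, and the character set assumes a recent Lucene version in which `&`, `|` and `/` are also treated as query syntax.

```java
// Hypothetical helper (not part of Lucene): prefixes each query-syntax
// character in a search term with a backslash.
final class QueryEscaper {
    // Characters the query parser treats as syntax. '&', '|' and '/' are
    // included on the assumption of a recent Lucene version.
    private static final String SPECIAL = "+-!(){}[]^\"~*?:\\&|/";

    static String escape(String term) {
        StringBuilder out = new StringBuilder(term.length() * 2);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\'); // escape marker goes before the character
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("1-1/4\"")); // hyphen, slash and quote get escaped
        System.out.println(escape("1.234"));   // '.' is not query syntax, so unchanged
    }
}
```

Note that escaping only affects how the query string is parsed. If the analyzer removed the symbols at index time, as the comments on the question suggest, an escaped query may still find nothing.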

1
votes

I'd suggest taking a look at regular expressions. They let you check whether a string contains a given character, find where it occurs, and replace it.

See the JavaDocs on regular expressions.
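
A minimal sketch of the regular-expression approach, using only `java.util.regex` from the standard library. The class `RegexQueryEscaper` and its method names are hypothetical, and the character class is an assumption about which characters the query parser reserves.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene): detects and escapes
// query-syntax characters with a regular expression.
final class RegexQueryEscaper {
    // Character class listing the assumed special characters. Inside the
    // class, '-', '[', ']' and '\' need escaping themselves.
    private static final Pattern SPECIAL =
            Pattern.compile("[+\\-!(){}\\[\\]^\"~*?:\\\\/&|]");

    // True if the term contains at least one special character.
    static boolean containsSpecial(String term) {
        return SPECIAL.matcher(term).find();
    }

    // Replaces every special character with a backslash followed by itself.
    // "$0" refers to the matched character; "\\\\" is one literal backslash.
    static String escape(String term) {
        return SPECIAL.matcher(term).replaceAll("\\\\$0");
    }

    public static void main(String[] args) {
        System.out.println(containsSpecial("1/4"));
        System.out.println(escape("1/4"));
    }
}
```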

1
votes

Fortunately, newer versions of Lucene already provide a convenience method for escaping these characters: the static method escape(String s) in QueryParser.

From the docs:

public static String escape(String s)

    Returns a String where those characters that QueryParser expects to be escaped are escaped by a preceding \.