I have been having some difficulty with Lucene and would appreciate any help.

I have a custom, hand-written query that I parse with QueryParser.Parse. I am using Version LUCENE_29 and the StandardAnalyzer.

My query contains a special character (an escaped colon) that I need to preserve:

+(Name:"test\:word" OR Business:"test\:word hello")

The output after parsing the query text above is:

+(Name:"test word" OR Business:"test word hello")

Does anyone have any suggestions? I tried passing an empty stop-word collection to the StandardAnalyzer constructor, but that has no effect; it still strips out the colon.

Thank you.

You ask a good question. I had a similar problem with Lucene and found no way to resolve this issue. Lucene was retired on our website partly due to this issue. - JohnH
@JohnH thanks for sharing this info! - Mr Sheriff
FYI - LUCENE_29 only tells us the version compatibility you have set; it doesn't tell us which Lucene or Lucene.NET version you are using. - NightOwl888

1 Answer

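You can't, at least not with StandardAnalyzer. It was specifically designed to remove special characters: its tokenizer treats most punctuation, including the colon, as a token boundary and discards it. Escaping the colon only keeps the query parser from reading it as a field separator; the phrase text is still run through the analyzer afterwards. Stop words are handled by a separate filter, which is why passing an empty stop-word collection makes no difference.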

The answer is to use an Analyzer implementation that doesn't strip special characters (such as WhitespaceAnalyzer), or to build a custom analyzer from existing tokenizers and filters to meet your needs.

Note that you also need to index your data with WhitespaceAnalyzer (or your custom analyzer); otherwise the special characters won't be in the index to match at query time.
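
To make the difference concrete, here is a rough sketch in plain Java. It is not Lucene code: TokenizerSketch and both method names are made up for illustration, and the second method is only a crude approximation of what a punctuation-stripping analyzer does (the real StandardAnalyzer grammar has more rules). It shows why "test:word" survives whitespace-only tokenization but comes out as two terms otherwise.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this does not use or reproduce Lucene classes.
class TokenizerSketch {

    // Splits on whitespace only, so punctuation such as ':' stays inside the token.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Assumption: treats every character that is not a letter or digit as a
    // token boundary and lowercases the rest. A simplification, not the real grammar.
    static List<String> punctuationSplittingTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Boundary reached: emit the token collected so far.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With the phrase text from the question, whitespaceTokens("test:word hello") keeps test:word as a single token, while punctuationSplittingTokens("test:word hello") yields test, word and hello, which matches the "test word hello" seen in the parsed query. Also keep in mind that whitespace-only tokenization does not lowercase anything, so matching becomes case-sensitive unless you add a lowercase filter in a custom analyzer.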