I'm trying to run a Lucene search for one specific string term.
E.g. I have the tags 1-"Hello World", 2-"Hello, Steve", 3-"Helloween" and finally 4-"Hello". If I search for the last tag (hello), Lucene returns all of the tags, because every one of them contains "hello" at some point. I need an operator or some logic that performs the search without "like" behaviour.

One way to avoid this is the "must_not" clause (the - operator), which turns the query into: term:hello -term:world. But that does not work in my case, because I would need to know in advance every other word that should be excluded from the search.

private <T> Query createQuery(final Class<T> clazz, String s, final String[] fields, final SearchFactory searchFactory, final Boolean allowLeadingWildcard) throws ParseException {
    final Analyzer analyzer = searchFactory.getAnalyzer(clazz);
    final QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_36, fields, analyzer);
    Query query = null;
    try{
        query = parser.parse(s);
    } catch(...){...}
    return query;
}

My knowledge of Lucene is limited, so here is an SQL example in case it is easier to understand:

/*This is what Lucene is doing. It will bring "HELLO", "HELLO WORLD", "Hello, Steve"...*/
WHERE table.tag LIKE "%HELLO%" 
/*This is what I want. Match exactly the term "HELLO" and nothing more*/
WHERE table.tag = "HELLO" 

I guess that this is the Analyzer used in the application:

public class AnalyserCustom extends Analyzer {

    @Override
    public TokenStream tokenStream(final String fieldName, final Reader reader) {
        final StandardTokenizer tokenizer = new StandardTokenizer(Version.LUCENE_36, reader);

        TokenStream stream = new StandardFilter(Version.LUCENE_36, tokenizer);
        stream = new LowerCaseFilter(Version.LUCENE_36, stream);
        return new ASCIIFoldingFilter(stream);
    }
}

And this is the tag attribute:

...
@Field
private String tagname;
...

Any suggestions?
PS: I'm new to Lucene.

I missed the point of your final sentence "I will need to find all other words that should not be in search." Can you clarify what that means? You can edit the question - some more examples may help. - andrewjames
For example, if I have the tags 1-"Hello World", 2-"Hello, Steve", 3-"Helloween" and finally 4-"Hello", and I search for the last tag (hello), Lucene returns all of the tags, because every one of them contains "hello" at some point. I need an operator or some logic that performs the search without "like". - Jean Pierre
Can you edit your question and place this new example in there? Can you also show exactly what results you want to get, as well as what you currently get? Sorry to push for this, but I don't understand what you mean by an operator or a logic that makes the search without "like", and how that translates into what the end result should look like. - andrewjames
Also, what analyzer are you using, and what types of fields are created when you index the data? It's probably easiest to show the relevant code. All of these things can have a significant effect on how queries behave. - andrewjames
I've added some code. Let me know if it helps. - Jean Pierre

1 Answer

You have to index the field with an analyzer that generates one single token for the whole string, so that only the exact value matches the searched term. Try KeywordAnalyzer.
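To illustrate why the choice of analyzer matters, here is a minimal sketch in plain Java. It does not use Lucene at all: `tokenize` is only a rough stand-in for a tokenizing analyzer like the one in the question, and `keyword` is a stand-in for keyword-style analysis, where the whole field value becomes a single token. The class name, the method names, and the lowercasing in `keyword` are assumptions made for the example (Lucene's own KeywordAnalyzer does not lowercase).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; none of these names are Lucene APIs.
public class TokenMatchSketch {

    // Stand-in for a tokenizing analyzer: split on anything that is not
    // a letter or digit, then lowercase each piece.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Stand-in for keyword-style analysis: the entire value is one token.
    // Lowercasing here is an assumption, to keep matching case-insensitive.
    static List<String> keyword(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT));
    }

    // Returns every tag that has at least one token equal to the search term.
    static List<String> search(List<String> tags, String term, boolean tokenized) {
        List<String> hits = new ArrayList<>();
        String needle = term.toLowerCase(Locale.ROOT);
        for (String tag : tags) {
            List<String> tokens = tokenized ? tokenize(tag) : keyword(tag);
            if (tokens.contains(needle)) {
                hits.add(tag);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("Hello World", "Hello, Steve", "Helloween", "Hello");
        // Tokenized field: multi-word tags also contain the token "hello".
        System.out.println(search(tags, "hello", true));
        // Keyword-style field: only the tag that equals "hello" as a whole matches.
        System.out.println(search(tags, "hello", false));
    }
}
```

In this sketch the keyword-style search returns only the "Hello" tag, which is the SQL `=` behaviour the question asks for, while the tokenized search also returns "Hello World" and "Hello, Steve". Note that "Helloween" is a single token here, so it does not match in either mode.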