
I'm creating a Lucene 4.10.3 index.

I am using the StandardAnalyzer.

    String indexpath = "C:\\TEMP";
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_10_3,
            new StandardAnalyzer(CharArraySet.EMPTY_SET));
    // The open mode must be set before the IndexWriter is created
    iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
    Directory dir = FSDirectory.open(new File(indexpath));
    IndexWriter indexWriter = new IndexWriter(dir, iwc);
    String[] cities = {"ANDHRA", "ANDHRA PRADESH", "ASSAM AND NAGALAND",
            "ASSAM", "PUNJAB", "PUNJAB AND HARYANA"};
    // One document per city, so each value comes back as its own hit
    for (String city : cities) {
        Document doc = new Document();
        doc.add(new TextField("city", city, Store.YES));
        indexWriter.addDocument(doc);
    }
    indexWriter.close(); // commits the changes so a reader can see them

When I search the index with a phrase query, for example:

    try {
        QueryBuilder build = new QueryBuilder(new KeywordAnalyzer());
        Query q1 = build.createPhraseQuery("city", "ANDHRA");
        Directory dir = FSDirectory.open(new File("C:\\TEMP"));
        DirectoryReader indexReader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(indexReader);
        ScoreDoc[] hits = searcher.search(q1, 10).scoreDocs;
        Set<String> set = new HashSet<String>();
        set.add("city");
        for (int i = 0; i < hits.length; i++) {
            Document document = indexReader.document(hits[i].doc, set);
            System.out.println(document.get("city"));
        }
    } catch (IOException e) {
        e.printStackTrace();
    }

I get the following results:

ANDHRA

ANDHRA PRADESH

When I search for "ANDHRA", how can I get only the "ANDHRA" result and not "ANDHRA PRADESH"? In other words, how can I match the entire field value in Lucene while using StandardAnalyzer?


1 Answer


If you want to match the exact, unmodified, untokenized value of the field, you shouldn't analyze it at all. Simply use a StringField instead of a TextField.
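A minimal sketch of the StringField approach, assuming Lucene 4.10.3 (as in the question) and an in-memory RAMDirectory instead of C:\TEMP. The class and method names (CityExactMatch, exactMatches) are hypothetical. Because a StringField value is indexed as a single term, a TermQuery only matches documents whose value is exactly the query text, case included.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class CityExactMatch {

    static final String[] CITIES = {"ANDHRA", "ANDHRA PRADESH", "ASSAM AND NAGALAND",
            "ASSAM", "PUNJAB", "PUNJAB AND HARYANA"};

    /** Indexes each city as an untokenized StringField and returns the exact matches. */
    public static List<String> exactMatches(String value) throws IOException {
        Directory dir = new RAMDirectory(); // in-memory index, for illustration only
        // StringField is not tokenized, so this analyzer is never applied to "city"
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_10_3,
                new StandardAnalyzer(CharArraySet.EMPTY_SET));
        try (IndexWriter writer = new IndexWriter(dir, iwc)) {
            for (String city : CITIES) {
                Document doc = new Document();
                // The whole value, e.g. "ANDHRA PRADESH", becomes one unchanged term
                doc.add(new StringField("city", city, Store.YES));
                writer.addDocument(doc);
            }
        }
        List<String> results = new ArrayList<String>();
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // TermQuery text is not analyzed, so it must equal the indexed term exactly
            TermQuery query = new TermQuery(new Term("city", value));
            for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                results.add(searcher.doc(hit.doc).get("city"));
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(exactMatches("ANDHRA")); // prints [ANDHRA]
    }
}
```

Note that this match is case-sensitive: exactMatches("andhra") finds nothing, because the indexed term is "ANDHRA".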

If you want some analysis (e.g. lowercasing) but no tokenization, you can use KeywordTokenizer in your Analyzer implementation; it emits the entire field value as a single token.
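A sketch of the KeywordTokenizer approach, again assuming Lucene 4.10.3, where Analyzer.createComponents still receives a Reader and LowerCaseFilter still accepts a Version. LowercaseKeywordAnalyzer and LowercaseKeywordExample are hypothetical names. The analyzer chains KeywordTokenizer (whole value as one token) with LowerCaseFilter, and the same analyzer is used when building the query, so "Andhra" and "ANDHRA" both become the single term "andhra".

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.QueryBuilder;
import org.apache.lucene.util.Version;

public class LowercaseKeywordExample {

    /** Hypothetical analyzer: the whole field value becomes one lowercased token. */
    static class LowercaseKeywordAnalyzer extends Analyzer {
        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            Tokenizer source = new KeywordTokenizer(reader); // no splitting on spaces
            TokenStream result = new LowerCaseFilter(Version.LUCENE_4_10_3, source);
            return new TokenStreamComponents(source, result);
        }
    }

    public static List<String> search(String text) throws IOException {
        Analyzer analyzer = new LowercaseKeywordAnalyzer();
        Directory dir = new RAMDirectory(); // in-memory index, for illustration only
        try (IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer))) {
            for (String city : new String[] {"ANDHRA", "ANDHRA PRADESH", "ASSAM", "PUNJAB AND HARYANA"}) {
                Document doc = new Document();
                // TextField is analyzed, but this analyzer yields exactly one token
                doc.add(new TextField("city", city, Store.YES));
                writer.addDocument(doc);
            }
        }
        // Same analyzer at query time; a single token yields a plain TermQuery
        Query query = new QueryBuilder(analyzer).createPhraseQuery("city", text);
        List<String> results = new ArrayList<String>();
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                results.add(searcher.doc(hit.doc).get("city")); // stored value keeps original case
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(search("Andhra"));
    }
}
```

The stored value is untouched by analysis, so results are still returned in their original uppercase form.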

If you are using QueryParser to create your queries, be aware that the parser uses spaces to separate query clauses. You may need to escape the space and write queries like: city:ANDHRA\ PRADESH (I do not believe QueryParser.escape will do this for you).
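A sketch of the escaped-space query against a StringField index, assuming the classic QueryParser from the lucene-queryparser module of Lucene 4.10.3 and a KeywordAnalyzer at query time, so the parsed term is neither split nor lowercased. EscapedSpaceQuery and parseAndSearch are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class EscapedSpaceQuery {

    /** Indexes a few cities as StringFields, then runs a classic QueryParser query. */
    public static List<String> parseAndSearch(String queryString)
            throws IOException, ParseException {
        Directory dir = new RAMDirectory(); // in-memory index, for illustration only
        try (IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_4_10_3, new KeywordAnalyzer()))) {
            for (String city : new String[] {"ANDHRA", "ANDHRA PRADESH", "PUNJAB"}) {
                Document doc = new Document();
                doc.add(new StringField("city", city, Store.YES));
                writer.addDocument(doc);
            }
        }
        // The parser keeps an escaped space inside the term; KeywordAnalyzer then
        // turns that term into one unmodified token matching the StringField value
        QueryParser parser = new QueryParser(Version.LUCENE_4_10_3, "city", new KeywordAnalyzer());
        Query query = parser.parse(queryString);
        List<String> results = new ArrayList<String>();
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                results.add(searcher.doc(hit.doc).get("city"));
            }
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // The backslash (doubled in Java source) escapes the space for the parser
        System.out.println(parseAndSearch("city:ANDHRA\\ PRADESH"));
        // QueryParser.escape leaves spaces alone, so it does not build the query above
        System.out.println(QueryParser.escape("ANDHRA PRADESH"));
    }
}
```

With the backslash, the query should return only ANDHRA PRADESH. Without it, city:ANDHRA PRADESH is parsed as two clauses (city:ANDHRA and city:PRADESH, combined with the default OR operator), so it returns ANDHRA instead.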