I am using Lucene indexing for the first time. I have some documents in Hindi and English, and I build an index on the content of each document. When I search the index, I get results from all the documents: even if my query is an English word, Hindi documents are returned as well. I have added the code below. Please tell me what I am doing wrong.

        IndexSearcher searcher = new IndexSearcher(directory);
        QueryParser parser = new QueryParser("Content", analyzer);

        while (condition)
        {
            Search(text, searcher, parser);
        }

        searcher.Close();

    private static void Search(string text, IndexSearcher searcher, QueryParser parser)
    {
        // Parse the user's text against the default "Content" field.
        Query query = parser.Parse(text);

        Hits hits = searcher.Search(query);
        int results = hits.Length();

        for (int i = 0; i < results; i++)
        {
            Lucene.Net.Documents.Document doc = hits.Doc(i);

            string show = doc.ToString();

            float score = hits.Score(i);

            /* insert doc id in database table */
        }
    }

Thanks all

1 Answer

First, I would use Luke to check whether the query syntax is right. Then I would check whether the misbehaving English word is a homograph of a Hindi word (i.e. an English word that is spelled the same as a Hindi word).

If you want to prevent a search for English search terms from coming up with Hindi documents, you will need to mark each document as to whether it is in English or Hindi, then specify that marking in your search query. In Query Parser Syntax, this could look like:

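+(ENGLISHSEARCHTERMS) +(language:English)

(where all Hindi documents have their language field set to 'Hindi' and all English documents have their language field set to 'English'). The + in front of both groups makes each of them required; without it on the search terms, every English document would match and the terms would only influence ranking.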
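
If you would rather build the language restriction in code than in the query string, something like the following might work. It is only a rough sketch against the Lucene.Net 2.x API that the question's code appears to use (the one that still has Hits). The class name LanguageSearch, the method RestrictToLanguage and the field name "language" are made up for illustration, and it assumes every document was indexed with an untokenized "language" field. In Lucene.Net 3.x and later the constant is spelled Occur.MUST rather than BooleanClause.Occur.MUST.

```csharp
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

public static class LanguageSearch
{
    // Hypothetical helper: wraps the parsed user query so that only documents
    // whose "language" field equals the given value can match.
    // Assumption: each document was indexed with an untokenized "language"
    // field holding exactly "English" or "Hindi".
    public static Query RestrictToLanguage(QueryParser parser, string text, string language)
    {
        // The user's text is still parsed against the parser's default field.
        Query userQuery = parser.Parse(text);

        BooleanQuery combined = new BooleanQuery();

        // Both clauses are required: the text must match AND the language must match.
        combined.Add(userQuery, BooleanClause.Occur.MUST);

        // TermQuery does not go through the analyzer, so the value is compared
        // exactly as it was indexed (including upper/lower case).
        combined.Add(new TermQuery(new Term("language", language)), BooleanClause.Occur.MUST);

        return combined;
    }
}
```

In the Search method from the question, this would replace the plain parser.Parse(text) call, e.g. Hits hits = searcher.Search(LanguageSearch.RestrictToLanguage(parser, text, "English"));. One reason to prefer a TermQuery here: if language:English is run through QueryParser with an analyzer that lowercases terms, it may end up looking for "english" and miss a field that was stored as "English".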