4
votes

I want to rank similar documents together in Lucene. Let me explain my scenario.

For example, let's say I have the following records in the file from which I built the index.

ID|First Name|Last Name|DOB
1 |John      |Doe      |03/18/1990
1 |John      |Twain    |03/18/1990
2 |Daniel    |Doe      |03/25/1989
3 |Joey      |Johnson  |05/14/1978
3 |Joey      |Johnson  |05/14/1987
4 |Joey      |Johnson  |05/14/1987 

When I search for "John Doe", the search index I created displays the records in the following order:

ID|First Name|Last Name|DOB
1 |John      |Doe      |03/18/1990
3 |Joey      |Johnson  |05/14/1978
3 |Joey      |Johnson  |05/14/1987
4 |Joey      |Johnson  |05/14/1987
1 |John      |Twain    |03/18/1990 
2 |Daniel    |Doe      |03/25/1989

As you can see, Lucene orders the records by how well they match the terms I searched for, not by how similar the records are to each other. I want it to find records using the terms provided, but display them grouped by their similarity.

What I want:

ID|First Name|Last Name|DOB
1 |John      |Doe      |03/18/1990
1 |John      |Twain    |03/18/1990 
3 |Joey      |Johnson  |05/14/1978
3 |Joey      |Johnson  |05/14/1987
4 |Joey      |Johnson  |05/14/1987
2 |Daniel    |Doe      |03/25/1989

Here the records John Twain and John Doe are displayed together because they are similar to each other, and one of them had the most matches against the user's query.

Does that make sense?

Search code:
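To make the desired ordering concrete, here is a minimal sketch in plain C# with no Lucene calls. It assumes that "similar" can be reduced to a shared key (here the ID column, which is only one possible reading of the example), and the `Hit` and `HitGrouping` names are hypothetical, not Lucene.Net types. Each group keeps the rank of its best-scoring member, and the remaining members are pulled up next to it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one search hit; not a Lucene.Net type.
public sealed class Hit
{
    public string Id { get; }
    public string Name { get; }
    public Hit(string id, string name) { Id = id; Name = name; }
}

public static class HitGrouping
{
    // Takes hits already sorted by score and moves every hit that shares a key
    // with an earlier hit up next to it. Enumerable.GroupBy yields groups in
    // order of first key occurrence and keeps source order inside each group,
    // so each group sits where its best-ranked member was.
    public static List<Hit> GroupByKey(IEnumerable<Hit> rankedHits, Func<Hit, string> keyOf)
    {
        return rankedHits.GroupBy(keyOf).SelectMany(group => group).ToList();
    }
}
```

Called as `HitGrouping.GroupByKey(hits, h => h.Id)` on the result order shown earlier, this moves John Twain (ID 1) directly under John Doe (ID 1) and leaves the other records in their relative order. A different notion of similarity would only need a different key function.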

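using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

string sa = textbox1.Text; // Assume this value is "John Doe" in this case.
string[] searchfield = new string[] { "ID", "First Name", "Last Name", "DOB" };
IndexReader reader = IndexReader.Open(dir, true); // read-only reader over the index directory
IndexSearcher indexSearcher = new IndexSearcher(reader); // searcher over the same reader
TopScoreDocCollector coll = TopScoreDocCollector.Create(50, true); // keep the 50 best-scoring hits
indexSearcher.Search(QueryMaker(sa, searchfield), coll);
ScoreDoc[] hits = coll.TopDocs().ScoreDocs;
for (int i = 0; i < hits.Length; i++)
{
    SearchResults result = new SearchResults();
    int docID = hits[i].Doc;
    Document d = indexSearcher.Doc(docID);
    result.fname = d.Get("First Name"); // Get already returns a string, so ToString() is not needed
}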

Attempted method:

I tried to use the MoreLikeThis class, but I am not sure whether I am using it correctly, or even whether it is the right approach. Moreover, how would I use the Like method for two or more doc IDs? Also, if I use the doc IDs, the results will include the source document again as a duplicate, because I am reading from the same reader.

Code:

IndexSearcher mltsearcher = new IndexSearcher(reader);
MoreLikeThis mlt = new MoreLikeThis(reader);
int docid =hits[1].Doc;
Query query = mlt.Like(docid);
TopDocs similardocs = mltsearcher.Search(query, 10);
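One possible way to handle both the several-doc-IDs question and the duplicate problem is sketched below in plain C#. It is only an illustration under assumptions: `SimilarHitMerger` and the `similarTo` delegate are hypothetical names, and the delegate stands in for whatever returns the similar doc IDs for one hit (for instance a wrapper around the MoreLikeThis query and search above). Each ranked hit is emitted once, followed by its not-yet-seen neighbours, so a document that comes back as similar to itself or to an earlier hit is skipped.

```csharp
using System;
using System.Collections.Generic;

public static class SimilarHitMerger
{
    // rankedDocIds: doc IDs from the main search, best score first.
    // similarTo:    hypothetical callback returning doc IDs similar to one doc ID.
    public static List<int> Interleave(IEnumerable<int> rankedDocIds,
                                       Func<int, IEnumerable<int>> similarTo)
    {
        var seen = new HashSet<int>();   // every doc ID emitted so far
        var ordered = new List<int>();

        foreach (int docId in rankedDocIds)
        {
            // Skip hits already pulled in as a neighbour of a better-ranked hit.
            if (!seen.Add(docId)) continue;
            ordered.Add(docId);

            // Place unseen similar documents directly after their source hit.
            foreach (int similarId in similarTo(docId))
            {
                if (seen.Add(similarId)) ordered.Add(similarId);
            }
        }
        return ordered;
    }
}
```

Because `HashSet<int>.Add` returns false for an ID that is already present, the source document itself never appears twice, even when the similarity search returns it as its own best match.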

Please let me know if you have any questions.

I have only been learning Lucene for the past two weeks, so I don't know much about it yet.

Note: I am using Lucene.Net 3.0.3

1
What are the values of sa and searchfield? - femtoRgon
sa is the query entered by the user: String sa = textbox1.Text; and String[] searchfield = new string[] { "ID", "First Name", "Last Name", "DOB" }; - Huzaifa

1 Answer

2
votes

Can you show the code of the method QueryMaker()?

I think you could create a new field "name" composed of both the first name and the last name, and then use FuzzyQuery to search that field. FuzzyQuery scores documents according to the Levenshtein distance between strings.
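For readers unfamiliar with the metric, here is a small self-contained sketch of Levenshtein distance in plain C#. It only illustrates what the distance measures (the minimum number of single-character insertions, deletions and substitutions). It is not how Lucene.Net implements FuzzyQuery internally, and the `EditDistance` class name is made up for this example.

```csharp
using System;

public static class EditDistance
{
    // Classic dynamic-programming Levenshtein distance using two rows.
    public static int Levenshtein(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];

        // Turning the empty prefix of a into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.Length; j++)
            {
                int substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                current[j] = Math.Min(substitution, Math.Min(insertion, deletion));
            }
            // Reuse the two buffers for the next row.
            var swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.Length];
    }
}
```

For example, `EditDistance.Levenshtein("john", "joey")` is 2, since two characters have to be substituted. A smaller distance means the strings are more alike.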