I want to score similar documents together in Lucene. Let me explain my scenario.
For example, say the file I built my index from contains the following records:
ID|First Name|Last Name|DOB
1 |John      |Doe      |03/18/1990
1 |John      |Twain    |03/18/1990
2 |Daniel    |Doe      |03/25/1989
3 |Joey      |Johnson  |05/14/1978
3 |Joey      |Johnson  |05/14/1987
4 |Joey      |Johnson  |05/14/1987
When I search for "John Doe", the search index I created returns the records in the following order:
ID|First Name|Last Name|DOB
1 |John      |Doe      |03/18/1990
3 |Joey      |Johnson  |05/14/1978
3 |Joey      |Johnson  |05/14/1987
4 |Joey      |Johnson  |05/14/1987
1 |John      |Twain    |03/18/1990
2 |Daniel    |Doe      |03/25/1989
As you can see, Lucene orders the records by how well each one matches the search terms, not by how similar the records are to each other. I want it to find records using the terms provided, but display them grouped by their similarity to one another.
What I want:
ID|First Name|Last Name|DOB
1 |John      |Doe      |03/18/1990
1 |John      |Twain    |03/18/1990
3 |Joey      |Johnson  |05/14/1978
3 |Joey      |Johnson  |05/14/1987
4 |Joey      |Johnson  |05/14/1987
2 |Daniel    |Doe      |03/25/1989
Here the records John Doe and John Twain are displayed together because they are similar to each other and one of them (John Doe) has the most matches against the user's query.
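To make the desired ordering concrete, here is a rough sketch in plain C# with no Lucene calls. It is only an illustration of the regrouping I have in mind, not a solution: the `Hit` class and the `SimilarityKey` function are hypothetical, `Rank` is simply each record's position in the order Lucene currently returns, and treating "same first name" as "similar" is an assumption that happens to fit this sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical holder for one search hit; Rank is its position in the
// order Lucene currently returns (1 = best match).
class Hit
{
    public int Rank;
    public string Id, FirstName, LastName, Dob;
}

static class Program
{
    // Assumption for illustration only: records that share a first name are "similar".
    static string SimilarityKey(Hit h)
    {
        return h.FirstName;
    }

    static void Main()
    {
        // The hits in the order the index currently returns them.
        var hits = new List<Hit>
        {
            new Hit { Rank = 1, Id = "1", FirstName = "John",   LastName = "Doe",     Dob = "03/18/1990" },
            new Hit { Rank = 2, Id = "3", FirstName = "Joey",   LastName = "Johnson", Dob = "05/14/1978" },
            new Hit { Rank = 3, Id = "3", FirstName = "Joey",   LastName = "Johnson", Dob = "05/14/1987" },
            new Hit { Rank = 4, Id = "4", FirstName = "Joey",   LastName = "Johnson", Dob = "05/14/1987" },
            new Hit { Rank = 5, Id = "1", FirstName = "John",   LastName = "Twain",   Dob = "03/18/1990" },
            new Hit { Rank = 6, Id = "2", FirstName = "Daniel", LastName = "Doe",     Dob = "03/25/1989" },
        };

        // Group similar records, order the groups by their best-ranked member,
        // and keep the original rank order inside each group.
        var regrouped = hits
            .GroupBy(SimilarityKey)
            .OrderBy(g => g.Min(h => h.Rank))
            .SelectMany(g => g.OrderBy(h => h.Rank));

        foreach (var h in regrouped)
        {
            Console.WriteLine("{0} |{1} |{2} |{3}", h.Id, h.FirstName, h.LastName, h.Dob);
        }
    }
}
```

With this sample input, the loop prints the records in the order I listed above: John Doe, John Twain, the three Joey Johnson records, then Daniel Doe. What I do not know is how to get Lucene itself to decide which records count as similar.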
Does that make sense?
Search code:
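string sa = textbox1.Text; // Assume this value is "John Doe" in this case.
string[] searchfield = new string[] { "ID", "First Name", "Last Name", "DOB" };

IndexReader reader = IndexReader.Open(dir, true); // open the index read-only
// Assumption: indexSearcher is built from the same reader (its declaration was not shown).
IndexSearcher indexSearcher = new IndexSearcher(reader);

// Collect the top 50 hits. QueryMaker (not shown) builds the query from the input.
TopScoreDocCollector coll = TopScoreDocCollector.Create(50, true);
indexSearcher.Search(QueryMaker(sa, searchfield), coll);
ScoreDoc[] hits = coll.TopDocs().ScoreDocs;

for (int i = 0; i < hits.Length; i++)
{
    SearchResults result = new SearchResults();
    int docID = hits[i].Doc;
    Document d = indexSearcher.Doc(docID);
    result.fname = d.Get("First Name"); // Get already returns a string
}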
Attempted method:
I tried using the MoreLikeThis class, but I am not sure whether I am using it correctly, or even whether it is the right approach. Moreover, how would I use the Like method for two or more doc IDs? Also, if I use the doc IDs, the results will include the source document itself as a duplicate, because I am reading from the same reader.
Code:
IndexSearcher mltsearcher = new IndexSearcher(reader);
MoreLikeThis mlt = new MoreLikeThis(reader);
int docid = hits[1].Doc; // note: index 1 is the second-ranked hit
Query query = mlt.Like(docid);
TopDocs similardocs = mltsearcher.Search(query, 10);
Please let me know if you have any questions.
I have only been learning Lucene for the past two weeks, so I don't know much about it yet.
Note: I am using Lucene.Net 3.0.3.
sa and searchfield? – femtoRgon