I have documents which I am indexing with Lucene. These documents basically have a title (text) and a body (text). Currently I am creating an index out of Lucene Documents with (amongst other fields) a single searchable field, which is basically title+" "+body. In this way, if you search for anything which occurs in the title or in the body, you will find the document.
However, now I have learned of the new requirement that matches in the title should cause the document to be "more relevant" than matches in the body. Thus, if there is a document with the title "Software design", and the user searches for "Software design", then that document should be placed higher up in the search results than a document called something else, which mentions software design a lot in the body.
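To make the requirement concrete, here is a toy sketch in plain Java of the ranking I'm after. It does not use Lucene at all. The class name, the weights 2.0 and 1.0, and the saturating body score are all assumptions I made up for illustration, not anything Lucene does.

```java
import java.util.Locale;

// Toy illustration only: NOT Lucene code. All names and weights are hypothetical.
final class TitleWeightSketch {
    // Assumed weights: a title match counts for more than any number of body matches.
    static final double TITLE_WEIGHT = 2.0;
    static final double BODY_WEIGHT = 1.0;

    // Counts non-overlapping, case-insensitive occurrences of phrase in text.
    public static int countOccurrences(String text, String phrase) {
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = phrase.toLowerCase(Locale.ROOT);
        if (needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) >= 0) {
            count++;
            from += needle.length();
        }
        return count;
    }

    // The title contributes a fixed bonus if it contains the query at all.
    // The body contributes hits / (hits + 1), which grows with the number of
    // mentions but always stays below 1, so it can never outweigh a title match.
    public static double score(String title, String body, String query) {
        double titlePart = countOccurrences(title, query) > 0 ? TITLE_WEIGHT : 0.0;
        int bodyHits = countOccurrences(body, query);
        double bodyPart = BODY_WEIGHT * bodyHits / (bodyHits + 1.0);
        return titlePart + bodyPart;
    }

    public static void main(String[] args) {
        String query = "Software design";
        double titled = score("Software design", "An introduction.", query);
        double bodyOnly = score("Team handbook",
                "Software design here, software design there, software design everywhere.",
                query);
        System.out.println("title match: " + titled);
        System.out.println("body-only match: " + bodyOnly);
    }
}
```

With my current single title+" "+body field there is no way to tell the two kinds of match apart, which is why both documents above would be treated the same.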
I don't really have any idea how to begin implementing this requirement. I know that Google, for example, treats certain parts of the document as "more relevant" (e.g. text within <h1> tags), and everyone here assumes Lucene supports something similar.
However,
- The Javadoc for the Document class clearly states that fields contain text, i.e. not structured text where some parts are "more important" than other parts.
- This blog post states "With Lucene, it is impossible to increase or decrease the weight of individual terms in a document."
I'm not really sure where to look. What would you suggest?
Any specific information (e.g. links to Lucene documentation) stating flatly that such a thing is not possible would also be helpful, so that I needn't spend any further time looking for how to do it. (The software is already written with Lucene and we won't re-write it now, so if Lucene doesn't support this, there's nothing anyone (my boss) can do about that.)