I'm designing a Lucene search index that includes ranked tags for each document.
Example:
Document 1
tag: java, rank 1.2
tag: learning, rank 2.1
tag: bugs, rank 1.2
tag: architecture, rank 0.3
The tags come from an automated classification algorithm that also assigns a score (the rank) to each tag.
How do I design the index so that I can search for a combination of tags and get the most relevant results back first? For example, a search for java+learning.
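To make the ordering I'm after concrete, here is a minimal sketch in plain Java with no Lucene involved. It assumes that the relevance of a document for a tag combination is simply the sum of its ranks for the queried tags, and that ranks are never negative. That scoring rule, the class name TagRanking, and the second document are all hypothetical and only there for illustration; Lucene's own scoring formula is different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the desired ranking; this is not Lucene code.
class TagRanking {

    // Assumed scoring rule: sum of the document's ranks for the queried tags.
    // Returns -1 when the document is missing any queried tag (all tags are
    // required), which relies on the assumption that ranks are non-negative.
    static double score(Map<String, Double> docTags, List<String> queryTags) {
        double total = 0.0;
        for (String t : queryTags) {
            Double rank = docTags.get(t);
            if (rank == null) {
                return -1.0;
            }
            total += rank;
        }
        return total;
    }

    // Returns the ids of all matching documents, highest score first.
    static List<String> search(Map<String, Map<String, Double>> index,
                               List<String> queryTags) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> e : index.entrySet()) {
            if (score(e.getValue(), queryTags) >= 0) {
                hits.add(e.getKey());
            }
        }
        hits.sort(Comparator
                .comparingDouble((String id) -> score(index.get(id), queryTags))
                .reversed());
        return hits;
    }

    public static void main(String[] args) {
        // Document 1 uses the ranks from the example above.
        Map<String, Double> doc1 = new LinkedHashMap<>();
        doc1.put("java", 1.2);
        doc1.put("learning", 2.1);
        doc1.put("bugs", 1.2);
        doc1.put("architecture", 0.3);

        // Document 2 is made up so there is something to rank against.
        Map<String, Double> doc2 = new LinkedHashMap<>();
        doc2.put("java", 0.5);
        doc2.put("learning", 0.4);

        Map<String, Map<String, Double>> index = new LinkedHashMap<>();
        index.put("doc2", doc2);
        index.put("doc1", doc1);

        // Both documents match java+learning; doc1 has the higher summed rank.
        System.out.println(search(index, Arrays.asList("java", "learning")));
    }
}
```

With this data, both documents match java+learning and Document 1 should come first because its ranks for those two tags are higher. A search for java+bugs would return only Document 1, since the made-up second document has no bugs tag.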
My initial approach was to create a separate FIELD for each tag and use the rank to boost that field on each document. Is this a good approach in terms of performance? What if I have 10,000 possible tags? Is it a good idea to have 10,000 FIELDS in Lucene?
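// One field per tag, named FIELD_TAG + tag id; the constant value "y" only marks presence.
Field tagField = new Field(
        FIELD_TAG + tag.getId(),
        "y",
        Field.Store.NO,
        Field.Index.NOT_ANALYZED);
// Use the classifier's rank as the index-time boost of this field.
tagField.setBoost(tag.getRank());
luceneDoc.add(tagField);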
If I instead add all the tags to the same field, how can I take the rank into account?