I would like to index 1 million of html documents in Lucene. I need to index in one Lucene document several html files. Lately, I would like to know in the search response the original html document.
So, for example I have:
1.home.html
2.history.html
3.about.html
4.home2.html
...
I want to index 1, 2 and 3 in the same Lucene document. Then, if I search any text I want to know the original document (home, history or about).
I have been searching in Internet and I found Lucene payload. So I have been thinking about indexing the url of the original document in all the terms. Is this a good solution? the performance would be allright?
Thanks very much for your help.