I'm having difficulties indexing TREC documents with Lucene 7. Until now I only needed to index plain text files, which was easily achievable by using an InputStreamReader as described in the Lucene demo:
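import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.lucene.document.*;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;

/** Indexes a single document */
static void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOException {
    try (InputStream stream = Files.newInputStream(file)) {
        // make a new, empty document
        Document doc = new Document();
        // "path" is indexed but not tokenized, so it can serve as the update key
        Field pathField = new StringField("path", file.toString(), Field.Store.YES);
        doc.add(pathField);
        doc.add(new LongPoint("modified", lastModified));
        // the whole file goes into one tokenized, unstored "contents" field
        doc.add(new TextField("contents", new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8))));
        if (writer.getConfig().getOpenMode() == OpenMode.CREATE) {
            // new index: just add the document
            System.out.println("adding " + file);
            writer.addDocument(doc);
        } else {
            // existing index: replace any document with the same path
            System.out.println("updating " + file);
            writer.updateDocument(new Term("path", file.toString()), doc);
        }
    }
}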
But TREC files contain various tags that hold information which should not end up in the searchable text, such as Header, Title, DocNo and many more. How would I adjust this code to store specific tags in their own text fields, each with its corresponding content?
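One possible direction, sketched here with plain `java.util.regex` rather than a real SGML parser (TREC files are typically not well-formed XML, e.g. many `<DOC>` entries with no single root element): split the file into `<DOC>...</DOC>` blocks and pull out only the tags of interest. The class name `TrecParser`, the tag list, and the assumption that tags are uppercase and carry no attributes are all hypothetical and would need adjusting to the actual collection.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits TREC-style text into <DOC> blocks and extracts selected tags. */
final class TrecParser {
    // One <DOC>...</DOC> block; DOTALL lets '.' cross line breaks, '?' keeps it non-greedy.
    private static final Pattern DOC = Pattern.compile("<DOC>(.*?)</DOC>", Pattern.DOTALL);

    // Assumed tag names (uppercase, no attributes); adjust to the collection.
    private static final Map<String, Pattern> TAGS = new LinkedHashMap<>();
    static {
        for (String tag : new String[] {"DOCNO", "HEADER", "TITLE", "TEXT"}) {
            TAGS.put(tag, Pattern.compile("<" + tag + ">(.*?)</" + tag + ">", Pattern.DOTALL));
        }
    }

    /** Returns one tag-to-content map per <DOC>; repeated tags are joined with newlines. */
    static List<Map<String, String>> parse(String content) {
        List<Map<String, String>> docs = new ArrayList<>();
        Matcher docMatcher = DOC.matcher(content);
        while (docMatcher.find()) {
            String body = docMatcher.group(1);
            Map<String, String> fields = new LinkedHashMap<>();
            for (Map.Entry<String, Pattern> tag : TAGS.entrySet()) {
                Matcher m = tag.getValue().matcher(body);
                StringBuilder value = new StringBuilder();
                while (m.find()) {
                    if (value.length() > 0) {
                        value.append('\n');
                    }
                    value.append(m.group(1).trim());
                }
                // Absent or empty tags are left out, so callers can simply skip them.
                if (value.length() > 0) {
                    fields.put(tag.getKey(), value.toString());
                }
            }
            docs.add(fields);
        }
        return docs;
    }
}
```

Since a TREC file usually bundles many `<DOC>` entries, `indexDoc` would then create one Lucene `Document` per returned map rather than one per file: read the file with `new String(Files.readAllBytes(file), StandardCharsets.UTF_8)`, and for each entry that is present add e.g. `new StringField("docno", fields.get("DOCNO"), Field.Store.YES)` (untokenized, so it could replace `path` as the `updateDocument` key), `new TextField("title", fields.get("TITLE"), Field.Store.YES)` and `new TextField("contents", fields.get("TEXT"), Field.Store.NO)`. These field names are illustrative only; tags left out of the list never reach the index.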