I have a folder (MY_FILES) that holds around 500 files, and one new file is added to it every day. Each file is around 4 MB.
I've written a simple 'void main' to test whether I can run a wildcard search against those files with Lucene. It works just fine.
The problem is that on every run I delete the old index folder and reindex everything from scratch. This takes a long time and is obviously inefficient. What I'm looking for is 'incremental indexing': if the index already exists, just add the new files to it.
Does Lucene have some mechanism to check whether a 'doc' has already been indexed before I try to index it? Something like writer.isDocExists?
Thanks!
My code looks like this:
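import java.io.File;
import java.io.FileReader;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

// build the writer (analyzer and fsDir are created earlier)
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, analyzer);
IndexWriter writer = new IndexWriter(fsDir, config);
writer.deleteAll(); // currently required: otherwise every run re-adds all files and searches return duplicates

// build one document per file and add it to the writer
File dir = new File(MY_FILES);
File[] files = dir.listFiles();
int counter = 0;
for (File file : files)
{
    String path = file.getCanonicalPath();
    FileReader reader = new FileReader(file);
    Document doc = new Document();
    doc.add(new Field("filename", file.getName(), Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("path", path, Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("content", reader)); // tokenized from the reader, not stored
    writer.addDocument(doc);
    System.out.println("indexing " + file.getName() + " " + (++counter) + "/" + files.length);
}
writer.close(); // commit the changes and release the index lock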
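
To make the goal concrete, here is a rough sketch of the selection step I have in mind, written without Lucene so it can be run on its own. `IncrementalSelector` and `newFiles` are hypothetical names of mine, not Lucene API, and the set of already-indexed paths is an assumption: I don't yet know whether it should come from the index itself or from a side file that I maintain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of Lucene): picks the files that still need indexing.
class IncrementalSelector {

    // Returns the entries of 'candidates' that are missing from 'alreadyIndexed',
    // keeping the original order of 'candidates'.
    static List<String> newFiles(List<String> candidates, Set<String> alreadyIndexed) {
        List<String> fresh = new ArrayList<String>();
        for (String path : candidates) {
            if (!alreadyIndexed.contains(path)) {
                fresh.add(path);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        // Assumption: these paths were recorded during earlier indexing runs.
        Set<String> alreadyIndexed =
                new HashSet<String>(Arrays.asList("/data/a.log", "/data/b.log"));
        // Assumption: this is what the folder listing returns today.
        List<String> inFolder = Arrays.asList("/data/a.log", "/data/b.log", "/data/c.log");

        System.out.println(newFiles(inFolder, alreadyIndexed)); // prints [/data/c.log]
    }
}
```

With something like this, only the returned files would go through the loop above, and the writer.deleteAll() call could be dropped. What I'm missing is the Lucene-side way to get (or avoid needing) the already-indexed set.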