I'm building a Java application with Lucene 3.6 and want to implement incremental indexing. I have already created the index, and I have read that the way to do it is to open the existing index and, for each document, compare the modification date stored in the index with the file's current modification date; if they differ, delete the document from the index and add it again. My problem is that I do not know how to do this with Lucene in Java.
Thanks
My code is:
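import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.util.Version;

public static void main(String[] args)
        throws CorruptIndexException, LockObtainFailedException,
               IOException {
    File docDir = new File("D:\\PRUEBASLUCENE");
    File indexDir = new File("C:\\PRUEBA");
    Directory fsDir = FSDirectory.open(indexDir);
    Analyzer an = new StandardAnalyzer(Version.LUCENE_36);
    IndexWriter indexWriter
        = new IndexWriter(fsDir, an, MaxFieldLength.UNLIMITED);
    long numChars = 0L; // never updated below, so it always prints 0
    for (File f : docDir.listFiles()) {
        String fileName = f.getName();
        Document d = new Document();
        d.add(new Field("Name", fileName,
                        Store.YES, Index.NOT_ANALYZED));
        // Path, size and date are exact-match values, so each is indexed as a single term
        d.add(new Field("Path", f.getPath(), Store.YES, Index.NOT_ANALYZED));
        long tamano = f.length();
        d.add(new Field("Size", "" + tamano, Store.YES, Index.NOT_ANALYZED));
        long fechalong = f.lastModified();
        d.add(new Field("Modification_Date", "" + fechalong, Store.YES, Index.NOT_ANALYZED));
        indexWriter.addDocument(d);
    }
    indexWriter.optimize();
    // Read the document count before closing; the writer cannot be queried afterwards
    int numDocs = indexWriter.numDocs();
    indexWriter.close();
    System.out.println("Index Directory=" + indexDir.getCanonicalPath());
    System.out.println("Doc Directory=" + docDir.getCanonicalPath());
    System.out.println("num docs=" + numDocs);
    System.out.println("num chars=" + numChars);
}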
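For reference, the incremental pass described in the question (compare the stored modification date with the current one, then re-add on a mismatch) can be sketched without Lucene. This is only an illustration under assumptions: plain maps stand in for the index and the directory listing, and `IncrementalPlan` and its members are hypothetical names, not Lucene API. With Lucene 3.6 the three resulting lists would presumably map to `IndexWriter.addDocument(Document)`, `IndexWriter.updateDocument(Term, Document)` and `IndexWriter.deleteDocuments(Term)`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Lucene): decides what an incremental pass has to do.
final class IncrementalPlan {
    final List<String> toAdd = new ArrayList<String>();
    final List<String> toUpdate = new ArrayList<String>();
    final List<String> toDelete = new ArrayList<String>();

    // indexed: path -> Modification_Date value stored in the index (milliseconds)
    // onDisk:  path -> File.lastModified() of the file as it is now
    static IncrementalPlan compare(Map<String, Long> indexed, Map<String, Long> onDisk) {
        IncrementalPlan plan = new IncrementalPlan();
        // TreeMap copies only make the iteration order (and the output) deterministic
        for (Map.Entry<String, Long> e : new TreeMap<String, Long>(onDisk).entrySet()) {
            Long stored = indexed.get(e.getKey());
            if (stored == null) {
                plan.toAdd.add(e.getKey());        // never indexed before
            } else if (!stored.equals(e.getValue())) {
                plan.toUpdate.add(e.getKey());     // dates differ: delete and re-add
            }                                      // dates equal: leave the document alone
        }
        for (String path : new TreeMap<String, Long>(indexed).keySet()) {
            if (!onDisk.containsKey(path)) {
                plan.toDelete.add(path);           // file no longer exists on disk
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, Long> indexed = new HashMap<String, Long>();
        indexed.put("a.txt", 1000L);
        indexed.put("b.txt", 2000L);
        indexed.put("c.txt", 3000L);
        Map<String, Long> onDisk = new HashMap<String, Long>();
        onDisk.put("a.txt", 1000L); // unchanged
        onDisk.put("b.txt", 2500L); // modified
        onDisk.put("d.txt", 4000L); // new
        IncrementalPlan plan = compare(indexed, onDisk);
        // prints: add=[d.txt] update=[b.txt] delete=[c.txt]
        System.out.println("add=" + plan.toAdd + " update=" + plan.toUpdate
                + " delete=" + plan.toDelete);
    }
}
```

The sketch treats any difference in dates as a change, as the question describes, and keys everything on the path, which only works against a real index if the path is indexed as a single, unanalyzed term.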
Thanks Edmondo1984, you are helping me a lot.
In the end I wrote the code shown below: it stores a hash of the file and then checks the modification date.
Indexing 9,300 files takes 15 seconds, and re-indexing (with no changes to the index, because no file has changed) also takes 15 seconds. Am I doing something wrong, or can I optimize the code so that it takes less time?
Thanks jtahlborn. By following your suggestion for the IndexReader I managed to bring the update time level with the creation time. But isn't updating an existing index supposed to be faster than recreating it? Is it possible to optimize the code further?
if (IndexReader.indexExists(dir))
{
    // reader is an IndexReader passed as a parameter to the function
    // searcher is an IndexSearcher passed as a parameter to the function
    // Note: File.hashCode() is computed from the path name, not from the file contents
    term = new Term("Hash", String.valueOf(file.hashCode()));
    Query termQuery = new TermQuery(term);
    TopDocs topDocs = searcher.search(termQuery, 1);
    if (topDocs.totalHits == 1)
    {
        Document doc;
        int docId, comparedate;
        docId = topDocs.scoreDocs[0].doc;
        doc = reader.document(docId);
        // Field names are case-sensitive: this must match the name used at indexing time
        String dateIndString = doc.get("Modification_date");
        long dateIndLong = Long.parseLong(dateIndString);
        Date date_ind = new Date(dateIndLong);
        // DateTools yields a formatted date string, so the stored value must use the same format
        String dateFichString = DateTools.timeToString(file.lastModified(), DateTools.Resolution.MINUTE);
        long dateFichLong = Long.parseLong(dateFichString);
        Date date_fich = new Date(dateFichLong);
        // Compare the two dates
        comparedate = date_fich.compareTo(date_ind);
        if (comparedate >= 0)
        {
            if (comparedate == 0)
            {
                // Dates are equal: nothing to do
                flag = 2;
            }
            else
            {
                // File is newer than the indexed copy: updateDocument
                flag = 1;
            }
        }
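The date check above formats `file.lastModified()` with `DateTools` and parses the result back into a `long`. A minimal sketch of what that value looks like may help. It rests on assumptions: `SimpleDateFormat` stands in for `DateTools.timeToString(..., Resolution.MINUTE)`, which is assumed here to produce `yyyyMMddHHmm` in GMT, and `DateCompareDemo` with its methods is a hypothetical name rather than Lucene API.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; SimpleDateFormat is a stand-in for Lucene's DateTools.
final class DateCompareDemo {
    // Assumed equivalent of DateTools.timeToString(millis, Resolution.MINUTE)
    static String minuteString(long millis) {
        SimpleDateFormat f = new SimpleDateFormat("yyyyMMddHHmm", Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("GMT"));
        return f.format(new Date(millis));
    }

    // Change check with both sides in the same minute-resolution representation
    static boolean isModified(long storedMillis, long currentMillis) {
        return !minuteString(storedMillis).equals(minuteString(currentMillis));
    }

    public static void main(String[] args) {
        long lastModified = 1338476400000L;             // 2012-05-31 15:00 GMT
        String formatted = minuteString(lastModified);
        long parsed = Long.parseLong(formatted);
        System.out.println(formatted);                  // prints 201205311500
        // The parsed number is a formatted date, not a millisecond timestamp
        System.out.println(parsed == lastModified);     // prints false
        // A change within the same minute is invisible at this resolution
        System.out.println(isModified(lastModified, lastModified + 30000L)); // prints false
        System.out.println(isModified(lastModified, lastModified + 60000L)); // prints true
    }
}
```

If the index stores the raw `lastModified()` value, as the first listing does, comparing it against a formatted value would report a difference on every run, so both sides need the same representation. Two such formatted values can be compared directly as strings or as longs, without wrapping them in `Date`.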