
I am integrating the Lucene 3.6 search API into a Java desktop application. Lucene uses a file system directory to store the index. The code below creates the index directory, creates the IndexWriter, and adds Documents to the index.

Data for the index is collected from a Derby database. Columns of the database table are added as fields to the Lucene document, so each row in the table is represented by a single Lucene document.

My question is: is there a way to check the index directory and populate it only if it does not already contain Lucene documents, i.e. to skip repopulating the index when it is already populated?

Code for creating the index directory:

    public File createIndexDir() throws IOException, SQLException
    {
        // Create the index directory if it does not exist yet
        if (!userDir.exists())
        {
            userDir.mkdir();
            System.out.println("Index directory created at " + userDir.getAbsolutePath());
        }
        return userDir.getAbsoluteFile();
    }

Code for creating the index writer:

    public void createIndexWriter() throws IOException, SQLException
    {
        indexDir = createIndexDir();
        if (iw == null)
        {
            try {
                // Open the file-system index (created if missing) with a standard analyzer
                StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
                IndexWriterConfig IWConfig = new IndexWriterConfig(Version.LUCENE_36, analyzer);

                iw = new IndexWriter(FSDirectory.open(indexDir), IWConfig);
            }
            catch (CorruptIndexException ex) {
                Logger.getLogger(Indexer.class.getName()).log(Level.SEVERE, null, ex);
            } catch (LockObtainFailedException ex) {
                Logger.getLogger(Indexer.class.getName()).log(Level.SEVERE, null, ex);
            } catch (IOException ex) {
                Logger.getLogger(Indexer.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }

This is the code that populates the index with data from the database:

    public void buildIndex() throws SQLException, CorruptIndexException, IOException
    {
        /* Connect to the database */
        Connection con = DriverManager.getConnection(host, uName, uPass);
        Statement stmt = con.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
        String sql = "SELECT * FROM APP.REGISTRY";
        ResultSet rs = stmt.executeQuery(sql);

        rs.beforeFirst();  // set pointer to the beginning of the result set
        while (rs.next())
        {
            // One Lucene document per database row
            Document doc = new Document();

            // Stored only; not searchable because of Field.Index.NO
            doc.add(new Field("id", rs.getString("ID"), Field.Store.YES, Field.Index.NO));

            if (rs.getString("SUBJECT") == null)
            { doc.add(new Field("subject", "", Field.Store.YES, Field.Index.ANALYZED)); }
            else {
                doc.add(new Field("subject", rs.getString("SUBJECT"), Field.Store.YES, Field.Index.ANALYZED));
            }

            if (rs.getString("LETTER_FROM") == null)
            { doc.add(new Field("letter_from", " ", Field.Store.YES, Field.Index.ANALYZED)); }
            else {
                doc.add(new Field("letter_from", rs.getString("LETTER_FROM"), Field.Store.YES, Field.Index.ANALYZED));
            }

            doc.add(new Field("date_of_letter", DateTools.dateToString(rs.getDate("DATE_OF_LETTER"),
                    DateTools.Resolution.DAY), Field.Store.YES, Field.Index.ANALYZED));

            doc.add(new Field("date_received", DateTools.dateToString(rs.getDate("DATE_RECEIVED"),
                    DateTools.Resolution.DAY), Field.Store.YES, Field.Index.NO));

            if (rs.getString("REMARKS") == null)
            { doc.add(new Field("remarks", " ", Field.Store.YES, Field.Index.ANALYZED)); }
            else {
                doc.add(new Field("remarks", rs.getString("REMARKS"), Field.Store.YES, Field.Index.ANALYZED));
            }

            // Rows without a dispatch date are indexed with the epoch as a placeholder
            if (rs.getDate("DATE_DISPATCHED") == null)
            { doc.add(new Field("date_dispatched", DateTools.dateToString(new Date(0L), DateTools.Resolution.DAY), Field.Store.YES, Field.Index.ANALYZED)); }
            else {
                doc.add(new Field("date_dispatched", DateTools.dateToString(rs.getDate("DATE_DISPATCHED"),
                        DateTools.Resolution.MINUTE), Field.Store.YES, Field.Index.ANALYZED));
            }

            if (rs.getString("OFFICE_DISPATCHED_TO") == null)
            { doc.add(new Field("office_dispatched_to", " ", Field.Store.YES, Field.Index.ANALYZED)); }
            else {
                doc.add(new Field("office_dispatched_to", rs.getString("OFFICE_DISPATCHED_TO"), Field.Store.YES, Field.Index.ANALYZED));
            }
            iw.addDocument(doc);
        }
        iw.commit();
        closeIndexWriter();
        // Release JDBC resources: result set first, then statement, then connection
        rs.close();
        stmt.close();
        con.close();
    }

Any ideas for a solution? Cheers to all.

Please define "population". How do you want to handle a half-populated index? - phanin
Sorry, by populating I mean creating the fields and documents and adding them to the Lucene index. - CodeAngel

2 Answers

0
votes

You could query the index for data that you know is in your Derby DB, either a few sample entries or the total number of records. If the index returns what you expect, you don't need to repopulate it.
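The record-count variant of this idea could be sketched roughly as follows. The class and method names (IndexPopulationCheck, countRows, needsRebuild) are made up for illustration and are not part of the question's code. The sketch assumes the caller obtains the two index-side inputs from Lucene 3.6, for example with IndexReader.indexExists(directory) and reader.numDocs(), and passes them in as plain values.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper: decides whether the Lucene index has to be (re)built. */
class IndexPopulationCheck {

    /** Counts the rows that buildIndex() would turn into documents. */
    static int countRows(Connection con) throws SQLException {
        try (Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM APP.REGISTRY")) {
            rs.next();               // COUNT(*) always yields exactly one row
            return rs.getInt(1);
        }
    }

    /**
     * @param indexExists assumed to come from IndexReader.indexExists(directory)
     * @param indexedDocs assumed to come from reader.numDocs(); ignored if no index exists
     * @param dbRows      result of countRows(con)
     * @return true if the index is missing or its document count differs from the table
     */
    static boolean needsRebuild(boolean indexExists, int indexedDocs, int dbRows) {
        if (!indexExists) {
            return true;                  // nothing usable on disk yet
        }
        return indexedDocs != dbRows;     // empty, half-populated or outdated index
    }
}
```

With this in place, buildIndex() would only be called when needsRebuild(...) returns true. Because buildIndex() appends documents, a half-populated index would probably need to be cleared first, for instance with iw.deleteAll() or by opening the writer with IndexWriterConfig.OpenMode.CREATE, otherwise rows could end up indexed twice.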

0
votes

You may try any/all of the following steps.

1) Check for the presence of the first and last entries in the index you plan to populate.

2) If possible, compare the last-updated time of your data source with that of the Lucene index (the modification date of the index files).

3) Check the number of entries that are supposed to be in the index, using IndexReader.numDocs() or maxDoc(), whichever is relevant to your use case.
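Step 2 might look something like the sketch below. It uses only java.io.File, and the names IndexFreshness, newestIndexFile and isStale are invented for this example. It also assumes the application has some way of knowing when the Derby data was last changed (dataLastUpdatedMillis), which the question does not show, so treat that input as hypothetical.

```java
import java.io.File;

/** Hypothetical helper comparing index file timestamps with a data-source timestamp. */
class IndexFreshness {

    /** Newest modification time of any file in the index directory, or 0 if it is empty or missing. */
    static long newestIndexFile(File indexDir) {
        File[] files = indexDir.listFiles();   // null when indexDir is not a directory
        long newest = 0L;
        if (files != null) {
            for (File f : files) {
                newest = Math.max(newest, f.lastModified());
            }
        }
        return newest;
    }

    /**
     * @param dataLastUpdatedMillis assumed to be tracked by the application (epoch milliseconds)
     * @return true if the directory holds no files or the data changed after the last index write
     */
    static boolean isStale(File indexDir, long dataLastUpdatedMillis) {
        long newest = newestIndexFile(indexDir);
        return newest == 0L || newest < dataLastUpdatedMillis;
    }
}
```

A timestamp check like this is cheap but coarse: it says nothing about whether an earlier indexing run finished, so it is probably best combined with the document-count check from step 3.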