
My data is stored like this: some of it is in a database, and the rest is uploaded as PDF/Word/Excel documents to a file server. How should the Lucene index be structured if I want to index all of it? Should there be separate indexes for the tables and the documents, so that the search string is searched across both indexes? Or should everything go into a single index with different field structures (does Lucene support this)?

Thanks, V

Do you want to distinguish between the documents in the database and the documents on the file server? - Tyzak
Yes, they are different: the database contains a few fields, but the text of the documents (PDF/Word) will be indexed as-is. - Vijay Veeraraghavan
Well then, if you want, you can use one index and separate the two types of documents with a field (as I described in the answer). When indexing, check where each document comes from and fill the fields accordingly. Later, in the application, you can check that field and use only the documents you want from the index :) - Tyzak

1 Answer


If you don't need to distinguish between the documents, you can use one index. You can walk the folder structure with FileSystemInfo: for each entry, check whether it is a folder or a file. If it is a file, index it; if it is a folder, call the function again on that folder (recursively).

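    Imports System.IO
    Imports Lucene.Net.Documents
    Imports Lucene.Net.Index

    Module Indexer

        ' Recursively walks a folder and indexes every file in it.
        ' The parameter types are assumed; the original snippet only shows the recursive call.
        Sub function_create_document(ByVal yourfolderroot As String, ByVal indexwriter As IndexWriter, ByVal id As Integer)
            For Each fsi As FileSystemInfo In New DirectoryInfo(yourfolderroot).GetFileSystemInfos()
                If TypeOf fsi Is DirectoryInfo Then
                    ' Folder: call the function again on the subfolder
                    function_create_document(fsi.FullName, indexwriter, id)
                Else
                    ' File: build a Lucene document for it
                    Dim dynamic_doc As New Document()
                    Dim filename As String = fsi.Name

                    ' StreamReader only yields usable text for plain-text files;
                    ' PDF/Word/Excel files need a text extractor first.
                    Using sr As New StreamReader(fsi.FullName)
                        ...
                    End Using
                End If
            Next
        End Sub

    End Module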

If you do want to distinguish between them, check while indexing whether each document comes from the database or from your file server, and store that information in a field.

Use a string variable (yourstring): if the document comes from the database, set it to "database"; otherwise set it to "fileserver".

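    ' Tag each document with its source so the two kinds can be told apart at search time
    Dim field_typ As Field = New Field("doc_typ", yourstring, Field.Store.YES, Field.Index.TOKENIZED)
    dynamic_doc.Add(field_typ)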
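To make use of the doc_typ tag when searching, one option is to AND the user's query with a TermQuery on doc_typ. This is a minimal sketch, assuming the same Lucene.Net 2.x API the snippets above use (the Field.Index.TOKENIZED era). The field name "contents" (holding the indexed text), the module name DocTypeSearch and the helper BuildQuery are all hypothetical:

```vbnet
Imports Lucene.Net.Analysis.Standard
Imports Lucene.Net.Index
Imports Lucene.Net.QueryParsers
Imports Lucene.Net.Search

Module DocTypeSearch

    ' Hypothetical helper: parses the user's search text against the assumed
    ' "contents" field and, if a source is given, restricts hits to that doc_typ.
    Public Function BuildQuery(ByVal searchText As String, ByVal source As String) As Query
        Dim parser As New QueryParser("contents", New StandardAnalyzer())
        Dim userQuery As Query = parser.Parse(searchText)

        ' No source given: search both kinds of documents in the single index
        If String.IsNullOrEmpty(source) Then
            Return userQuery
        End If

        ' Both clauses must match: the search text AND the exact doc_typ value
        Dim combined As New BooleanQuery()
        combined.Add(userQuery, BooleanClause.Occur.MUST)
        combined.Add(New TermQuery(New Term("doc_typ", source)), BooleanClause.Occur.MUST)
        Return combined
    End Function

End Module
```

Passing an empty source searches everything in the one index; passing "database" or "fileserver" limits the hits to that source. Because doc_typ was indexed with Field.Index.TOKENIZED (assumed here to go through StandardAnalyzer, which lowercases), the value given to the TermQuery must be lowercase as well.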