1
votes

I have some document files in D:/temp/docs on my local machine and I want to index them using Apache Solr and Tika. The following is my data-config.xml file:

<dataSource type="BinFileDataSource" />
<document>
    <entity name="file_Import" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="D:/temp/docs" fileName=".*\.(doc)|(pdf)|(docx)"
            onError="skip"
            recursive="true">
        <field column="fileAbsolutePath" name="id" />
        <field column="fileSize" name="size" />
        <field column="fileLastModified" name="lastModified" />

        <entity name="documentImport"
                processor="TikaEntityProcessor"
                url="${files.fileAbsolutePath}"
                format="text">
            <field column="file" name="fileName" />
            <field column="Author" name="author" meta="true" />
            <field column="title" name="title" meta="true" />
            <field column="text" name="text" />
        </entity>
    </entity>
</document>

When I try to import those files into Solr, I get the following exception:

Caused by: java.net.MalformedURLException: no protocol: null
    at java.net.URL.<init>(Unknown Source)
    at java.net.URL.<init>(Unknown Source)
    at java.net.URL.<init>(Unknown Source)
    at org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:90)
... 11 more

I figured out that Solr is not able to locate the D:/temp/docs folder.

I don't know how to resolve this. Any help is appreciated.

2 Answers

0
votes

Resolved ...

I had more than one dataSource tag in my data-config.xml, and one of them, <dataSource type="URLDataSource" />, was causing the problem (the stack trace shows URLDataSource.getData being called). So I removed the extra dataSources and kept only <dataSource type="BinFileDataSource" />

and it worked ... :)
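
For anyone hitting the same exception, here is a rough sketch of what a config with a single data source might look like. It is not the asker's final file, just an illustration under a few assumptions of mine: the data source name "bin" is made up, the outer entity is renamed to "files" so that ${files.fileAbsolutePath} has an entity of that name to resolve against (in the question it is called "file_Import", which does not appear to match), the "file" column is moved to the outer entity because FileListEntityProcessor is what produces it, and the extensions in fileName are grouped so the alternation applies only to the extension.

```xml
<dataConfig>
    <!-- The only data source in the file; the name "bin" is illustrative -->
    <dataSource type="BinFileDataSource" name="bin" />
    <document>
        <!-- Outer entity only lists files, so it needs no data source -->
        <entity name="files" dataSource="null" rootEntity="false"
                processor="FileListEntityProcessor"
                baseDir="D:/temp/docs" fileName=".*\.(doc|pdf|docx)"
                onError="skip" recursive="true">
            <field column="fileAbsolutePath" name="id" />
            <field column="file" name="fileName" />
            <field column="fileSize" name="size" />
            <field column="fileLastModified" name="lastModified" />

            <!-- Inner entity reads each listed file through the named data source -->
            <entity name="documentImport" dataSource="bin"
                    processor="TikaEntityProcessor"
                    url="${files.fileAbsolutePath}" format="text">
                <field column="Author" name="author" meta="true" />
                <field column="title" name="title" meta="true" />
                <field column="text" name="text" />
            </entity>
        </entity>
    </document>
</dataConfig>
```

Pointing the Tika entity at the data source by name means it should not silently pick up a different one if another dataSource tag is added later.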

0
votes

Check the path given in baseDir for the file-listing entity.

Try changing it from

baseDir="D:/temp/docs"

to

baseDir="D:/temp/docs/"

and change fileName to a pattern such as .* to index all docs in that folder (fileName is a regular expression, so *.* is not a valid pattern)
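
As a sketch of that suggestion, the file-listing entity might look like the fragment below. It assumes fileName is treated as a Java regular expression, which is how FileListEntityProcessor is documented, so "every file" is written as .* and not as a shell-style wildcard. The entity name "files" is my own choice, and the remaining attributes are copied from the question.

```xml
<!-- Lists every file under baseDir; fileName is a regex, so .* matches any name -->
<entity name="files" dataSource="null" rootEntity="false"
        processor="FileListEntityProcessor"
        baseDir="D:/temp/docs/" fileName=".*"
        onError="skip" recursive="true">
    <field column="fileAbsolutePath" name="id" />
</entity>
```

If only certain types should be indexed, a grouped pattern such as .*\.(doc|pdf|docx) keeps the alternation limited to the extension.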