
I'm new to Solr and cannot figure out why delta-import does nothing, while full-import works fine. Whenever I run delta-import, I get back the same response, which does not mention any documents being added. The updated_at column exists and contains the correct timestamp whenever a row is added or edited.

Am I missing something that is required to get delta-import to work?

Output of http://domain.com:8080/solr/dataimport?command=delta-import

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">104</int>
    </lst>
    <lst name="initArgs">
        <lst name="defaults">
            <str name="config">/usr/local/solr/conf/data-config.xml</str>
        </lst>
    </lst>
    <str name="command">delta-import</str>
    <str name="status">idle</str>
    <str name="importResponse"/>
    <lst name="statusMessages">
        <str name="Total Requests made to DataSource">1</str>
        <str name="Total Rows Fetched">0</str>
        <str name="Total Documents Skipped">0</str>
        <str name="Delta Dump started">2012-08-24 01:55:07</str>
        <str name="Identifying Delta">2012-08-24 01:55:07</str>
        <str name="Deltas Obtained">2012-08-24 01:55:07</str>
        <str name="Building documents">2012-08-24 01:55:07</str>
        <str name="Total Changed Documents">0</str>
        <str name="Total Documents Processed">0</str>
        <str name="Time taken">0:0:0.9</str>
    </lst>
    <str name="WARNING">
        This response format is experimental. It is likely to change in the future.
    </str>
</response>

data-config.xml

<dataConfig>

    <dataSource 
        name="mysql"
        driver="com.mysql.jdbc.Driver" 
        url="jdbc:mysql://localhost/mysite" 
        user="myuser" 
        password="mypassword" />

    <document>
        <entity 
            name="posts" 
            datasource="mysql"
            query="select id, title, description from posts"
            deltaQuery="select id from posts where updated_at > '${dataimporter.last_index_time}'"
            deltaImportQuery="select id, title, description from posts where id='${dataimporter.delta.id}'">
        </entity>
        <field column="id" name="id" indexed="true" stored="true" />
        <field column="title" name="title" indexed="true" stored="true" />
        <field column="description" name="description" indexed="true" stored="true" />
    </document>

</dataConfig>
Try running the deltaQuery manually to verify that it actually returns some documents. You can find the value of ${dataimporter.last_index_time} in the dataimport.properties file in your conf directory. - Suryansh Purwar
The last_index_time in dataimport.properties seems to be 4 hours ahead of the actual time! This explains why no documents are selected by DIH. How can I adjust the time for DIH? - Nyxynyx
I don't think this is DIH's problem, because I believe DIH's time is the same as your system's time. Are you sure it's not a problem with your system clock? - Suryansh Purwar

1 Answer


Try restructuring the document so that the field elements are nested inside the entity element, and add a primary key (pk) attribute to the entity:

<entity 
    name="posts"
    pk="id"
    datasource="mysql"
    query="select id, title, description from posts"
    deltaQuery="select id from posts where updated_at > '${dataimporter.last_index_time}'"
    deltaImportQuery="select id, title, description from posts where id='${dataimporter.delta.id}'">
  <field column="id" name="id" indexed="true" stored="true" />
  <field column="title" name="title" indexed="true" stored="true" />
  <field column="description" name="description" indexed="true" stored="true" />    
</entity>
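
If the clock mismatch mentioned in the comments turns out to be the cause, one possible stopgap is to compensate for it in the deltaQuery itself. The following is only a sketch, assuming the 4-hour offset reported in the comments and MySQL's DATE_SUB function. The offset value is an assumption and should match whatever difference you actually observe between last_index_time in dataimport.properties and the database clock. The sketch also spells the entity attribute dataSource, which is how the DIH documentation writes it; the rest is the config from the question with the fields nested inside the entity.

```xml
<dataConfig>

    <dataSource
        name="mysql"
        driver="com.mysql.jdbc.Driver"
        url="jdbc:mysql://localhost/mysite"
        user="myuser"
        password="mypassword" />

    <document>
        <!-- Assumption: last_index_time is 4 hours ahead of the database clock,
             so the deltaQuery shifts it back before comparing with updated_at.
             Adjust or drop the DATE_SUB once the clocks agree. -->
        <entity
            name="posts"
            pk="id"
            dataSource="mysql"
            query="select id, title, description from posts"
            deltaQuery="select id from posts where updated_at > DATE_SUB('${dataimporter.last_index_time}', INTERVAL 4 HOUR)"
            deltaImportQuery="select id, title, description from posts where id='${dataimporter.delta.id}'">
            <!-- Field mappings live inside the entity they belong to. -->
            <field column="id" name="id" indexed="true" stored="true" />
            <field column="title" name="title" indexed="true" stored="true" />
            <field column="description" name="description" indexed="true" stored="true" />
        </entity>
    </document>

</dataConfig>
```

Making the Solr JVM and the database server agree on the time zone is probably the cleaner fix; shifting the timestamp in the query just lets you confirm that the offset is what keeps the delta empty.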