
I upload and index Word documents with the following command:

java -Durl=http://localhost:8983/solr/update/extract?literal.id=1 -Dtype=application/word -jar post.jar microfost_det.doc

When I query the Solr index, it returns XML:

  http://localhost:8983/solr/collection1/select?q=microfost&wt=xml&indent=true

The response was:

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="indent">true</str>
<str name="q">microfost</str>
<str name="_">1389196238897</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<str name="id">1</str>
<date name="last_modified">1601-01-01T00:00:00Z</date>
<str name="author">fazlan </str>
<str name="author_s">fazlan </str>
<arr name="content_type">
<str>application/msword</str>
</arr>
<arr name="content">
<str> 


This is a MSWord document. Microfost.

</str>
</arr>
<long name="_version_">1456677821213573120</long></doc>
</result>
</response>

My problem is that I need the name of the document that contains the queried text "microfost", that is, microfost_det.doc.

Is it possible to get the name of the Word file (i.e. filename.doc) that contains the queried text?


1 Answer


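Your query has no field prefix, so it runs against the default search field, which in your setup covers the extracted text (the "content" field). That's why the document matches on its content, and why the file name, which is not indexed anywhere, never appears in the result. First, add a custom string field (e.g. docname) to your schema.xml.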
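
A minimal sketch of such a field definition, assuming the stock `string` field type is present in your schema.xml (it is in the example configuration). The name `docname` is only the illustrative name used in this answer. The field should be `stored="true"` so that it can be returned in results:

```xml
<!-- schema.xml, inside the <fields> section -->
<!-- indexed: searchable as docname:...; stored: returned in query results -->
<field name="docname" type="string" indexed="true" stored="true"/>
```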

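Then restart your Solr instance and run the following command to set the field on the already-indexed document. This is an atomic update, so it requires the update log to be enabled and the document's other fields to be stored: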

curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"1","docname":{"set":"microfost_det.doc"}}]'

After that, run the following query; the matching document is returned together with its docname:

http://localhost:8983/solr/collection1/select?q=docname:microfost*&wt=xml&indent=true
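
If the field is stored, each returned `<doc>` should carry the file name alongside its other fields. Roughly, assuming the single-valued string field `docname` described above (the remaining fields are omitted here):

```xml
<doc>
  <str name="id">1</str>
  <!-- the new field holds the original file name -->
  <str name="docname">microfost_det.doc</str>
</doc>
```

The same element would also appear for your original q=microfost query, since it is part of the stored document.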

Alternatively, set the file name at extraction time by passing it as a literal field:

java -Durl="http://localhost:8983/solr/update/extract?literal.id=1&literal.docname=microfost_det.doc" -Dtype=application/word -jar post.jar microfost_det.doc

Either way, you have to store the document name in a separate field.