
I am using queries (in the Solr Admin UI) to search for words in two text documents stored in my HDFS. How can I retrieve the name of the document in which a word is found? I am using this project: https://github.com/lucidworks/hadoop-solr

I am creating a collection using bin/solr -e cloud, and I am using "data_driven_schema_configs" from the server/solr/configsets/ directory.

I tried adding <field name="fileName" type="string" indexed="true" stored="true" /> inside managed-schema in ~/solr-6.1.0/server/solr/configsets/data_driven_schema_configs/conf, and I also tried renaming that file to schema.xml. However, this directory doesn't contain any dataConfig file to which I could add <field column="file" name="fileName"/>, as I have seen in other posts with similar questions (none of them about SolrCloud), so I don't know whether what I am trying is correct. What changes do I have to make, and in which directories, to make this work?
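A note on the directory question, plus a sketch. With SolrCloud, the configset is uploaded to ZooKeeper when the collection is created, so editing files under server/solr/configsets/ afterwards does not change the running collection. Because data_driven_schema_configs uses a mutable managed schema, one option is to add the field through Solr's Schema API (an "add-field" command POSTed to /solr/<collection>/schema) instead of editing files. The function names below are hypothetical, and the base URL and collection name ("gettingstarted", the default offered by bin/solr -e cloud) are assumptions to adjust for your setup.

```python
import json
import urllib.request


def add_field_payload(name: str) -> dict:
    """Build a Schema API "add-field" command for a stored, indexed string field."""
    return {
        "add-field": {
            "name": name,
            "type": "string",
            "indexed": True,
            "stored": True,  # stored so the value can be returned in results
        }
    }


def schema_url(base: str, collection: str) -> str:
    """URL of the collection's Schema API endpoint."""
    return f"{base}/{collection}/schema"


def add_field(base: str, collection: str, name: str) -> str:
    """POST the add-field command and return Solr's raw JSON response text."""
    body = json.dumps(add_field_payload(name)).encode("utf-8")
    req = urllib.request.Request(
        schema_url(base, collection),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Usage, assuming a local SolrCloud example: add_field("http://localhost:8983/solr", "gettingstarted", "fileName"). Documents indexed before the change will not have the field, so they still need to be reindexed.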

Example: I am searching for the word "greatest", which can be found in both documents. How can I see which document each result comes from, sample1.txt or sample2.txt?


If those are the only fields in your index that describe the documents, you can't. How did you generate the index files? Those id values seem to be actual text from the documents, and they are not suitable unique ids. - MatsLindh
I am using this project: github.com/LucidWorks/hadoop-solr @MatsLindh - Spyros_av
You should read up on Solr basics before asking. As @MatsLindh said, the first thing is to provide suitable unique ids for the id field. The actual text from the documents should be indexed in an appropriate text field; see Solr Field Types. Also, if you want the name of the matched documents, why not index and store the name of the documents? - EricLavault
@Spyros_av please provide a sample of the data you send to Solr, with the update request. Are you running Solr in schemaless mode? - EricLavault
@n0tting I forgot to mention that I am using SolrCloud. The data I am using is some books in .txt format from gutenberg.org. - Spyros_av

1 Answer


It's the same thing I said when you asked this question on IRC:

Your Solr schema must contain a field that holds the file name, set to stored="true", and you must include that field, with a relevant value, in every document when you index. Most schema changes require a full reindex.
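To make "include that field in every document" concrete, here is a minimal sketch that turns one text file into Solr documents, one per non-empty line, each carrying a unique id and the file name. It does not reproduce how hadoop-solr splits input; the one-document-per-line split, the helper names, and the field name content_txt (assumed to match a *_txt text dynamic field) are all hypothetical.

```python
import json


def docs_for_text(file_name: str, text: str) -> list:
    """One Solr document per non-empty line, tagged with the source file name."""
    docs = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        docs.append({
            "id": f"{file_name}-{lineno}",  # unique id, not the text itself
            "fileName": file_name,          # stored field returned in results
            "content_txt": line,            # hypothetical full-text field
        })
    return docs


def update_body(docs: list) -> bytes:
    """Serialize documents as a JSON array, the shape Solr's /update accepts."""
    return json.dumps(docs).encode("utf-8")
```

The resulting body can be POSTed with Content-Type: application/json to /solr/<collection>/update?commit=true. The point of the design is that the id comes from the file name and position rather than the text, and fileName is set explicitly on every document.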

https://wiki.apache.org/solr/HowToReindex
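Once the documents carry a stored fileName field, a query can ask for it with the fl (field list) parameter, which answers the "sample1.txt or sample2.txt" question per result. The sketch below only builds the request URL; the base URL and collection name are assumptions, and it assumes the collection's default search field covers the indexed text.

```python
from urllib.parse import urlencode


def select_url(base: str, collection: str, word: str) -> str:
    """Build a /select URL that returns each hit's id, fileName and score."""
    params = {
        "q": word,                      # relies on the default search field
        "fl": "id,fileName,score",      # fields to return for each result
        "wt": "json",
    }
    return f"{base}/{collection}/select?{urlencode(params)}"
```

For example, select_url("http://localhost:8983/solr", "gettingstarted", "greatest") gives a URL whose results list the file each match came from. In the Admin UI, the same effect comes from typing id,fileName into the fl box.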