3
votes

I am using Apache Solr 8.6 on Linux and indexing documents with the POST tool, as described in the Apache Solr Reference Guide.

POST Tool Command

bin/post -c testcore /testdocs/

The documents are indexed successfully. When I search for a string (e.g. hello) in the Solr Admin UI, the matching documents are returned, and I can view the document content in the _text_ field because I store it using the following field definition in managed-schema:

<field name="_text_" type="text_general" multiValued="true" indexed="true" stored="true"/> 

The document content is indexed and stored in the _text_ field, but document metadata such as content-type and other document properties is also stored and displayed in the _text_ field.

I want only the actual document content to be stored in the _text_ field, without these properties.

solrconfig.xml configuration

<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.meta">ignored_</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>

I would be grateful for any guidance.

1 Answer

0
votes

Use the uprefix parameter instead of fmap.<source_field>. uprefix prepends a prefix to every field name that is not defined in the schema, so those unknown fields can be routed to a schema field that is ignored, whereas fmap only renames a specific, known source field. In the request handler defaults you should have:

<str name="uprefix">ignored_</str>
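
For context, here is a rough sketch of how the handler from the question might look with uprefix in place. It assumes the other defaults stay as posted and that fmap.meta is simply dropped in favour of uprefix; that is my reading of the suggestion, not something verified against your core, so adjust it to your own solrconfig.xml.

```xml
<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Lowercase extracted field names (unchanged from the question) -->
    <str name="lowernames">true</str>
    <!-- Prefix every field not defined in the schema with "ignored_" -->
    <str name="uprefix">ignored_</str>
    <!-- Keep mapping the extracted body text to _text_ (unchanged from the question) -->
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>
```

After changing solrconfig.xml, the core typically needs to be reloaded and the documents re-indexed before the stored values change.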

A matching dynamic field must be defined in the schema to catch these unknown fields, along with its fieldType (I don't know whether it is already present when using a managed-schema):

<dynamicField name="ignored_*" type="ignored" />
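
If the ignored fieldType turns out to be missing from your schema, a definition along these lines should work. This is a sketch modelled on the one shipped in Solr's sample configsets; check your own managed-schema first, since it may already contain an equivalent entry.

```xml
<!-- Assumed definition: values sent to this type are neither indexed nor stored -->
<fieldType name="ignored" class="solr.StrField" indexed="false" stored="false" multiValued="true"/>

<!-- Catches every field that uprefix renamed to ignored_<name> -->
<dynamicField name="ignored_*" type="ignored"/>
```

Because the type is neither indexed nor stored, anything routed to an ignored_* field is silently discarded at index time.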