After a webpage has been crawled with Apache Nutch 2.2.1, its contents are pushed to Solr. Solr stores the contents of the entire webpage in the "content" field, so the data in that field is usually very large. So here are my concerns:
Should I index the "content" field in Solr? Indexing such a large field will increase index size. In Solr's schema.xml file I found the following recommendation:
NOTE: This field is not indexed by default, since it is also copied to "text"
using copyField below. This is to save space. Use this field for returning and
highlighting document content. Use the "text" field to search the content.
<field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
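For reference, the NOTE depends on two other definitions in the same schema.xml: a catch-all "text" field that is indexed but not stored, and a copyField rule that copies "content" into it. In the stock Solr 4.x example schema they look roughly like the fragment below. The schema shipped with Nutch may differ, so treat this as a sketch and compare it with your own file:

```xml
<!-- Catch-all search field: indexed so it can be queried, not stored to save space -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

<!-- At index time, copy the raw page body from "content" into "text" -->
<copyField source="content" dest="text"/>
```

With this arrangement the page body is stored once (in "content") and indexed once (through "text"), which is the space saving the NOTE refers to. Queries go against "text" (e.g. q=text:nutch), while "content" is used only for returning and highlighting (e.g. hl=true&hl.fl=content).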
However, if I leave this field unindexed, would that increase search response time significantly?
I'd greatly appreciate any information that would help me understand the benefits of not indexing this large field, or the benefits of indexing it.