0
votes

I have a problem that in my lucene index files one document can have huge text. now when i search one of these huge text documents lucene/solr does not filter any results even the search term exist in the document text. the reason that i think might be the large number of characters in document text? if yes than how could we tell solr/lucene how much characters to analyze during search, please explain

I am using Solr 1.4.1 can any

Thanks Ahsan

2

2 Answers

2
votes

Lucene can handle huge documents without trouble. It seems unlikely that the document size itself is the problem. Use a tool like Luke to inspect the index and see what terms are associated with some of these large documents.

1
votes

Also, have you changed the maxFieldLength setting in solrconfig.xml? I am testing out indexing the Bible, at 25 MB of data, and with a maxFieldLength of 10,000, which is the default, only the first 10,000 tokens ever get analysized, which leads to roughly ~2000 unique terms for my document.

If you are using Lucene directly, then there are a couple setting for maxFieldLength, you may have "unlimited" and therefore getting everything. Check the JavaDocs for how to set maxFieldLength.