I have Solr 4.10.0 and have indexed some books. Each document in the index is one page of a book, so every document has fields such as PageID, BookID, PageNum, Content, etc. The field definitions in schema.xml are the following:
<field name="PageID" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="Content" type="text_ar" indexed="true" stored="true" required="true" termVectors="true" />
<field name="PageNum" type="int" indexed="false" stored="true" required="false" multiValued="false" />
<field name="Part" type="int" indexed="false" stored="true" required="false" multiValued="false" />
<field name="BookID" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="BookTitle" type="text_ar" indexed="true" stored="true" required="true" />
<field name="BookInfo" type="text_ar" indexed="true" stored="true" required="true" />
<field name="BookCat" type="int" indexed="false" stored="true" required="false" multiValued="false" />
The problem
When I search the field Content, which contains the page text, I get multiple results from the same book. This is expected, because a given word may appear on many pages of a book. I tried to build queries that behave like SQL DISTINCT, as follows:
Using facet
http://localhost:8080/solr/books/select/?q=Content:WordOfSearch&sort=PageID%20desc&version=2.2&start=0&rows=10&indent=on&wt=json&facet=on&facet.field=BookID&facet.limit=1&hl=true&hl.q=Content:WordOfSearch
In the previous query I set facet.field=BookID so that the results would contain only one result per book. However, this does not work as expected: the returned results are the same as if facet were not used, i.e. using facet makes no difference.
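To illustrate what I see, here is a minimal Python sketch using a made-up response (the IDs and counts are hypothetical, not taken from my index). It assumes the default JSON layout, in which the facet counts arrive in a separate facet_counts section as a flat [value, count, ...] list, next to an unchanged docs list.

```python
# Hypothetical, trimmed response for the facet query; all values are invented.
sample_response = {
    "response": {
        "numFound": 3,
        "docs": [
            {"PageID": "p3", "BookID": "b1"},
            {"PageID": "p2", "BookID": "b1"},
            {"PageID": "p1", "BookID": "b2"},
        ],
    },
    # With facet.limit=1 only one value/count pair is listed.
    "facet_counts": {"facet_fields": {"BookID": ["b1", 2]}},
}


def facet_pairs(flat):
    """Turn a flat [value, count, value, count, ...] list into a dict."""
    return dict(zip(flat[::2], flat[1::2]))


def book_ids_in_docs(response):
    """List the BookID of every returned document, duplicates included."""
    return [doc["BookID"] for doc in response["response"]["docs"]]


book_counts = facet_pairs(sample_response["facet_counts"]["facet_fields"]["BookID"])
returned_books = book_ids_in_docs(sample_response)
print(book_counts)
print(returned_books)
```

In this sketch the facet section only reports counts per BookID, while the docs list still contains several pages from the same book.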
Using group
I used it with and without the parameter group.main, like the following:
http://localhost:8080/solr/books/select/?q=Content:WordOfSearch&sort=PageID%20desc&version=2.2&start=0&rows=10&indent=on&wt=json&group=true&group.field=BookID&group.main=true&hl=true&hl.fl=*&hl.simple.pre=&hl.simple.post=<%2Fspan>
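Since the URL is hard to read in one line, here is a rough Python sketch of how the same grouping request could be assembled. The helper name build_group_query and its arguments are only illustrative, and the highlighting markup parameters are left out for brevity.

```python
from urllib.parse import urlencode

# Base URL taken from the request above.
SOLR_SELECT = "http://localhost:8080/solr/books/select/"


def build_group_query(word, start=0, rows=10, group_main=True):
    """Build the grouping request URL (hypothetical helper, for illustration)."""
    params = [
        ("q", f"Content:{word}"),
        ("sort", "PageID desc"),
        ("start", start),
        ("rows", rows),
        ("wt", "json"),
        ("group", "true"),
        ("group.field", "BookID"),  # one group per book
    ]
    if group_main:
        # Flattens the grouped result into a plain document list.
        params.append(("group.main", "true"))
    params.append(("hl", "true"))  # highlighting markup parameters omitted here
    return SOLR_SELECT + "?" + urlencode(params)


url = build_group_query("WordOfSearch")
print(url)
```

Calling build_group_query("WordOfSearch", group_main=False) gives the variant without group.main.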
Grouping partially solved the problem, i.e. for each book whose pages contain WordOfSearch it returns one result. However, it breaks the pagination in my application. In the application I depend on numFound in the response to get the total number of records. With the group solution I used, numFound equals the number found by the same query without group, i.e. it counts the documents with repeated BookID values, which leads to empty pages at the end of the paging. So, how can I get the exact number of documents returned with group? Or is there any other solution for my problem with repeated BookID field values?
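To make the pagination problem concrete, here is a small Python sketch with invented numbers (95 matching pages spread over 12 books; neither figure comes from my real index). The function page_count is just an illustrative stand-in for how my application derives the number of pages.

```python
import math


def page_count(total, rows=10):
    """Number of pages needed to show `total` results, `rows` per page."""
    return math.ceil(total / rows)


# Hypothetical numbers, for illustration only.
matching_pages = 95  # what numFound reports (every matching page document)
distinct_books = 12  # how many results the grouped list actually contains

pages_from_num_found = page_count(matching_pages)  # pager built from numFound
pages_with_results = page_count(distinct_books)    # pages that really have results
empty_pages = pages_from_num_found - pages_with_results
print(pages_from_num_found, pages_with_results, empty_pages)
```

With these numbers the pager would offer 10 pages although only the first 2 contain results, which is the empty-pages effect described above.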