3
votes

I have a document with a field called uuids. This field is a multivalued list that can hold up to 100k values per document.

I want to search for documents that match uuids that start with "5ff6115e" for instance. I can already do it successfully by using q=uuids:5ff6115e*:

http://localhost:8983/solr/test1/select?q=uuids%3A5ff6115e*&rows=1&fl=uuids&wt=json&indent=true

However, the returned document contains all 100k values of this field.

I want not only to filter the documents whose uuids field contains a value starting with this prefix, but also to filter the field values returned, so that I receive only the matching values in the response.

How can I do that?

3
It's a different question: I don't want to filter which fields come back in the result, but which values of the field come back - mvallebr

3 Answers

2
votes

Use highlighting. @Jokin first mentioned it, and I feel this is the best answer that doesn't involve hacking on Solr. Try either the PostingsHighlighter or the FastVectorHighlighter, not the default/standard highlighter. Unfortunately, both of them internally execute a wildcard query against all the UUIDs in this field. FVH has the opportunity to be smarter about that internally, but it's not implemented that way.

Note: if it's within scope to write a little Java to add to Solr, the ideal answer would be to add term vectors (just the terms data in the term vector, no offsets/positions) and then write a "DocTransformer" that grabs the term vector terms, seeks to the prefix, and then iterates over the terms that have that prefix. Pretty darned fast.
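To illustrate why the "seek to the prefix, then iterate" step is fast, here is a minimal Python sketch of the same idea on a plain sorted list. It is not Solr or Lucene code: the function name `terms_with_prefix` and the sample values are made up for illustration, and a real DocTransformer would work on the document's term vector in Java instead.

```python
from bisect import bisect_left


def terms_with_prefix(sorted_terms, prefix):
    """Return the terms of an already-sorted list that start with prefix.

    Hypothetical helper: it mimics seeking in a sorted term dictionary,
    it does not call any Solr or Lucene API.
    """
    # Binary search: jump straight to the first term >= prefix.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for i in range(start, len(sorted_terms)):
        term = sorted_terms[i]
        # Terms are sorted, so the first non-matching term ends the run.
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


if __name__ == "__main__":
    terms = sorted([
        "1a2b3c4d-0000",
        "5ff6115e-aaaa",
        "5ff6115e-bbbb",
        "5ff6115f-0000",
    ])
    print(terms_with_prefix(terms, "5ff6115e"))
```

Because the terms are sorted, only the matching run is touched after the initial binary search, regardless of how many other values the field holds.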

1
votes

This is not currently possible; see this bug and this previous question.

1
votes

I don't know how big your index is, but a document with a multivalued field holding 100k values doesn't seem like the right approach to me. In cases like this, instead of asking for a new feature in Solr, it's better to refactor your index and store the information another way, for example by creating another core whose documents each hold the unique id of your original document and a single field with the guid. You can then use field collapsing or other Solr features to get the information you need.

For example, a classic Solr case is indexing books: instead of indexing each book as a whole, it is better to index each page as a separate document. If you tell us a bit more about your use case, we can think about how the index could be improved.
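The refactoring could look roughly like the following Python sketch, which turns one large document into one small document per value before indexing. The helper name `explode_document` and the field names `parent_id` and `uuid` are assumptions made for this example, not anything Solr requires.

```python
def explode_document(doc, id_field="id", values_field="uuids"):
    """Split one document with a multivalued field into one document per value.

    The output field names ("parent_id", "uuid") are hypothetical; pick
    whatever matches the schema of the new core.
    """
    parent_id = doc[id_field]
    return [
        {
            "id": f"{parent_id}_{i}",  # unique key for the small document
            "parent_id": parent_id,    # points back to the original document
            "uuid": value,             # single-valued field to search on
        }
        for i, value in enumerate(doc[values_field])
    ]


if __name__ == "__main__":
    big_doc = {"id": "doc1", "uuids": ["5ff6115e-aaaa", "1a2b3c4d-0000"]}
    for small_doc in explode_document(big_doc):
        print(small_doc)
```

With such a layout, a query like q=uuid:5ff6115e* on the new core returns only the matching values, and group=true&group.field=parent_id could be used to collapse the hits per original document.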

Anyway, for cases that don't have so many values, you can achieve the same result with the highlighting component. For best performance, exclude the field from the returned field list (fl) and use the highlighter to return the matched terms. You can tune the highlighter to control the maximum number of snippets (hl.snippets), how big each one is (hl.fragsize), etc.

http://localhost:8983/solr/test1/select?q=uuids%3A5ff6115e*&rows=1&fl=id&wt=json&indent=true&hl=on&hl.fragsize=1&hl.fl=uuids
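As a rough sketch of how a client might use this, the Python below builds such a highlighting request and pulls the matched values out of the "highlighting" section of the JSON response. The helper names, the hl.snippets value of 100, and the sample response are assumptions for illustration. It also assumes the default <em> highlight tags and that each snippet holds a single value.

```python
import re
from urllib.parse import urlencode

# Default highlight markers; adjust if the pre/post tags are configured differently.
_EM_TAG = re.compile(r"</?em>")


def build_highlight_url(base_url, prefix, field="uuids", snippets=100):
    """Build a select URL that returns only ids plus highlighted matches."""
    params = {
        "q": f"{field}:{prefix}*",
        "rows": 1,
        "fl": "id",               # leave the big multivalued field out
        "wt": "json",
        "hl": "on",
        "hl.fl": field,
        "hl.fragsize": 1,
        "hl.snippets": snippets,  # assumed cap on snippets per document
    }
    return f"{base_url}/select?{urlencode(params)}"


def matched_values(response, field="uuids"):
    """Map each document id to its highlighted values with the tags removed."""
    return {
        doc_id: [_EM_TAG.sub("", snippet) for snippet in fields.get(field, [])]
        for doc_id, fields in response.get("highlighting", {}).items()
    }


if __name__ == "__main__":
    print(build_highlight_url("http://localhost:8983/solr/test1", "5ff6115e"))
    # Hypothetical response fragment, shaped like Solr's JSON highlighting section.
    sample = {"highlighting": {"doc1": {"uuids": ["<em>5ff6115e-0001</em>"]}}}
    print(matched_values(sample))
```

Keeping fl=id means the 100k stored values are never serialized into the response; only the highlighted snippets come back.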