I'm working on a website that's running Django with Solr as its search backend. Haystack functions as Django's interface to Solr. I currently have one Solr collection, Apps. Each app has multiple releases, but in Solr each app is represented by a single document holding only its most recent release. I've come up against a limitation of that architecture: I need to be able to search all of an app's releases and return the most relevant one.

Example data in Django ORM:

App Foo

  • Release A - released Nov 2017, compatible with Linux
  • Release B - released April 2017, compatible with Windows

Example search in Solr: Give me all the apps with a release that's compatible with Windows

Expected: App Foo is returned.

Actual: App Foo is not returned because we're only storing Release A's metadata in the App Foo document in Solr.

A solution I'm pursuing is to index Solr based on Release rather than App. But when we do that, how do we use Solr/Haystack to return only the most recent release that matches the query?

It seems like Result Grouping / Field Collapsing might solve the problem: http://yonik.com/solr-result-grouping-field-collapsing/ Grouping results based on a matching attribute in one field, and returning the top N results sounds about right. But does Haystack support it? If not, is there a way to shoehorn it in?

An alternate solution might be to use Solr nested documents: http://yonik.com/solr-nested-objects/ Releases are indeed children of Apps. But again, I'm finding that Haystack doesn't support this feature. Also, the syntax for nested objects is ... crazy.

What's the best practice for solving this problem? Result grouping or Nested objects? What's the difference between the two? Why would you use one and not the other?

Lastly, am I going to have to rip out Haystack and use a different interface to Solr?

Thanks in advance!

1 Answer

If you can add raw parameters to your Solr query, I think the best option is result collapsing in Solr. Once you have all the releases indexed, you can collapse on the app field so that only one result is returned per app. You can then tell the collapse query parser that you want the newest one:

fq={!collapse field=app max=timestamp_field}

The response format is the same, so you shouldn't have to modify anything in the response parsing.
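
To make the raw-parameter idea concrete, here is a minimal sketch that only builds the request parameters and URL for such a query, using the Python standard library. The core URL (`http://localhost:8983/solr/releases`), the `compatibility` field, and the helper names are assumptions for illustration; `app` and `timestamp_field` are the placeholder field names from the filter above.

```python
from urllib.parse import urlencode, parse_qs


def build_collapse_params(query, group_field="app", sort_field="timestamp_field"):
    """Return Solr query parameters that collapse matches to one doc per app.

    The field names are placeholders; substitute the ones in your schema.
    """
    # Same filter as above: keep, per group_field value, the matching
    # document with the largest sort_field value.
    collapse = "{!collapse field=%s max=%s}" % (group_field, sort_field)
    return {"q": query, "fq": collapse, "wt": "json"}


def build_select_url(base_url, params):
    """Build the /select URL for a core; base_url is an assumed location."""
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical query: releases compatible with Windows, one result per app.
params = build_collapse_params("compatibility:Windows")
url = build_select_url("http://localhost:8983/solr/releases", params)
print(url)
```

Because the collapse runs as a filter over the documents that match `q`, each app should come back as its newest matching release rather than its newest release overall. If you query through a client library instead, the same `fq` string would be passed as an extra parameter; how to do that through Haystack depends on your version and backend, so treat that part as something to verify.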