3
votes

I'm looking into the different options for choosing a search server for a project I'm involved in. The search server is used to power results on a dating website built in Rails, in which the search provides all the 'matchmaking'-magic.

Typical queries would involve ranking documents/results using an expression (in pseudo-code):

  • Order by ranking:
    • +50 if has_image attribute is true
    • +10 if has_boost attribute is true
    • +50 if latitude/longitude is within 40 miles from [point]
    • +20 if latitude/longitude is within 80 miles from [point]
    • -(distance from attribute 'age' to 30)
  • Filter by:
    • Attribute 'age' between 25 and 35
    • Attribute 'sex' equals 'male'
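A direct translation of the ranking bullets above into plain Ruby, to pin down the intended semantics (this is a hypothetical helper; `miles_from_point` is assumed to be precomputed by a geo library, and the two distance bonuses are treated as cumulative, exactly as the bullets read):

```ruby
# Sketch of the ranking expression (attribute names from the pseudo-code).
def score(doc, miles_from_point)
  s = 0
  s += 50 if doc[:has_image]
  s += 10 if doc[:has_boost]
  s += 50 if miles_from_point <= 40  # stacks with the 80-mile bonus below
  s += 20 if miles_from_point <= 80
  s - (doc[:age] - 30).abs
end
```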

By default I don't need the full-text features of most of the search servers out there, and I don't need the full documents to be retrieved - just a unique ID.

The nature of the project calls for a search server with the following properties:

  • Spatial ranking
  • Ranking of results based on a custom function
  • Attribute filters
  • Scalable and fast
  • Free

I've found Sphinx, Solr and ElasticSearch, but all of these are (as far as I can see) built and optimized for full-text searching, with both ES and Solr built on Lucene, and I don't know which would perform best for filter/attribute-heavy searching.

My questions:

  • Which of these servers would you prefer and why?
  • Have I missed any other obvious choices?

4 Answers

5
votes

Don't know about the others, but Solr can do all of this:

Spatial ranking

You'll need a nightly build of Solr (the latest stable release as of this writing, Solr 1.4.1, doesn't include this feature); as far as I know, this is a pretty stable feature in trunk.

Ranking of results based on a custom function

Solr has lots of function queries to do boosting.
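As a sketch, the question's ranking might map onto a Solr boost-function parameter roughly like this (a hypothetical request built as a Ruby hash; `sum`, `product`, `abs`, and `sub` are documented Solr function queries, while `if` only appeared in later releases, so the exact spelling is version-dependent):

```ruby
# Hypothetical Solr request parameters (field names from the question).
solr_params = {
  "q"  => "*:*",
  "fq" => ["age:[25 TO 35]", "sex:male"],  # attribute filters
  "bf" => "sum(if(has_image,50,0)," \
          "if(has_boost,10,0)," \
          "product(-1,abs(sub(age,30))))", # custom ranking boost
  "fl" => "id"                             # only return the unique ID
}
```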

Attribute filters

This is a common search feature.

Scalable and fast

Lots of big websites are using Solr, evidence of its scalability and speed.

Free

Solr is Apache licensed, a very permissive license.

4
votes

ElasticSearch has all of these features as well.

Geographic distance/bounding box/polygon and custom score scripts in various languages are supported: http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/
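For illustration, a filtered custom-score query might look something like the sketch below (built as a Ruby hash; the field names come from the question, and the exact DSL keys vary between ElasticSearch versions, so treat this as an approximation of the docs linked above rather than a definitive request):

```ruby
require "json"

# Hypothetical ElasticSearch query body: filter on age/sex, score via script.
body = {
  "query" => {
    "custom_score" => {
      "query" => {
        "constant_score" => {
          "filter" => {
            "and" => [
              { "range" => { "age" => { "from" => 25, "to" => 35 } } },
              { "term"  => { "sex" => "male" } }
            ]
          }
        }
      },
      "script" => "(doc['has_image'].value ? 50 : 0) + " \
                  "(doc['has_boost'].value ? 10 : 0) - " \
                  "abs(doc['age'].value - 30)"
    }
  },
  "fields" => []  # return only IDs, no stored fields
}
puts JSON.generate(body)
```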

You will have no problem with the performance of filters or other query types; we're doing heavy filtering on our queries, with 100+ attributes in some cases, and it is fast.

Another thing to take into account is integration with your data store. ES has a nice River feature for this; it's not compatible with all data stores, but something similar can be achieved via post-commit hooks.

Also, social sites benefit from (near) real-time search, and ElasticSearch has a 1-second default. It is also much cleaner to configure and scale than Solr - this is my opinion after a month-long evaluation of each. It also does a really good job of adapting to your data model.

Hope this helps.

Paul

1
vote

You aren't talking about a search engine. You're talking about a database. In SQL, filtering is standard SELECT stuff; the ranking can be done with a somewhat crufty expression involving lots of CASE, and then ORDER BY.

To do the spatial parts of the query, you will need a database with geospatial features.

The only scalable, fast, free relational database with geospatial features is PostgreSQL.
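As a sketch of what that looks like, assuming a hypothetical `profiles` table with the question's attributes (the distance bonuses are omitted here because they'd need a geospatial extension such as PostGIS):

```ruby
# Hypothetical SQL: CASE-based ranking plus standard WHERE filtering.
sql = <<-SQL
  SELECT id
  FROM profiles
  WHERE age BETWEEN 25 AND 35
    AND sex = 'male'
  ORDER BY (CASE WHEN has_image THEN 50 ELSE 0 END)
         + (CASE WHEN has_boost THEN 10 ELSE 0 END)
         - ABS(age - 30) DESC
SQL
```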

0
votes

While you could use a search engine like Solr or ES to power this, I think the "business rules" you've defined mean that you are going to end up doing post-processing.

The filtering and basic search are pretty easily done in your search engine, but I'm guessing the ordering logic is going to end up pretty custom and complex, and trying to push it into your search queries may be like fitting a square peg into a round hole... You may be better off querying for results and then using your own post-processing rules library to handle the ordering.
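As a sketch of that approach, assuming the engine has already applied the filters and handed back matching documents (the data here is hypothetical):

```ruby
# Hypothetical post-processing: re-rank the engine's results in Ruby.
results = [
  { id: 1, has_image: true,  has_boost: false, age: 28 },
  { id: 2, has_image: false, has_boost: true,  age: 30 }
]

rank = lambda do |d|
  (d[:has_image] ? 50 : 0) + (d[:has_boost] ? 10 : 0) - (d[:age] - 30).abs
end

ordered_ids = results.sort_by { |d| -rank.call(d) }.map { |d| d[:id] }
# ordered_ids => [1, 2]  (id 1 scores 48, id 2 scores 10)
```

This keeps the ranking rules in one place in application code, where they're easy to change without reindexing.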