14
votes

This is admittedly similar to (but not a duplicate of) "Comparison of full text search engine - Lucene, Sphinx, Postgresql, MySQL?". However, what I am looking for are specific, supported recommendations drawn from experience with more than one of the available systems (there seems to be a lot of "I've used Lucene, but not Sphinx", and vice versa).

The setup: standard LAMP (MySQL 5.0, PHP 5).

MySQL: tables are using the InnoDB engine for foreign key constraints

We are looking at indexing data, not pages. The data to be indexed may be in multiple languages (UTF-8 charset).

A number of the comparisons I've come across (like http://blog.evanweaver.com/articles/2008/03/17/rails-search-benchmarks/) are either not entirely applicable (Ferret is a Lucene port, but not the same as Zend_Search_Lucene) or are pushing their own systems/implementations (not exactly unbiased).

Some others I've come across (such as http://whatstheplot.com/blog/tag/lucene/ and http://pagetracer.com/2008/02/15/sphinx-and-lucene-search-engines-first-impressions/) provide very different results for performance of the two systems.

Also, all but ignored in much of what I've read is Xapian. Might this be worth consideration as well?

So... I'm hoping that some of you here on SO have some experience with this question and could help with some recommendations or point me in the right direction.


2 Answers

9
votes

One advantage of Sphinx is that you can "interpose" it between your clients and the MySQL server, and it will only "interfere" on queries specifically addressing it, transparently bouncing the others off MySQL -- see e.g. this article. Whether that's an advantage in your use case, you're best placed to say!

Sorry, no real-life experience with Xapian or Lucene -- still, reading about how to deploy them makes it sound (to me!) as if they might be worth it only if you identified substantial advantages. Otherwise, Sphinx's "easy as pie" deployment, as a "proxy" between your clients and your MySQL server, feels like a substantial win to me!

3
votes

I looked at Zend_Search_Lucene and Sphinx for a project that sounds similar - searching database content (in my case, book information). I spent about a day looking at each. For what it's worth, I found Sphinx vastly easier to set up and use.