2 votes

I have a MySQL database with a table with 20 million rows. I'd like to be able to do a free text search of one of the columns which is a varchar(255). The sum of the length of all these values is 60 million characters. Currently to do a query such as:

select value from table1 where match( value ) against( 'history' ) ;

takes twenty to thirty seconds. What would it take to get this type of query to complete in one second or less?

Currently this is running on a VPS. What hardware/software should I consider using to get this search down to 1 second or less?
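(For context: a MATCH ... AGAINST query like the one above requires a FULLTEXT index on the column. On this table it would have been created with something like the following, where ft_value is an illustrative index name, not one from the original setup.)

```sql
ALTER TABLE table1 ADD FULLTEXT INDEX ft_value (value);
```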

@Alfabravo The OP is already using full text search, as is stated in the question title. – Michael Mior
My bad. Sorry. Lucene is the way, then. – Alfabravo
Just FYI, I'm the OP and I decided I'd go with a hosted solution. I tried to use AWS's full text search service, but it was going to cost around $400 per month for my usage. I then tried a much more affordable Sphinx-powered service from IndexDen. This worked great for over a year, but then they started talking about shutting the service down. So I moved to Microsoft's Azure Search service. This has been working fine for a couple of years. – John Weidner

2 Answers

3 votes

I completely agree with Stanislav here. Any external search engine like http://lucene.apache.org/ or http://sphinxsearch.com/ will be faster on a collection of the size you've mentioned.

For a Sphinx crash course I would recommend starting with the simple setup described at http://astellar.com/2011/12/replacing-mysql-full-text-search-with-sphinx/

In your case I would add a few things to the basic setup mentioned there.

Use a ranged query in the source config to reduce the load on MySQL while indexing, and extend sql_query with the $start/$end template:

source my_source
{
  ...
  sql_query_range = SELECT MIN(id), MAX(id) FROM table
  sql_range_step  = 1000
  ...
  sql_query  = SELECT id, ... FROM table WHERE id>=$start AND id <= $end
  ...
}

This tells Sphinx to fetch up to 1000 documents per MySQL query instead of pulling all records from the table in one go. If you have more than 1M records this is a must-have option.

Depending on the amount of memory you have on the box, I would also increase the indexer's mem_limit to 512M..1024M so indexing runs faster.
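As a minimal sketch, the setting lives in the indexer section of sphinx.conf (the 1024M value here is just an example; tune it to the RAM actually available on your box):

```
indexer
{
    # upper bound on RAM used while building the index;
    # higher values generally mean faster indexing
    mem_limit = 1024M
}
```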

As you play with Sphinx you may want to move some queries from MySQL to the Sphinx side, and also add non-full-text fields to the Sphinx index as attributes, so you can perform geodistance-based or faceted search as described at http://sphinxsearch.com/docs/current.html#attributes
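For example, assuming a hypothetical index my_index with a numeric price attribute (not part of the original setup), a filtered full-text query over SphinxQL might look like:

```sql
-- full-text match combined with an attribute filter,
-- all resolved inside Sphinx without touching MySQL
SELECT id
FROM my_index
WHERE MATCH('history') AND price < 100
LIMIT 10;
```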

2 votes

MySQL has built-in full text search, but it is quite limited in functionality.

I would recommend using a specialized full text search engine; the simplest and most MySQL-friendly one is Sphinx - http://sphinxsearch.com/. It has libraries for many platforms/languages to create and manage indexes.

A separate search engine will also let you avoid slow inserts on such a big collection, since MySQL must update the FULLTEXT index on every insert.
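As a minimal sketch of the query side, Sphinx speaks a MySQL-compatible protocol (SphinxQL, default port 9306), so you can query it with the stock mysql client. This assumes an index named my_index built from your table; you would then fetch the full rows from MySQL by the returned ids:

```sql
-- mysql -h127.0.0.1 -P9306
SELECT id FROM my_index WHERE MATCH('history') LIMIT 20;
```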