1
votes

I need to use Lucene.NET for a full text search on a mysql database. I must use InnoDB and so the Lucene is my selected search provider.

There are 2 fields that need to be indexed they are varchar(200) and varchar(2000). But the issue is, it should be possible to full text search a record as soon as it is inserted. So I have to insert the record to the lucene index and to the db at the same time.

So is it feasible to do that and how long a single insert take ? This is for a web app and I expect 500 inserts per second.

If you've done that before let me know your techniques. If this is not feasible my only option will be to move back to SQLServer to use its full text search. :(

P.S.

I am not worried about the database insert performance. But what I don't know is the time that the Lucene will take to update it s index.

3
( 200 + 2000 ) * 500 * 60 * 60 * 24 / 1024 / 1024 / 1024 = 88 GB of data inserted per day. Will there be always 500 inserts/s, or is it a peak ? - mathieu
it is the peak, I need the system handle that many inserts without braking down. 200 and 2000 are the max limits (the fields are title and text of a blog post) so they won't be that long in every record. actually 500/ sec might be very rare but I need to know if its possible for lucene to handle it - Amila
Have you considered using SOLR instead. That way you have a dedicated search daemon running. lucene.apache.org/solr - danielrsmith

3 Answers

2
votes

Yes it is feasible, using Lucene's NearRealtimeSearch feature.

http://wiki.apache.org/lucene-java/NearRealtimeSearch

But with the kind of volume you want to handle, you will have to cleverly manage index optimizations/merges when you commit if you want to maintain decent search performances.

1
votes

Lucene.Net or SQLServer, unless you find a smart way, you are in trouble with 43,200,000 docs/day.

0
votes

Some guidelines here :

  • With that much data inserted at peak times, does it make sense to have it available in the index right away ? Couldn't a scheduled task do the trick ?
  • What sort of searching are you doing on those fields ? If it is really simple, isn't lucene overkill ?
  • You told us about the write volumes, but what about reads ? It looks like there will be quite a lot of data, but if it is searched not intensively, isn't a select id from table where col like '%search%' sufficient ?

You should really get a proof of concept with different indexing strategies, and stress test it to find which one fits best.