I'm trying to decide on an open-source search/indexing technology for a .NET project. The standard for Java projects seems to be Lucene, but on the .NET side the Lucene.Net project looks fairly inactive. Is it still the best option, or are there other viable alternatives?
While there were no full-blown releases of Lucene.Net (i.e. with full documentation and web site updates) for quite some time, there are still fresh commits to its SVN repository. The latest release (2.3.2), for example, was tagged on July 24, 2009. Since development is still active, I would use it for new full-text search projects.
I know this isn't open-source, but it is a free and very comprehensive offering from Microsoft:
Microsoft Search Server 2008 Express
Out-of-the-box relevancy.
Localized interface.
Extensible search experience.
No preset document limits.
Continuous propagation indexing.
Out-of-the-box indexing connectors.
Content summaries.
Hit highlighting.
Best bets and definitions.
Query correction.
Duplicate collapsing.
Filter by property.
Filter by language.
Sort by date.
E-mail/RSS alerts.
Lucene.Net will necessarily lag behind the Java version, since it is a port. I also don't like that the port is a straight copy, although I suppose that makes it easier to reuse the docs. Something to consider is Solr, if you don't need really tight (binary) integration. I have used it with good success. It is still powered by Lucene, but I think it is the better choice because it adds useful features on top (faceted search, caching and replication, for example). You can use it from .NET via its HTTP endpoint.
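To make "via its HTTP endpoint" concrete, here is a minimal sketch of talking to Solr's standard `/select` request handler and reading its JSON response. It is written in Python only to keep it self-contained; the same two steps (build a query URL, parse the JSON) work with any HTTP client, including .NET's. The base URL `http://localhost:8983`, the core name `articles` and the `id`/`title` fields are hypothetical placeholders for your own setup.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request


def build_select_url(base_url: str, core: str, query: str, rows: int = 10, start: int = 0) -> str:
    """Build a URL for Solr's /select handler, asking for a JSON response (wt=json)."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "start": start, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{core}/select?{params}"


def parse_select_response(body: str) -> tuple[int, list[dict]]:
    """Return (total hit count, documents on this page) from a Solr JSON response."""
    data = json.loads(body)
    response = data["response"]
    return response["numFound"], response["docs"]


def search(base_url: str, core: str, query: str, rows: int = 10) -> tuple[int, list[dict]]:
    """Query a running Solr instance; needs network access, so it is not called below."""
    url = build_select_url(base_url, core, query, rows)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_select_response(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical host and core; prints the URL a client would request.
    print(build_select_url("http://localhost:8983", "articles", "title:lucene", rows=5))
```

Keeping URL building and response parsing separate from the network call means both halves can be tested without a live Solr server.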
One question to ask yourself is what you really need/want in a search solution. There are a lot of ways to go about implementing search and not all solutions work for every situation.
SQLite has FTS3 (Full Text Search, version 3), which may do what you need. I don't have direct experience with it, but I believe it was developed explicitly to do what Lucene does, at least in the simple case. I don't believe you can alter the tokenizer or anything (without modifying the source code, anyway), but it's an option.
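A minimal sketch of an FTS3 index, using Python's built-in `sqlite3` module purely for illustration (from .NET you would go through an ADO.NET SQLite provider such as System.Data.SQLite). It assumes your SQLite build was compiled with FTS3 enabled, which most distributions do. For what it's worth, FTS3 does accept a `tokenize=` option that selects one of the built-in tokenizers; the sketch uses `porter`, so "indexes" also matches "indexing". The table name and sample rows are made up.

```python
import sqlite3

# Hypothetical sample documents: (title, body)
SAMPLE_DOCS = [
    ("Lucene.Net", "A port of the Java search library"),
    ("SQLite FTS3", "Full-text indexing inside SQLite"),
    ("Solr", "Search server built on Lucene"),
]


def build_index(docs):
    """Create an in-memory FTS3 table and load the documents into it."""
    conn = sqlite3.connect(":memory:")
    # tokenize=porter picks FTS3's built-in Porter stemming tokenizer
    conn.execute("CREATE VIRTUAL TABLE docs USING fts3(title, body, tokenize=porter)")
    conn.executemany("INSERT INTO docs(title, body) VALUES (?, ?)", docs)
    return conn


def search(conn, query):
    """Full-text query across all columns; returns matching titles in insertion order."""
    rows = conn.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rowid", (query,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_index(SAMPLE_DOCS)
    print(search(conn, "search"))
    print(search(conn, "indexes"))  # stemmed, so it finds "indexing"
```

With the default `simple` tokenizer the second query would find nothing, since no document contains the literal word "indexes".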
After having used Lucene.Net in a couple of projects, I'd also suggest compiling the Java version of Lucene into .NET code with IKVM.NET. It works wonderfully, and you never have to worry about being out of date with respect to the Java version. You can also compile all the extra libraries and use them as well (I'm using the GIS search stuff in one project).
As I understand it, you need "just" a full-text index on your existing database; SQL Server full-text search in principle worked for you, but your current implementation/setup is too slow.
If I were you, I wouldn't go for a completely different approach (just think about the mess of keeping an external index in sync with your database, or joining query results from both, etc.). Try to fix the performance issue with SQL Server instead: nobody would seriously accept 6 seconds for searching 7,000 rows as the final word on an enterprise-class product that runs some of the largest databases around... Maybe ask a new question about common pitfalls with this feature (I'm not an expert on it); you might end up with a simple fix instead of a complete rebuild of your search architecture ;)
Have a look at www.searcharoo.net. It has a crawler and features like word stemming and indexing of Office documents and PDFs. The author is very active on the CodeProject articles and responds to questions pretty quickly.
I used to use DotLucene but ran into a number of problems. A major one was that it required full trust to run.
I have since moved to using SearchAroo: http://www.searcharoo.net/
It uses an XML data store, and I have found its performance to be VERY similar to DotLucene's.
If you are looking for another option, I'd definitely take a look.