4
votes

I have an application which needs to do filtering and retrieve results from a list of Articles. For database I am using MySQL, with NHibernate as the ORM. The query also does a full-text search based on keywords, and for this it uses Lucene.Net.

My problem is that the query spans 'both' domains. For example, I might need to get all articles that contain the keywords 'traffic control', and have PublishedOn < 2012-10-01. Also, the query uses pagination, for example page #2, with a page size of 50. The problem is how can I create a query which spans both MySQL (for the PublishedOn part), and Lucene.Net to harness the full-text search capability.

If I do a search on MySQL first, I cannot just get the first 50, because the results might be further filtered in Lucene and I need 50 as my page size. The same goes if I start with Lucene.Net. Also, preferably the ordering is by 'relevance', so this is something which Lucene can do, not MySQL.

My current approach has been to first filter by MySQL, and retrieve ALL the primary keys of matched records. Then, I do a query in Lucene, with a term-query matching the primary key to the list of results. However, Lucene is very slow for such a query, and the database can contain over 200,000 articles. Doing such a query takes ages in Lucene, while it is blazingly fast for full-text searches.

Any ideas how one can go about addressing such an issue?

1

1 Answers

0
votes

Lucene isn't only about full text search. You can add PublishedOn property to a Lucene document and perform query like this:

Text:"traffic control" AND PublishedOn:[00000000 TO 20121001]

Check out the "Range Searches" section in Lucene syntax documentation.