2 votes

I have the names of all the employees of my company (5,000+). I want to write an engine that can find those names on the fly in online articles (blogs/wikis/help documents) and tag them with a "mailto" link containing the user's email address.

As of now I am planning to remove all the stop words from the article and then search for each remaining word in a Lucene index. But even then I see a lot of queries hitting the index: for example, an article with 2,000 words and only two references to people's names would still generate roughly 1,000 Lucene queries.

Is there a way to reduce these queries? Or a completely different way of achieving the same thing? Thanks in advance.

I am not sure I am following. Isn't the list of employees pre-defined? Aren't these names your queries? – amit
@amit The list of employees is 5,000 long. Are you asking if I should search for each name in the article? That would be 5,000 queries against a 2,000-word document. I was wondering about the other way around. – Sap
Do you have only one document? If you do, Lucene won't help you much. – amit
@amit No, I have lots of documents; I am using one doc as an example. But I want to do this on the fly: while a user is typing his wiki page in the preview area, it should mark each name with an email address as he types. – Sap
If I understand correctly, what you'd like to do is search your list of names for terms that people type, so that you can offer them suggestions of email addresses, etc. when the text they typed is the name of a person in your collection. Is that correct? – Gene Golovchinsky

2 Answers

5 votes

If you have only 5,000 names, I would just stick them into a hash table in memory instead of bothering with Lucene. You can key each person under several variants (nicknames, first-last, last-first, etc.) and still have a relatively small memory footprint and really efficient lookup performance.
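A minimal sketch of that idea in Java, using hypothetical names and addresses (in practice you would load these from your employee directory). Each employee is registered under a couple of name orderings, and the article is scanned by checking adjacent token pairs against the map, so it costs one O(1) lookup per pair rather than one Lucene query per word:

```java
import java.util.HashMap;
import java.util.Map;

public class NameTagger {

    // Registers common orderings of a name under the same email address.
    private static void register(Map<String, String> map,
                                 String first, String last, String email) {
        map.put((first + " " + last).toLowerCase(), email);
        map.put((last + " " + first).toLowerCase(), email);
    }

    public static void main(String[] args) {
        Map<String, String> emailByName = new HashMap<>();
        // Hypothetical entries; 5,000 of these is still a tiny map.
        register(emailByName, "Jane", "Doe", "jane.doe@example.com");
        register(emailByName, "John", "Smith", "john.smith@example.com");

        String article = "Yesterday Jane Doe presented the roadmap to John Smith.";
        String[] tokens = article.split("\\W+");

        // One hash lookup per adjacent token pair, instead of one
        // Lucene query per word.
        for (int i = 0; i + 1 < tokens.length; i++) {
            String candidate = (tokens[i] + " " + tokens[i + 1]).toLowerCase();
            String email = emailByName.get(candidate);
            if (email != null) {
                System.out.println(tokens[i] + " " + tokens[i + 1]
                        + " -> mailto:" + email);
            }
        }
    }
}
```

You could extend the same table with single-token keys for nicknames, at the cost of more false positives on common first names.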

1 vote

The Aho–Corasick string matching algorithm might be of use to you: http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm

The way this works is that you first compile the entire list of names into one big finite state machine (which may take a while), but once that state machine is built you can run it over as many documents as you want and detect names efficiently.

It looks at every character of each document only once, so it should be much more efficient than tokenizing the document and comparing each word to a list of known names.

There are a bunch of implementations available for different languages on the web. Check it out.
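For illustration, here is a minimal, self-contained Java sketch of the algorithm (not any particular library's API, and the names are hypothetical): a trie is built over the name list, failure links are added with a breadth-first pass, and then a single left-to-right scan of the text reports every name that occurs:

```java
import java.util.*;

// Minimal Aho–Corasick sketch: a trie over the name list plus
// failure links, then a single pass over the text.
public class AhoCorasick {

    private static class Node {
        Map<Character, Node> next = new HashMap<>();
        Node fail;
        List<String> outputs = new ArrayList<>(); // patterns ending here
    }

    private final Node root = new Node();

    // Adds one name to the trie.
    public void addPattern(String pattern) {
        Node node = root;
        for (char c : pattern.toCharArray()) {
            node = node.next.computeIfAbsent(c, k -> new Node());
        }
        node.outputs.add(pattern);
    }

    // Computes failure links breadth-first; call once after all patterns are added.
    public void build() {
        Deque<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.poll();
            for (Map.Entry<Character, Node> e : node.next.entrySet()) {
                Node child = e.getValue();
                Node f = node.fail;
                while (f != null && !f.next.containsKey(e.getKey())) {
                    f = f.fail;
                }
                child.fail = (f == null) ? root : f.next.get(e.getKey());
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    // Scans the text once, printing every pattern occurrence found.
    public void search(String text) {
        Node node = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (node != root && !node.next.containsKey(c)) {
                node = node.fail;
            }
            node = node.next.getOrDefault(c, root);
            for (String match : node.outputs) {
                System.out.println(match + " at index " + (i - match.length() + 1));
            }
        }
    }

    public static void main(String[] args) {
        AhoCorasick ac = new AhoCorasick();
        // Hypothetical names; you would add all 5,000 employees here.
        ac.addPattern("Jane Doe");
        ac.addPattern("John Smith");
        ac.build();
        ac.search("Yesterday Jane Doe met John Smith in the lobby.");
    }
}
```

In practice you would also want case normalization and word-boundary checks so that a short name like "Al" doesn't match inside "Algorithm".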