I have the names of all the employees of my company (5000+). I want to write an engine that can, on the fly, find those names in online articles (blogs, wikis, help documents) and tag each one with a "mailto" link to that user's email address.
My current plan is to remove all the stop words from the article and then look up each remaining word in a Lucene index. Even so, a lot of queries would hit the index: for example, an article with 2000 words and only two references to people's names would most probably still trigger about 1000 Lucene queries.
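To make the plan concrete, here is a rough sketch of what I have in mind. It is only an illustration: `CountingIndex` is a hypothetical in-memory stand-in for the Lucene index (one `lookup()` call models one query), and the employee data, the stop-word list, and the `<a href="mailto:...">` output format are all made up for the example.

```python
import re

# Hypothetical data: a tiny stand-in for the 5000+ employee directory.
EMPLOYEES = {
    "alice": "alice@example.com",
    "bob": "bob@example.com",
}

# Illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "with"}


class CountingIndex:
    """Stand-in for the Lucene index: one lookup() call models one query."""

    def __init__(self, entries):
        self._entries = entries
        self.queries = 0

    def lookup(self, word):
        self.queries += 1
        return self._entries.get(word.lower())


def tag_names(text, index):
    """Wrap each known name in a mailto link, one lookup per non-stop word."""

    def replace(match):
        word = match.group(0)
        if word.lower() in STOP_WORDS:
            return word  # stop words never reach the index
        email = index.lookup(word)
        if email is None:
            return word  # not an employee name, leave unchanged
        return '<a href="mailto:{}">{}</a>'.format(email, word)

    return re.sub(r"[A-Za-z]+", replace, text)


index = CountingIndex(EMPLOYEES)
article = "Alice wrote the guide and Bob reviewed the draft with care."
tagged = tag_names(article, index)
print(tagged)
print("index queries:", index.queries)
```

Even in this toy article, only two of the lookups find a name; every other non-stop word still costs a query, which is the overhead I would like to avoid.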
Is there a way to reduce the number of queries, or a completely different way of achieving the same result? Thanks in advance.