0
votes

We have many objects, and each object comes with a description of around 100-200 words (for example, a book's author name and a short summary).

The user gives input as a series of words. How can we implement a search that tolerates approximate text and minor spelling changes? For example, "Joshua Bloch", "Joshua blosh", and "joshua block" should all lead to the same result.

6
I don't know much about spell checking, but I have heard Bloom filters are useful in such cases. Check the link: ipowerinfinity.wordpress.com/2008/03/02/… - Emil

6 Answers

1
votes

If you are using Lucene for your full-text search, there is a "Did you mean" extension, which is probably what you want.

1
votes

How to implement search with approximate text and minor spelling changes? For example, "Joshua Bloch", "Joshua blosh", "joshua block" could lead to the same text result.

Does your database support Soundex? Soundex matches similar-sounding words, which seems to fit the example you gave above. Even if your database doesn't have native Soundex, you can still write an implementation and save the Soundex code for each author name in a separate field. That field can then be used for matching later.
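As a rough illustration of what "write an implementation" could look like, here is a minimal sketch of the common American Soundex rules in Python. The function name `soundex` and the table `CODES` are just names chosen for this example, and a database's built-in version may differ in edge cases.

```python
# Minimal American Soundex sketch (illustrative, not tied to any database).
# Consonants map to digits; vowels and other characters have no code.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the 4-character Soundex code of a single word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]  # pad with zeros, keep four characters


# The three surname spellings from the question share one code.
print(soundex("Bloch"), soundex("blosh"), soundex("block"))
```

Storing `soundex(name)` for each word of the author name in its own column lets you match by comparing codes instead of raw strings.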

However, Soundex is not a replacement for full-text search; it will only help in specific cases like author names. If you are looking for specific text in, say, the book's blurb, then you are better off with a full-text search option (like PostgreSQL's).

1
votes

If you are looking for an actual implementation of this feature, here is a brilliant program written by Peter Norvig: http://norvig.com/spell-correct.html

It also has links to implementations in many other languages, including Java, C, etc.
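To give a feel for the approach, here is a short sketch in the spirit of that article, not a copy of it: generate every string within one or two edits of the input and keep the ones that occur in your own data. The names `edits1` and `correct`, and the tiny word list, are assumptions made for this example; in practice the counts would come from your descriptions.

```python
from collections import Counter


def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in alphabet]
    inserts = [l + c + r for l, r in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the most frequent known word within two edits, else the input."""
    word = word.lower()
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        # Fall back to two edits; slower, but still fine for short words.
        candidates = [w for e in edits1(word) for w in edits1(e) if w in counts]
    return max(candidates, key=counts.get) if candidates else word


# Hypothetical vocabulary built from the stored descriptions.
counts = Counter("joshua bloch effective java joshua bloch".split())
print([correct(w, counts) for w in "Joshua blosh".split()])
```

Correcting each query word this way before running the normal search keeps the search index itself unchanged.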

1
votes

You can use the spell checker JOrtho. You can generate a custom dictionary from the content in your database and set it. Then all words that are neither in the dictionary nor in your database are marked as misspelled.

1
votes

Instead of Lucene, please check Solr. Lucene is a library that you can use to embed search functionality in your application. Solr is a ready-to-run search server built on top of Lucene, which you can plug into your application via its APIs. For most systems, Solr will save you from dealing with the complexity of Lucene.
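For the misspelling part of the question, Lucene's query syntax has a fuzzy operator: `term~1` matches terms within one edit of `term`. A minimal sketch of building such a query for Solr follows. The core name `books`, the field name `description`, and the local URL are assumptions for this example, and the input is assumed to contain plain words with no query-syntax characters that would need escaping.

```python
from urllib.parse import urlencode


def fuzzy_query(user_input, field="description", max_edits=1):
    """Turn 'Joshua blosh' into a Lucene query with one fuzzy clause per word."""
    # Lowercasing assumes the field was indexed with a lowercasing analyzer.
    words = user_input.lower().split()
    return " AND ".join(f"{field}:{word}~{max_edits}" for word in words)


def select_url(user_input, base="http://localhost:8983/solr/books/select"):
    """Build the request URL for a (hypothetical) Solr core named 'books'."""
    return base + "?" + urlencode({"q": fuzzy_query(user_input)})


print(fuzzy_query("Joshua blosh"))
print(select_url("Joshua blosh"))
```

Raising `max_edits` to 2 tolerates more typos at the cost of more false matches.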

0
votes

Apache Lucene may fit the bill. It is a high-performance, full-text search engine library written entirely in Java.