0
votes

I have a very interesting problem:

I have a MySQL table 'Venue' with fields: 'name', 'addressLine1', 'addressLine2', 'addressLine3', 'city', 'country', 'description'; all fields are VARCHAR. 'description' is a larger text field.

What I would like to do is a fuzzy search on table Venue. So far I am using:

SELECT * FROM Venue WHERE MATCH(name, addressLine1,..., description) AGAINST("London" IN NATURAL LANGUAGE MODE).

I can also sort this query based on the MATCH score.

This is great but has some obvious problems: 1) if the user types "lond" then nothing is returned 2) if the user types "lodnod" then nothing is returned.

One alternative solution I thought of would be to use a Levenshtein expansion (2 movements) - so for the search term "lodnod" the query would be like:

SELECT * FROM Venue WHERE Venue. name LIKE '%__lodnod%'
OR Venue.addressLine1 LIKE '%__lodnod%'
OR Venue.addressLine2 LIKE '%__lodnod%'
OR Venue.addressLine3 LIKE '%__lodnod%'
OR Venue.city LIKE '%__lodnod%'
OR Venue.county LIKE '%__lodnod%'
OR Venue.country LIKE '%__lodnod%'
OR Venue. name LIKE '%lodnod%'
OR Venue.addressLine1 LIKE '%lodnod%'
OR Venue.addressLine2 LIKE '%lodnod%'
OR Venue.addressLine3 LIKE '%lodnod%'
OR Venue.city LIKE '%lodnod%'
OR Venue.county LIKE '%lodnod%'
OR Venue.country LIKE '%lodnod%'
OR Venue. name LIKE '%_lodnod%'
OR Venue.addressLine1 LIKE '%_lodnod%'
OR Venue.addressLine2 LIKE '%_lodnod%'
OR Venue.addressLine3 LIKE '%_lodnod%'
OR Venue.city LIKE '%_lodnod%'
OR Venue.county LIKE '%_lodnod%'
OR Venue.country LIKE '%_lodnod%'
OR Venue. name LIKE '%_odnod%'
OR Venue.addressLine1 LIKE '%_odnod%'
OR Venue.addressLine2 LIKE '%_odnod%'
OR Venue.addressLine3 LIKE '%_odnod%'
OR Venue.city LIKE '%_odnod%'
OR Venue.county LIKE '%_odnod%'
OR Venue.country LIKE '%_odnod%'
OR Venue. name LIKE '%__odnod%'
OR Venue.addressLine1 LIKE '%__odnod%'
OR Venue.addressLine2 LIKE '%__odnod%'
OR Venue.addressLine3 LIKE '%__odnod%'
OR Venue.city LIKE '%__odnod%'
OR Venue.county LIKE '%__odnod%'
OR Venue.country LIKE '%__odnod%'
OR Venue. name LIKE '%_l_odnod%'
OR Venue.addressLine1 LIKE '%_l_odnod%'
OR Venue.addressLine2 LIKE '%_l_odnod%'
OR Venue.addressLine3 LIKE '%_l_odnod%'
OR Venue.city LIKE '%_l_odnod%'
OR Venue.county LIKE '%_l_odnod%'
OR Venue.country LIKE '%_l_odnod%'
OR Venue. name LIKE '%_ldnod%'
OR Venue.addressLine1 LIKE '%_ldnod%'
OR Venue.addressLine2 LIKE '%_ldnod%'
OR Venue.addressLine3 LIKE '%_ldnod%'
OR Venue.city LIKE '%_ldnod%'
OR Venue.county LIKE '%_ldnod%'
OR Venue.country LIKE '%_ldnod%'
OR Venue. name LIKE '%_l_dnod%'
OR Venue.addressLine1 LIKE '%_l_dnod%'
OR Venue.addressLine2 LIKE '%_l_dnod%'
OR Venue.addressLine3 LIKE '%_l_dnod%'
OR Venue.city LIKE '%_l_dnod%'
OR Venue.county LIKE '%_l_dnod%'
OR Venue.country LIKE '%_l_dnod%'
OR Venue. name LIKE '%_lo_dnod%'
OR Venue.addressLine1 LIKE '%_lo_dnod%'
OR Venue.addressLine2 LIKE '%_lo_dnod%'
OR Venue.addressLine3 LIKE '%_lo_dnod%'
OR Venue.city LIKE '%_lo_dnod%'
OR Venue.county LIKE '%_lo_dnod%'
OR Venue.country LIKE '%_lo_dnod%'
OR Venue. name LIKE '%_lonod%'
OR Venue.addressLine1 LIKE '%_lonod%'
OR Venue.addressLine2 LIKE '%_lonod%'
OR Venue.addressLine3 LIKE '%_lonod%'
OR Venue.city LIKE '%_lonod%'
OR Venue.county LIKE '%_lonod%'
OR Venue.country LIKE '%_lonod%'
OR Venue. name LIKE '%_lo_nod%'
OR Venue.addressLine1 LIKE '%_lo_nod%'
OR Venue.addressLine2 LIKE '%_lo_nod%'
OR Venue.addressLine3 LIKE '%_lo_nod%'
OR Venue.city LIKE '%_lo_nod%'
OR Venue.county LIKE '%_lo_nod%'
OR Venue.country LIKE '%_lo_nod%'
OR Venue. name LIKE '%_lod_nod%'
OR Venue.addressLine1 LIKE '%_lod_nod%'
OR Venue.addressLine2 LIKE '%_lod_nod%'
OR Venue.addressLine3 LIKE '%_lod_nod%'
OR Venue.city LIKE '%_lod_nod%'
OR Venue.county LIKE '%_lod_nod%'
OR Venue.country LIKE '%_lod_nod%'
OR Venue. name LIKE '%_lodod%'
OR Venue.addressLine1 LIKE '%_lodod%'
OR Venue.addressLine2 LIKE '%_lodod%'
OR Venue.addressLine3 LIKE '%_lodod%'
OR Venue.city LIKE '%_lodod%'
OR Venue.county LIKE '%_lodod%'
OR Venue.country LIKE '%_lodod%'
OR Venue. name LIKE '%_lod_od%'
OR Venue.addressLine1 LIKE '%_lod_od%'
OR Venue.addressLine2 LIKE '%_lod_od%'
OR Venue.addressLine3 LIKE '%_lod_od%'
OR Venue.city LIKE '%_lod_od%'
OR Venue.county LIKE '%_lod_od%'
OR Venue.country LIKE '%_lod_od%'
OR Venue. name LIKE '%_lodn_od%'
OR Venue.addressLine1 LIKE '%_lodn_od%'
OR Venue.addressLine2 LIKE '%_lodn_od%'
OR Venue.addressLine3 LIKE '%_lodn_od%'
OR Venue.city LIKE '%_lodn_od%'
OR Venue.county LIKE '%_lodn_od%'
OR Venue.country LIKE '%_lodn_od%'
OR Venue. name LIKE '%_lodnd%'
OR Venue.addressLine1 LIKE '%_lodnd%'
OR Venue.addressLine2 LIKE '%_lodnd%'
OR Venue.addressLine3 LIKE '%_lodnd%'
OR Venue.city LIKE '%_lodnd%'
OR Venue.county LIKE '%_lodnd%'
OR Venue.country LIKE '%_lodnd%'
OR Venue. name LIKE '%_lodn_d%'
OR Venue.addressLine1 LIKE '%_lodn_d%'
OR Venue.addressLine2 LIKE '%_lodn_d%'
OR Venue.addressLine3 LIKE '%_lodn_d%'
OR Venue.city LIKE '%_lodn_d%'
OR Venue.county LIKE '%_lodn_d%'
OR Venue.country LIKE '%_lodn_d%'
OR Venue. name LIKE '%_lodno_d%'
OR Venue.addressLine1 LIKE '%_lodno_d%'
OR Venue.addressLine2 LIKE '%_lodno_d%'
OR Venue.addressLine3 LIKE '%_lodno_d%'
OR Venue.city LIKE '%_lodno_d%'
OR Venue.county LIKE '%_lodno_d%'
OR Venue.country LIKE '%_lodno_d%'
OR Venue. name LIKE '%_lodno%'
OR Venue.addressLine1 LIKE '%_lodno%'
OR Venue.addressLine2 LIKE '%_lodno%'
OR Venue.addressLine3 LIKE '%_lodno%'
OR Venue.city LIKE '%_lodno%'
OR Venue.county LIKE '%_lodno%'
OR Venue.country LIKE '%_lodno%'
OR Venue. name LIKE '%_lodno_%'
OR Venue.addressLine1 LIKE '%_lodno_%'
OR Venue.addressLine2 LIKE '%_lodno_%'
OR Venue.addressLine3 LIKE '%_lodno_%'
OR Venue.city LIKE '%_lodno_%'
OR Venue.county LIKE '%_lodno_%'
OR Venue.country LIKE '%_lodno_%'
OR Venue. name LIKE '%_lodnod_%'
OR Venue.addressLine1 LIKE '%_lodnod_%'
OR Venue.addressLine2 LIKE '%_lodnod_%'
OR Venue.addressLine3 LIKE '%_lodnod_%'
OR Venue.city LIKE '%_lodnod_%'
OR Venue.county LIKE '%_lodnod_%'
OR Venue.country LIKE '%_lodnod_%'
OR Venue. name LIKE '%dnod%'
OR Venue.addressLine1 LIKE '%dnod%'
OR Venue.addressLine2 LIKE '%dnod%'
OR Venue.addressLine3 LIKE '%dnod%'
OR Venue.city LIKE '%dnod%'
OR Venue.county LIKE '%dnod%'
OR Venue.country LIKE '%dnod%'
OR Venue. name LIKE '%_dnod%'
OR Venue.addressLine1 LIKE '%_dnod%'
OR Venue.addressLine2 LIKE '%_dnod%'
OR Venue.addressLine3 LIKE '%_dnod%'
OR Venue.city LIKE '%_dnod%'
OR Venue.county LIKE '%_dnod%'
OR Venue.country LIKE '%_dnod%'
OR Venue. name LIKE '%o_dnod%'
OR Venue.addressLine1 LIKE '%o_dnod%'
OR Venue.addressLine2 LIKE '%o_dnod%'
OR Venue.addressLine3 LIKE '%o_dnod%'
OR Venue.city LIKE '%o_dnod%'
OR Venue.county LIKE '%o_dnod%'
OR Venue.country LIKE '%o_dnod%'
OR Venue. name LIKE '%onod%'
OR Venue.addressLine1 LIKE '%onod%'
OR Venue.addressLine2 LIKE '%onod%'
OR Venue.addressLine3 LIKE '%onod%'
OR Venue.city LIKE '%onod%'
OR Venue.county LIKE '%onod%'
OR Venue.country LIKE '%onod%'
OR Venue. name LIKE '%o_nod%'
OR Venue.addressLine1 LIKE '%o_nod%'
OR Venue.addressLine2 LIKE '%o_nod%'
OR Venue.addressLine3 LIKE '%o_nod%'
OR Venue.city LIKE '%o_nod%'
OR Venue.county LIKE '%o_nod%'
OR Venue.country LIKE '%o_nod%'
OR Venue. name LIKE '%od_nod%'
OR Venue.addressLine1 LIKE '%od_nod%'
OR Venue.addressLine2 LIKE '%od_nod%'
OR Venue.addressLine3 LIKE '%od_nod%'
OR Venue.city LIKE '%od_nod%'
OR Venue.county LIKE '%od_nod%'
OR Venue.country LIKE '%od_nod%'
OR Venue. name LIKE '%odod%'
OR Venue.addressLine1 LIKE '%odod%'
OR Venue.addressLine2 LIKE '%odod%'
OR Venue.addressLine3 LIKE '%odod%'
OR Venue.city LIKE '%odod%'
OR Venue.county LIKE '%odod%'
OR Venue.country LIKE '%odod%'
OR Venue. name LIKE '%od_od%'
OR Venue.addressLine1 LIKE '%od_od%'
OR Venue.addressLine2 LIKE '%od_od%'
OR Venue.addressLine3 LIKE '%od_od%'
OR Venue.city LIKE '%od_od%'
OR Venue.county LIKE '%od_od%'
OR Venue.country LIKE '%od_od%'
OR Venue. name LIKE '%odn_od%'
OR Venue.addressLine1 LIKE '%odn_od%'
OR Venue.addressLine2 LIKE '%odn_od%'
OR Venue.addressLine3 LIKE '%odn_od%'
OR Venue.city LIKE '%odn_od%'
OR Venue.county LIKE '%odn_od%'
OR Venue.country LIKE '%odn_od%'
OR Venue. name LIKE '%odnd%'
OR Venue.addressLine1 LIKE '%odnd%'
OR Venue.addressLine2 LIKE '%odnd%'
OR Venue.addressLine3 LIKE '%odnd%'
OR Venue.city LIKE '%odnd%'
OR Venue.county LIKE '%odnd%'
OR Venue.country LIKE '%odnd%'
OR Venue. name LIKE '%odn_d%'
OR Venue.addressLine1 LIKE '%odn_d%'
OR Venue.addressLine2 LIKE '%odn_d%'
...(cut short because of maximum 30000 character limit)
OR Venue.city LIKE '%lodno_%'
OR Venue.county LIKE '%lodno_%'
OR Venue.country LIKE '%lodno_%'
OR Venue. name LIKE '%lodno__%'
OR Venue.addressLine1 LIKE '%lodno__%'
OR Venue.addressLine2 LIKE '%lodno__%'
OR Venue.addressLine3 LIKE '%lodno__%'
OR Venue.city LIKE '%lodno__%'
OR Venue.county LIKE '%lodno__%'
OR Venue.country LIKE '%lodno__%'
OR Venue. name LIKE '%lodno_d_%'
OR Venue.addressLine1 LIKE '%lodno_d_%'
OR Venue.addressLine2 LIKE '%lodno_d_%'
OR Venue.addressLine3 LIKE '%lodno_d_%'
OR Venue.city LIKE '%lodno_d_%'
OR Venue.county LIKE '%lodno_d_%'
OR Venue.country LIKE '%lodno_d_%'
OR Venue. name LIKE '%lodno%'
OR Venue.addressLine1 LIKE '%lodno%'
OR Venue.addressLine2 LIKE '%lodno%'
OR Venue.addressLine3 LIKE '%lodno%'
OR Venue.city LIKE '%lodno%'
OR Venue.county LIKE '%lodno%'
OR Venue.country LIKE '%lodno%'
OR Venue. name LIKE '%lodnod__%'
OR Venue.addressLine1 LIKE '%lodnod__%'
OR Venue.addressLine2 LIKE '%lodnod__%'
OR Venue.addressLine3 LIKE '%lodnod__%'
OR Venue.city LIKE '%lodnod__%'
OR Venue.county LIKE '%lodnod__%'
OR Venue.country LIKE '%lodnod__%'
OR Venue. name LIKE '%lodnod_%'
OR Venue.addressLine1 LIKE '%lodnod_%'
OR Venue.addressLine2 LIKE '%lodnod_%'
OR Venue.addressLine3 LIKE '%lodnod_%'
OR Venue.city LIKE '%lodnod_%'
OR Venue.county LIKE '%lodnod_%'
OR Venue.country LIKE '%lodnod_%'

Obviously, this query is 1) going to be a huge load on the server and response times will be affected. 2) I have no idea on scoring this and therefore sorting it.

Is there a way to do this better?

2

2 Answers

4
votes

MySQL's documentation page for full text search functions has some useful information. Despite what the comments above say, you don't need to restrict yourself to the MyISAM table type in MySQL. As others have said, MySQL may not be the best choice, but there are some things you can do to make it more workable.

Column Aggregation

If you find that you're always doing searches like this against multiple columns, it might help to have a "searchable" column that contains just the tokens from all of the other columns. You'd need some mechanism to keep it up to date, but ON INSERT and ON UPDATE triggers would be sufficient. This may simplify your indexes and make your queries easier to read (and hopefully, speedup responses), but doesn't solve your initial issue.

One advantage of this "searchable" column idea is you can quietly add tokens of common misspellings to make things easier for your users. Your search logs are helpful here.

Consider Soundex instead

One solution that's had some recent success is using soundex instead of Levenshtein expansion. See, e.g., Rob Gravelle (2015) "MySQL Fuzzy Text Searching Using the SOUNDEX Function, Database Journal

User Defined Function (UDF)

Another option is adding a User Defined Function (UDF) to your MySQL implementation. See e.g., this Stack Overflow question or this blog post.

However you approach it, you'll find that truly fuzzy searching with Levenshtein is a slow beast. Best of luck to you!

0
votes

Don't use mysql for this stuff. You need an engine optimized for indexing search material. You can use elasticsearch or solr for this purpose and index it using mysql. Mysql is simply not designed for this purpose