4 votes

I have a MySQL (5.0.22) MyISAM table with roughly 300k records in it, and I want to do a lat/lon distance search within a five-mile radius.

I have an index that covers the lat/lon fields and is fast (millisecond response) when I select only lat/lon. But when I select additional fields from the table, it slows down horribly to 5-8 seconds.

I'm using MyISAM to take advantage of fulltext search. The other indexes perform well (e.g. SELECT * FROM Listing WHERE slug = 'xxxxx').

How can I optimize my query, table or index to speed things up?

My schema is:

CREATE TABLE  `Listing` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `name` varchar(125) collate utf8_unicode_ci default NULL,
  `phone` varchar(18) collate utf8_unicode_ci default NULL,
  `fax` varchar(18) collate utf8_unicode_ci default NULL,
  `email` varchar(55) collate utf8_unicode_ci default NULL,
  `photourl` varchar(55) collate utf8_unicode_ci default NULL,
  `thumburl` varchar(5) collate utf8_unicode_ci default NULL,
  `website` varchar(85) collate utf8_unicode_ci default NULL,
  `categoryid` int(10) unsigned default NULL,
  `addressid` int(10) unsigned default NULL,
  `deleted` tinyint(1) default NULL,
  `status` int(10) unsigned default '2',
  `parentid` int(10) unsigned default NULL,
  `organizationid` int(10) unsigned default NULL,
  `listinginfoid` int(10) unsigned default NULL,
  `createuserid` int(10) unsigned default NULL,
  `createdate` datetime default NULL,
  `lasteditdate` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  `lastedituserid` int(10) unsigned default NULL,
  `slug` varchar(155) collate utf8_unicode_ci default NULL,
  `aclid` int(10) unsigned default NULL,
  `alt_address` varchar(80) collate utf8_unicode_ci default NULL,
  `alt_website` varchar(80) collate utf8_unicode_ci default NULL,
  `lat` decimal(10,7) default NULL,
  `lon` decimal(10,7) default NULL,
  `city` varchar(80) collate utf8_unicode_ci default NULL,
  `state` varchar(10) collate utf8_unicode_ci default NULL,
  PRIMARY KEY  (`id`),
  KEY `idx_fetch` USING BTREE (`slug`,`deleted`),
  KEY `idx_loc` (`state`,`city`),
  KEY `idx_org` (`organizationid`,`status`,`deleted`),
  KEY `idx_geo_latlon` USING BTREE (`status`,`lat`,`lon`),
  FULLTEXT KEY `idx_name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ROW_FORMAT=DYNAMIC;

My query is:

SELECT Listing.name, Listing.categoryid, Listing.lat, Listing.lon,
    3956 * 2 * ASIN(SQRT(
        POWER(SIN((Listing.lat - 37.369195) * PI()/180 / 2), 2)
        + COS(Listing.lat * PI()/180) * COS(37.369195 * PI()/180)
        * POWER(SIN((Listing.lon - -122.036849) * PI()/180 / 2), 2)
    )) AS rawgeosearchdistance
FROM Listing
WHERE
    Listing.status = '2'
    AND ( Listing.lon BETWEEN -122.10913433498 AND -121.96456366502 )
    AND ( Listing.lat BETWEEN 37.296909665016 AND 37.441480334984 )
HAVING rawgeosearchdistance < 5
ORDER BY rawgeosearchdistance ASC;

Explain plan without geosearch:

    +----+-------------+---------+-------+----------------+----------------+---------+------+------+-------------+
    | id | select_type | table   | type  | possible_keys  | key            | key_len | ref  | rows | Extra       |
    +----+-------------+---------+-------+----------------+----------------+---------+------+------+-------------+
    |  1 | SIMPLE      | Listing | range | idx_geo_latlon | idx_geo_latlon | 19      | NULL |  453 | Using where |
    +----+-------------+---------+-------+----------------+----------------+---------+------+------+-------------+

Explain plan with geosearch:

+----+-------------+------------+-------+-----------------+-----------------+---------+------+------+-----------------------------+
| id | select_type | table      | type  | possible_keys   | key             | key_len | ref  | rows | Extra                       |
+----+-------------+------------+-------+-----------------+-----------------+---------+------+------+-----------------------------+
|  1 | SIMPLE      | Listing    | range | idx_geo_latlon  | idx_geo_latlon  | 19      | NULL |  453 | Using where; Using filesort |
+----+-------------+------------+-------+-----------------+-----------------+---------+------+------+-----------------------------+

Here's the explain plan with the covering index. Having the columns in the correct order made a big difference:

+----+-------------+---------+-------+---------------+---------------+---------+------+------+------------------------------------------+
| id | select_type | table   | type  | possible_keys | key           | key_len | ref  | rows | Extra                                    |
+----+-------------+---------+-------+---------------+---------------+---------+------+------+------------------------------------------+
|  1 | SIMPLE      | Listing | range | idx_geo_cover | idx_geo_cover | 12      | NULL |  453 | Using where; Using index; Using filesort |
+----+-------------+---------+-------+---------------+---------------+---------+------+------+------------------------------------------+

Thank you!

Post the explain plan for the fast and slow queries. – jonstjohn

Looks like you have way too many columns in that one table. You can squeeze some performance out of your queries by normalizing your data structure a bit. :) – tom

5 Answers

1 vote

You are probably using a 'covering index' in your lat/lon-only query. A covering index is one that contains all the data the query selects, so MySQL only needs to visit the index and never the data rows. See this for more info. That would explain why the lat/lon query is so fast.

I suspect that the calculations, plus the sheer number of rows returned, slow down the longer query (along with any temp table that has to be created for the HAVING clause).
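For example, a covering index for the geosearch query might look like the sketch below (the exact column list is an assumption: the equality column first, the range columns next, then the remaining selected columns):

ALTER TABLE Listing
    ADD KEY idx_geo_cover (status, lat, lon, name, categoryid);

With every column the query touches present in the index, EXPLAIN should show "Using index", as in the covering-index plan posted in the question.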

4 votes

I think you should really consider using PostgreSQL (combined with PostGIS).

I have given up on MySQL for geospatial data (for now) because of the following reasons:

  • MySQL only supports spatial datatypes / spatial indexes on MyISAM tables with the inherent disadvantages of MyISAM (concerning transactions, referential integrity...)
  • MySQL implements some of the OpenGIS specifications only on an MBR basis (minimum bounding rectangle), which is pretty useless for most serious geospatial querying/processing (see this link in the MySQL manual). Chances are you will need some of this functionality sooner or later.

PostgreSQL/PostGIS with proper (GIST) spatial indexes and proper queries can be extremely fast.

Example: determining overlapping polygons between a 'small' selection of polygons and a table with over 5 million (!) very complex polygons, calculating the amount of overlap between these results, and sorting. Average runtime: between 30 and 100 milliseconds (this particular machine has a lot of RAM, of course; don't forget to tune your PostgreSQL install and read the docs).
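For illustration, a radius query in PostGIS might look like the sketch below (all names are assumptions: a listing table with a geometry column geom in SRID 4326; 5 miles is roughly 8047 meters):

CREATE INDEX idx_listing_geog ON listing USING GIST ((geom::geography));

SELECT id, name,
       ST_Distance(geom::geography,
                   ST_SetSRID(ST_MakePoint(-122.036849, 37.369195), 4326)::geography) AS meters
FROM listing
WHERE ST_DWithin(geom::geography,
                 ST_SetSRID(ST_MakePoint(-122.036849, 37.369195), 4326)::geography,
                 8047)  -- radius in meters (~5 miles)
ORDER BY meters;

ST_DWithin on geography works with great-circle distances and can use the GIST index on the casted expression, so the bounding-box arithmetic from the question is handled for you.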

0 votes

Depending on the number of your listings, you could create a view that contains

Listing1Id, Listing2Id, Distance

Basically just have all of the distances "pre-calculated"

Then you could do something like:

SELECT Listing2Id FROM v_Distance d WHERE Distance < 5 AND Listing1Id = XXX;
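A sketch of such a view, reusing the haversine formula from the question (names are hypothetical, and two caveats apply: MySQL views are not materialized, so nothing is truly pre-calculated unless you populate a table instead, and this enumerates every ordered pair of listings, which at 300k rows is far too many pairs to be practical):

CREATE VIEW v_Distance AS
SELECT a.id AS Listing1Id, b.id AS Listing2Id,
       3956 * 2 * ASIN(SQRT(
           POWER(SIN((a.lat - b.lat) * PI()/180 / 2), 2)
           + COS(a.lat * PI()/180) * COS(b.lat * PI()/180)
           * POWER(SIN((a.lon - b.lon) * PI()/180 / 2), 2)
       )) AS Distance
FROM Listing a
JOIN Listing b ON b.id <> a.id;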

0 votes

You really should avoid doing that much math in your select statement. That's probably the source of a lot of your slowdowns. Remember, SQL is a query language; it's really not optimized for trigonometric functions.

The query will run faster, and your overall results will come back sooner, if you do a very naive distance search (which returns more results) and then winnow them down afterwards.

If you must use distance in your query, at the very least use a squared-distance calculation; SQRT calls are notoriously slow. In a Cartesian coordinate system, the sum of the squares of a right triangle's two short sides equals the square of the hypotenuse, so you can compare squared distances directly and never take a square root. Just be sure to also square the distance you're comparing against: instead of computing the precise distance and comparing it to your desired radius (say, 5), compute the squared distance and compare it to the squared radius (25).
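A minimal sketch of that idea against the question's table (assumptions: lat/lon deltas are treated as planar, which only holds roughly over short distances, and the conversion constant of about 69 miles per degree of latitude ignores the COS(lat) scaling the longitude delta really needs):

SELECT id, name,
       POWER(lat - 37.369195, 2) + POWER(lon - -122.036849, 2) AS distsq_degrees
FROM Listing
WHERE status = 2
  AND lat BETWEEN 37.296909 AND 37.441480
  AND lon BETWEEN -122.109134 AND -121.964563
HAVING distsq_degrees < POWER(5 / 69.0, 2);  -- squared radius: (5 miles in degrees)^2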

0 votes

When I implemented geo radius search, I just loaded all of the US ZIP codes into memory with their lat/lon, used my starting point and radius to get a list of ZIP codes within the radius, and then used that list in my DB query. Of course, I was using Solr to do my searching because the search space was in the 20-million-row range, but the same principles should apply. Apologies for the shallowness of this response; I'm on my phone.
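In SQL terms, the same idea might look like the sketch below (entirely hypothetical: it assumes a ZipCode lookup table and a zip column on Listing, neither of which appears in the posted schema):

-- Step 1: candidate ZIP codes inside the bounding box around the start point
SELECT zip
FROM ZipCode
WHERE lat BETWEEN 37.296909 AND 37.441480
  AND lon BETWEEN -122.109134 AND -121.964563;

-- Step 2: fetch listings in those ZIP codes (list assembled by application code)
SELECT id, name
FROM Listing
WHERE zip IN ('94301', '94022');  -- illustrative values from step 1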