1
votes

I have this table:

 CREATE TABLE `property_ads_history` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `property_ad_id` int(10) unsigned NOT NULL,
 `advertiser_type` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `agency_reference_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `average_sale_price` double(8,2) NOT NULL DEFAULT '-1.00',
 `bathrooms` double(8,2) NOT NULL DEFAULT '-1.00',
 `bedrooms` double(8,2) NOT NULL DEFAULT '-1.00',
 `carports` double(8,2) NOT NULL DEFAULT '-1.00',
 `DELETE_country` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `created_reason` enum('Geocoded','Sanitized Parking','Sanitized Representation','Sanitized Address','Scraped','URL Inserted','QA Sanitized Address','QA Sanitized Representation','QA Sanitized Parkings') COLLATE utf8_unicode_ci DEFAULT NULL,
 `description` longtext COLLATE utf8_unicode_ci NOT NULL,
 `ensuite_bathrooms` double(8,2) NOT NULL DEFAULT '-1.00',
 `DELETE_ad_expired_at` datetime NOT NULL,
 `floor_area` double(8,2) NOT NULL DEFAULT '-1.00',
 `formatted_address` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `garages` double(8,2) NOT NULL DEFAULT '-1.00',
 `geocode_status` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `is_represented` tinyint(1) DEFAULT NULL,
 `land_area` double(8,2) NOT NULL DEFAULT '-1.00',
 `latitude` double(10,6) NOT NULL,
 `location_id` int(10) unsigned DEFAULT NULL,
 `longitude` double(10,6) NOT NULL,
 `off_street_parkings` double(8,2) NOT NULL DEFAULT '-1.00',
 `official_property_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `page_url` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `parking` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `posted_at` datetime NOT NULL,
 `posted_at_string` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `postal_code` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `price` double(10,2) NOT NULL DEFAULT '-1.00',
 `primary_image` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `DELETE_property_ad_created_at` datetime NOT NULL,
 `property_type` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `rateable_value` double(10,2) NOT NULL DEFAULT '-1.00',
 `recent_sale_1` double(10,2) NOT NULL DEFAULT '-1.00',
 `recent_sale_2` double(10,2) NOT NULL DEFAULT '-1.00',
 `recent_sale_3` double(10,2) NOT NULL DEFAULT '-1.00',
 `reviewer_comments` longtext COLLATE utf8_unicode_ci NOT NULL,
 `route` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `source_id` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `street_number` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `user_id` int(10) unsigned DEFAULT NULL,
 `user_provided_address` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 `deleted_at` timestamp NULL DEFAULT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `unique_index` (`created_at`,`created_reason`,`source_id`),
 KEY `property_ads_history_property_ad_id_foreign` (`property_ad_id`),
 KEY `property_ads_history_location_id_foreign` (`location_id`),
 KEY `created_reason` (`created_reason`)
);

This is my SQL query:

SELECT * FROM `property_ads_history` `t1` 
WHERE `t1`.`created_at` >= '2016-04-13 00:00:00' 
AND `t1`.`created_reason` = 'Scraped' 
AND (`t1`.`price` > -1 OR `t1`.`price` <> 999999.99) 
AND (
  SELECT `t2`.`price` FROM `property_ads_history` `t2` 
  WHERE `t2`.`property_ad_id` = `t1`.`property_ad_id` 
  AND `t2`.`created_at` < `t1`.`created_at` 
  AND (`t2`.`price` > -1 OR `t2`.`price` <> 999999.99) ORDER BY  
  DATE(`t2`.`created_at`) DESC LIMIT 1
) <> `t1`.`price` GROUP BY `t1`.`property_ad_id`;

All I want in that query are:

  1. I want to get the records that meet certain criteria, particularly that the created_at field is within the last 24 hours

  2. I need to get the record that immediately precedes the records in #1

  3. further filter results in #1 to records whose price column has had a different value in a record's history, other than -1

The problem I'm facing is that the query is too slow. I have a million records. It would take more or less 2 mins for the query to execute. I'm guessing GROUP BY makes the query slow. I think I need to make a composite index but I'm not sure how to do it.

Explain result:


id | select_type        | table | type | possible_keys                                             | key                                         | key_len | ref                                | rows   | Extra
1  | PRIMARY            | t1    | ref  | unique_index,created_reason                               | created_reason                              | 2       | const                              | 590030 | Using where; Using temporary; Using filesort
2  | DEPENDENT SUBQUERY | t2    | ref  | unique_index,property_ads_history_property_ad_id_foreign  | property_ads_history_property_ad_id_foreign | 4       | reis_real_estate.t1.property_ad_id | 7      | Using where; Using filesort

1
Note: People often say DOUBLE or FLOAT when they really mean DECIMALStrawberry
Do not use (m,n) on FLOAT or DOUBLE (eg, double(8,2)), it causes an extra rounding, when you probably want no rounding.Rick James
What property has 99 million bathrooms?Rick James

1 Answers

0
votes

Instead of using subquery, use joins it will reduce the query time. below query might help you.

select  t1.*, t2.price
    from  `property_ads_history` AS `t1`
    INNER JOIN `property_ads_history` As `t2`
    where  `t1`.`created_at` >= '2016-04-13 00:00:00'
      AND  `t1`.`created_reason` = 'Scraped'
      AND  (`t1`.`price` > -1 OR  `t1`.`price` <> 999999.99 )
      AND  `t2`.`created_at` < `t1`.`created_at`
      AND  (`t2`.`price` > -1 OR  `t2`.`price` <> 999999.99 )
    ORDER BY  DATE(`t2`.`created_at`) DESC
    GROUP BY  `t1`.`property_ad_id`;

Also, put indexes on your column that will also reduce time of your query.