0
votes

I have few queries and am not able to figure out how to optimize them,

QUERY 1

select * 
from t_twitter_tracking 
where classified is null and tweetType='ENGLISH' 
order by id limit 500;

QUERY 2

Select 
  count(*) as cnt, 
  DATE_FORMAT(CONVERT_TZ(wrdTrk.createdOnGMTDate,'+00:00','+05:30'),'%Y-%m-%d') 
  as dat
from
  t_twitter_tracking wrdTrk 
where 
  wrdTrk.word like ('dell')
  and CONVERT_TZ(wrdTrk.createdOnGMTDate,'+00:00','+05:30')  
      between  '2010-12-12 00:00:00' and '2010-12-26 00:00:00'
group by dat;

Both these queries run on the same table,

CREATE TABLE `t_twitter_tracking` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`word` VARCHAR(200) NOT NULL,
`tweetId` BIGINT(100) NOT NULL,
`twtText` VARCHAR(800) NULL DEFAULT NULL,
`language` TEXT NULL,
`links` TEXT NULL,
`tweetType` VARCHAR(20) NULL DEFAULT NULL,
`source` TEXT NULL,
`sourceStripped` TEXT NULL,
`isTruncated` VARCHAR(40) NULL DEFAULT NULL,
`inReplyToStatusId` BIGINT(30) NULL DEFAULT NULL,
`inReplyToUserId` INT(11) NULL DEFAULT NULL,
`rtUsrProfilePicUrl` TEXT NULL,
`isFavorited` VARCHAR(40) NULL DEFAULT NULL,
`inReplyToScreenName` VARCHAR(40) NULL DEFAULT NULL,
`latitude` BIGINT(100) NOT NULL,
`longitude` BIGINT(100) NOT NULL,
`retweetedStatus` VARCHAR(40) NULL DEFAULT NULL,
`statusInReplyToStatusId` BIGINT(100) NOT NULL,
`statusInReplyToUserId` BIGINT(100) NOT NULL,
`statusFavorited` VARCHAR(40) NULL DEFAULT NULL,
`statusInReplyToScreenName` TEXT NULL,
`screenName` TEXT NULL,
`profilePicUrl` TEXT NULL,
`twitterId` BIGINT(100) NOT NULL,
`name` TEXT NULL,
`location` VARCHAR(100) NULL DEFAULT NULL,
`bio` TEXT NULL,
`url` TEXT NULL COLLATE 'latin1_swedish_ci',
`utcOffset` INT(11) NULL DEFAULT NULL,
`timeZone` VARCHAR(100) NULL DEFAULT NULL,
`frenCnt` BIGINT(20) NULL DEFAULT '0',
`createdAt` DATETIME NULL DEFAULT NULL,
`createdOnGMT` VARCHAR(40) NULL DEFAULT NULL,
`createdOnServerTime` DATETIME NULL DEFAULT NULL,
`follCnt` BIGINT(20) NULL DEFAULT '0',
`favCnt` BIGINT(20) NULL DEFAULT '0',
`totStatusCnt` BIGINT(20) NULL DEFAULT NULL,
`usrCrtDate` VARCHAR(200) NULL DEFAULT NULL,
`humanSentiment` VARCHAR(30) NULL DEFAULT NULL,
`replied` BIT(1) NULL DEFAULT NULL,
`replyMsg` TEXT NULL,
`classified` INT(32) NULL DEFAULT NULL,
`createdOnGMTDate` DATETIME NULL DEFAULT NULL,
`locationDetail` TEXT NULL,
`geonameid` INT(11) NULL DEFAULT NULL,
`country` VARCHAR(255) NULL DEFAULT NULL,
`continent` CHAR(2) NULL DEFAULT NULL,
`placeLongitude` FLOAT NULL DEFAULT NULL,
`placeLatitude` FLOAT NULL DEFAULT NULL,
PRIMARY KEY (`id`),
INDEX `id` (`id`, `word`),
INDEX `createdOnGMT_index` (`createdOnGMT`) USING BTREE,
INDEX `word_index` (`word`) USING BTREE,
INDEX `location_index` (`location`) USING BTREE,
INDEX `classified_index` (`classified`) USING BTREE,
INDEX `tweetType_index` (`tweetType`) USING BTREE,
INDEX `getunclassified_index` (`classified`, `tweetType`) USING BTREE,
INDEX `timeline_index` (`word`, `createdOnGMTDate`, `classified`) USING BTREE,
INDEX `createdOnGMTDate_index` (`createdOnGMTDate`) USING BTREE,
INDEX `locdetail_index` (`country`, `id`) USING BTREE,
FULLTEXT INDEX `twtText_index` (`twtText`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM
ROW_FORMAT=DEFAULT
AUTO_INCREMENT=12608048;

The table has more than 10 million records. How can I optimize it?

EDITED

Explain on 2nd query

"id";"select_type";"table";"type";"possible_keys";"key";"key_len";"ref";"rows";"Extra" "1";"SIMPLE";"wrdTrk";"range";"word_index,word_createdOnGMT";"word_index";"602";NULL;"222847";"Using where; Using temporary; Using filesort"

Regards, Rohit

2
Could you run EXPLAIN on both queries and post the output?Richard Simões
you could start by learning about datatypes bigint(100) - zomgJon Black
f00.. thanks for the suggestion. Will have a look .sesmic

2 Answers

1
votes

In Query2, I suggest that:
1. remove DATE_FORMAT and CONVERT_TZ. You can process in PHP to be an output or between's condition.
2. like ('dell'): I don't see any '%', so you can use wrdTrk.word = 'dell' to let it faster.

0
votes

The convert_tz in the where condition needs to be removed,

Select 
  count(*) as cnt, 
  DATE_FORMAT(CONVERT_TZ(wrdTrk.createdOnGMTDate,'+00:00','+05:30'),'%Y-%m-%d') 
  as dat
from
  t_twitter_tracking wrdTrk 
where 
  wrdTrk.word like ('dell')
  and CONVERT_TZ(wrdTrk.createdOnGMTDate,'+00:00','+05:30')  
      between  '2010-12-12 00:00:00' and '2010-12-26 00:00:00'
group by dat;

This will lead to comparing each row and finding out the right result, hence a tremendous improvement in the query result. Just passed the converted data to the query.

Select 
  count(*) as cnt, 
  DATE_FORMAT(CONVERT_TZ(wrdTrk.createdOnGMTDate,'+00:00','+05:30'),'%Y-%m-%d') 
  as dat
from
  t_twitter_tracking wrdTrk 
where 
  wrdTrk.word like ('dell')
  and CONVERT_TZ(wrdTrk.createdOnGMTDate,'+00:00','+05:30')  
      between  '2010-12-12 00:00:00' and '2010-12-26 00:00:00'
group by dat;