1
votes

I've always struggled with mysql joins but have started incorporating more but struggling to understand despite reading dozens of tutorials and mysql manual.

My situation is I have 3 tables:

/* BASICALLY A TABLE THAT HOLDS FAN RECORDS */

    CREATE TABLE `fans` (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `first_name` varchar(255) DEFAULT NULL,
      `middle_name` varchar(255) DEFAULT NULL,
      `last_name` varchar(255) DEFAULT NULL,
      `email` varchar(255) DEFAULT NULL,
      `join_date` datetime DEFAULT NULL,
      `twitter` varchar(255) DEFAULT NULL,
      `twitterCrawled` datetime DEFAULT NULL,
      `twitterImage` varchar(255) DEFAULT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `email` (`email`)
    ) ENGINE=MyISAM AUTO_INCREMENT=20413 DEFAULT CHARSET=latin1;

    /* A TABLE OF OUR TWITTER FOLLOWERS */

    CREATE TABLE `twitterFollowers` (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `screenName` varchar(25) DEFAULT NULL,
      `twitterId` varchar(25) DEFAULT NULL,
      `customerId` int(11) DEFAULT NULL,
      `uniqueStr` varchar(50) DEFAULT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `unique` (`uniqueStr`)
    ) ENGINE=InnoDB AUTO_INCREMENT=13426 DEFAULT CHARSET=utf8;

    /* TABLE THAT SUGGESTS A LIKELY MATCH OF A TWITTER FOLLOWER BASED ON THE EMAIL / SCREEN NAME COMPARISON OF THE FAN vs OUR FOLLOWERS 
    IF SOMEONE (ie. a moderator) CONFIRMS OR DENIES THAT IT'S A GOOD MATCH THEY PUT A DATESTAMP IN `dismissed` */

    CREATE TABLE `contentSuggestion` (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `userId` int(11) DEFAULT NULL,
      `fanId` int(11) DEFAULT NULL,
      `twitterAccountId` int(11) DEFAULT NULL,
      `contentType` varchar(50) DEFAULT NULL,
      `contentString` varchar(255) DEFAULT NULL,
      `added` datetime DEFAULT NULL,
      `dismissed` datetime DEFAULT NULL,
      `uniqueStr` varchar(255) DEFAULT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `unstr` (`uniqueStr`)
    ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

What I'm trying to get is:

SELECT [fan columns] WHERE fan screen name IS IN twitterfollowers AND WHERE fan screen name IS NOT IN contentSuggestion (with a datestamp in dismissed)

My attempts so far:

~33 seconds

SELECT fans.id, tf.screenName as col1, tf.twitterId as col2 FROM fans LEFT JOIN twitterFollowers tf ON tf.screenName = fans.emailUsername LEFT JOIN contentSuggestion cs ON cs.contentString = tf.screenName WHERE dismissed IS NULL GROUP BY(fans.id) HAVING col1 != ''

~14 seconds

SELECT id, emailUsername FROM fans WHERE emailUsername IN(SELECT DISTINCT(screenName) FROM twitterFollowers) AND emailUsername NOT IN(SELECT DISTINCT(contentString) FROM contentSuggestion WHERE dismissed IS NULL) GROUP BY (fans.id);

9.53 seconds

SELECT fans.id, tf.screenName as col1, tf.twitterId as col2 FROM fans LEFT JOIN twitterFollowers tf ON tf.screenName = fans.emailUsername WHERE tf.uniqueStr NOT IN(SELECT uniqueStr FROM contentSuggestion WHERE dismissed IS NULL)

I hope there is a better way. I've been struggling to really use JOINS outside of a single LEFT JOIN which has already helped me speed up other queries by a significant amount.

Thanks for any help you can give me.

1

1 Answers

0
votes

I would go with a variation of the second method. Instead of IN, use EXISTS. Then add the correct indexes and remove the aggregation:

SELECT f.id, f.emailUsername
FROM fans f
WHERE EXISTS (SELECT 1
              FROM twitterFollowers tf
              WHERE f.emailUsername = tf.screenName
             ) AND
      NOT EXISTS (SELECT 1
                  FROM contentSuggestion cs
                  WHERE f.emailUsername = cs.contentString AND
                        cs.dismissed IS NULL
                 ) ;

Then be sure you have the following indexes: twitterFollowers(screenName) and contentSuggestion(contentString, dismissed).

Some notes:

  • When using IN, don't use SELECT DISTINCT. I'm not 100% sure that MySQL is always smart enough to ignore the DISTINCT in the subquery (it is redundant).
  • Historically, EXISTS was faster than IN in MySQL. The optimizer has improved in recent versions.
  • For performance, you need the correct indexes. Then be sure you have the following indexes: twitterFollowers(screenName) and contentSuggestion(contentString, dismissed).
  • Assuming that fan.id is unique (a very reasonable assumption), you don't need the final group by.