Before asking this question it is important to understand what it is I am actually doing.
The best comparison to the feature I am implementing would be Facebook's search feature. When you begin typing a drop down list appears with various search results. At the top you will find your friends whose names match your search, then other people who match, then pages, events etc....
My situation is similar however I only want to search for two things. users and documents (named ripples in the code below).
I have this working fine. Please bear with me while I talk through the logic of this feature in my case:
- User focuses on search input.
- Ajax request retrieves the logged in users friends/followers/following and caches them client side (This only occurs the first time a user focusses on the search input)
- As the user types, a highly optimized function performs a regex against the array of usernames and builds an autocomplete list complete with avatars etc...
At the same time and for every keypress an ajax request is fired to the script below which does the following:
- Performs two separate Sphinx searches on two separate indexes. One to collect userid's and the other to collect document id's (rippleid's)
- The results of the users query are looped through checking against an array of userid's that were sent in the ajax request to avoid duplicating users that were already displayed during the initial high speed friends/followers check.
- Next we query the actual database to get the userdata for the remaining userid's
- The same process is then repeated but this time for the documents (ripples)
And finally any returned results are appended to the auto complete list.
This is an example of the PHP function that performs the sphinx lookups and gets the data from the database.
public function search()
{
$this->disableLayout();
$request = new Request();
$params = $request->getParams(GET);
//Perform sphinx textsearch
include('/usr/local/lib/php/sphinxapi.php');
$sphinx = new \SphinxClient();
$sphinx->setMatchMode(SPH_MATCH_ANY);
$sphinx->SetLimits(0, 4);
$mysqlconn = mysql_connect("127.0.0.1:9306") or die ("Couldn't connect to MySQL.");
$users = $sphinx->Query($params['data']['q'], "users");
$ripples = $sphinx->Query($params['data']['q'], "ripples");
/*
*USERS
*/
//Loop through users and only collect ID's that are not already present
if (!empty($users["matches"])) {
$ids = "";
foreach($users['matches'] as $id => $data) {
if($ids > ""){
$ids .= ",";
}
if(!isset($params['data']['e'][$id])){
$ids .= $id;
}
}
//If there any any remaining ID's collect the data from the database and return as JSON
if(!empty($ids)){
$userdataquery = "select users.userid, users.firstname, users.lastname
from tellycards_user_data users
where userid IN($ids)
";
$query = new Query($userdataquery);
$usersoutput = $query->fetchAll();
}
}
/*
*RIPPLES
*/
//Loop through ripples and collect ID's
if (!empty($ripples["matches"])) {
$rippleids = "";
foreach($ripples['matches'] as $id => $data) {
if($rippleids > ""){
$rippleids .= ",";
}
$rippleids .= $id;
}
//If there any any remaining ID's collect the data from the database and return as JSON
if(!empty($rippleids)){
$rippledataquery = "select ripples.id, ripples.name, ripples.screenshot
from tellycards_ripples ripples
where id IN($rippleids)
";
$query = new Query($rippledataquery);
$ripplesoutput = $query->fetchAll();
}
}
header('Content-type: text/json');
echo json_encode(array(
'users' => (!empty($usersoutput)) ? $usersoutput : null,
'ripples' => (!empty($ripplesoutput)) ? $ripplesoutput : null
));
}
You might ask why we are doing the initial friends lookup and not just using sphinx for everything. Well by implementing the method above. the user gets instant feedback when they are typing due to having the array of friends stored client side, while despite the fantastic speed of sphinx there inevitably will be some lag due to the http request. In practice it works fantastically and incidentally it appears to be the method that facebook uses also.
Also there is a lot of javascript code preventing unnecessary lookups, the returned data gets added to the cache pile etc so that future searches do not require hitting sphinx/db etc...
Now finally onto my actual question....
This current server side function bothers me a lot. Right now there are two searches being performed by Sphinx and two searches being performed by MySQL. How can I possibly collate all this into one sphinx query and one MySQL query? Is there any way at all? (Please bare in mind that documents and users may share the same PK ID's as they are on two completely different tables in MySQL and are spread (currently) across two separate indexes). Or is there any way to combine the two MySQL queries to make them more efficient than having two separate selects?
Or alternatively... Due to the simplicity of the queries am I best keeping them separate as above? (both are indexed primary key queries)
I guess what I am asking for is any recommendations/advice.
Any commentary is very welcome.