1
votes

Before asking this question it is important to understand what it is I am actually doing.

The best comparison to the feature I am implementing would be Facebook's search feature. When you begin typing a drop down list appears with various search results. At the top you will find your friends whose names match your search, then other people who match, then pages, events etc....

My situation is similar however I only want to search for two things. users and documents (named ripples in the code below).

I have this working fine. Please bear with me while I talk through the logic of this feature in my case:

  1. User focuses on search input.
  2. Ajax request retrieves the logged in users friends/followers/following and caches them client side (This only occurs the first time a user focusses on the search input)
  3. As the user types, a highly optimized function performs a regex against the array of usernames and builds an autocomplete list complete with avatars etc...
  4. At the same time and for every keypress an ajax request is fired to the script below which does the following:

    • Performs two separate Sphinx searches on two separate indexes. One to collect userid's and the other to collect document id's (rippleid's)
    • The results of the users query are looped through checking against an array of userid's that were sent in the ajax request to avoid duplicating users that were already displayed during the initial high speed friends/followers check.
    • Next we query the actual database to get the userdata for the remaining userid's
    • The same process is then repeated but this time for the documents (ripples)

And finally any returned results are appended to the auto complete list.

This is an example of the PHP function that performs the sphinx lookups and gets the data from the database.

         public function search()
                {
                                $this->disableLayout();
                                $request = new Request();
                                $params = $request->getParams(GET);

                        //Perform sphinx textsearch
                            include('/usr/local/lib/php/sphinxapi.php');
                            $sphinx = new \SphinxClient();
                            $sphinx->setMatchMode(SPH_MATCH_ANY);
                            $sphinx->SetLimits(0, 4);
                            $mysqlconn = mysql_connect("127.0.0.1:9306") or die ("Couldn't connect to MySQL.");
                            $users = $sphinx->Query($params['data']['q'], "users");
                            $ripples = $sphinx->Query($params['data']['q'], "ripples");


        /*
         *USERS
         */

            //Loop through users and only collect ID's that are not already present    
            if (!empty($users["matches"])) { 
                $ids = "";
                foreach($users['matches'] as $id => $data) {
                    if($ids > ""){
                        $ids .= ",";
                    }
                    if(!isset($params['data']['e'][$id])){
                        $ids .= $id;
                    }
                }


          //If there any any remaining ID's collect the data from the database and return as JSON
                        if(!empty($ids)){
                                $userdataquery = "select users.userid, users.firstname, users.lastname
                                                    from tellycards_user_data users   
                                                    where userid IN($ids)
                                                ";
                                $query = new Query($userdataquery);
                                $usersoutput = $query->fetchAll();                              
                        }
        }

        /*
         *RIPPLES
         */

        //Loop through ripples and collect ID's 
        if (!empty($ripples["matches"])) { 
            $rippleids = "";
            foreach($ripples['matches'] as $id => $data) {
                if($rippleids > ""){
                    $rippleids .= ",";
                }                       
                    $rippleids .= $id;
            }

        //If there any any remaining ID's collect the data from the database and return as JSON
                        if(!empty($rippleids)){
                                $rippledataquery = "select ripples.id, ripples.name, ripples.screenshot
                                                    from tellycards_ripples ripples   
                                                    where id IN($rippleids)
                                                ";
                                $query = new Query($rippledataquery);
                                $ripplesoutput = $query->fetchAll();                              
                        }
        }

        header('Content-type: text/json');
        echo json_encode(array(
                               'users'      => (!empty($usersoutput)) ? $usersoutput : null,
                               'ripples'    => (!empty($ripplesoutput)) ? $ripplesoutput : null
                        ));

}

You might ask why we are doing the initial friends lookup and not just using sphinx for everything. Well by implementing the method above. the user gets instant feedback when they are typing due to having the array of friends stored client side, while despite the fantastic speed of sphinx there inevitably will be some lag due to the http request. In practice it works fantastically and incidentally it appears to be the method that facebook uses also.

Also there is a lot of javascript code preventing unnecessary lookups, the returned data gets added to the cache pile etc so that future searches do not require hitting sphinx/db etc...

Now finally onto my actual question....

This current server side function bothers me a lot. Right now there are two searches being performed by Sphinx and two searches being performed by MySQL. How can I possibly collate all this into one sphinx query and one MySQL query? Is there any way at all? (Please bare in mind that documents and users may share the same PK ID's as they are on two completely different tables in MySQL and are spread (currently) across two separate indexes). Or is there any way to combine the two MySQL queries to make them more efficient than having two separate selects?

Or alternatively... Due to the simplicity of the queries am I best keeping them separate as above? (both are indexed primary key queries)

I guess what I am asking for is any recommendations/advice.

Any commentary is very welcome.

2

2 Answers

1
votes

You cant really get away with not having two MySQL queries. Well you could, by either jsut combining them into one, with UNION. Or by creating a new combined 'table' (either a view, or a materialized view) - but really dont think its worth the effort. Two queries is perfectly fine - as you say they indexed.

You could use one sphinx index (and hence one search query) - by creating a new combined index. Because you say your keys are not unique, would have to create a new synthetic key.

eg...

sql_query = SELECT userid*2 AS id, 1 AS table_id, firstname AS one, lastname as two FROM tellycards_user_data \
              UNION \
            SELECT (id*2)+1 as id, 2 AS table_id, name AS one, screenshot AS two FROM tellycards_ripples
sql_attr_unit = table_id

This gives you a fake key , and an attribute to identify what table the result came from. You can use this to get the original table it came from. (there are many other ways of doing the same thing)

This allows you to run one query, can get combined results.

... BUT not convinced its a good idea. Because if the results are asymmetric, you may miss results. Say there are 20 matching results from one table, and 10 from another. Say you show the top 10 results, now becayse of the limit, the results from the second table, could well be hidden below the first table (extream example, in reality, hopefully they intermingled). Two seperate queries, allows you to guarantee, to get SOME results from each table.

... so after all that. Stick with what you got. Its fine.

0
votes

You can store and retrieve all data about users and documents in Sphinx so no MySQL is needed.

Use Sphinx QL not API (much better and easier to get stuff done -> http://sphinxsearch.com/docs/current.html#sphinxql-reference )

Notice: dont forget to set all textual fields that you want to retrieve data back from as sql_field_string in sphinx.conf source