2
votes

I'm currently trying to implement a search engine function in my CakePHP site, trying to return information from 3 tables efficiently. The main usage will be numeric searches, free text will be extremely minimal and as such I'm not trying to optimise for this scenario.

The problem I'm having is trying to Group results from one table to reduce duplicate information, sorry for the long post!

The tables being used are as follows:

Companies hasMany Products
Products hasMany Prices

I have a successful method of returning results from all 3 tables based on conditions for any or all of the tables using the below code (as a result of the question here)

    //configure search conditions
    $options['conditions'] = array(
        'Company.name LIKE' => '%'.$search_term.'%',
        'Product.feature' => $product_feature,
        'Price.price <' => $price
    );

    //configure search fields
    $options['fields'] = array(
        'Company.id',
        'Company.name',
        'Product.id',
        'Product.feature',
        'Price.id',
        'Price.price',
    );

    //configure search joins
    $options['joins'] = array(
        'INNER JOIN prices as Price ON Price.product_id = Product.id INNER JOIN companies as Company ON Product.company_id = Company.id'
    );

    //configure recursion
    $options['recursive'] = -1;

    //configure pagination options
    $this->Paginator->settings = $options;

    //retrieve results and pass to view
    $this->set('results', $this->Paginator->paginate('Product'));

The results returned from the above query are like follows:

Array
(
[0] => Array
    (
        [Company] => Array
            (
                [id] => 1
                [name] => Company 1
            )

        [Product] => Array
            (
                [id] => 1
                [feature] => true
            )

        [Price] => Array
            (
                [id] => 1
                [price] => 1.00
            )

    )

[1] => Array
    (
        [Company] => Array
            (
                [id] => 1
                [name] => Company 1
            )

        [Product] => Array
            (
                [id] => 1
                [feature] => true
            )

        [Price] => Array
            (
                [id] => 2
                [price] => 2.00
            )

    )
)

As you can see the company and product information in the above instance is duplicated, ideally I'd like the information to be returned as follows:

Array
(
[0] => Array
    (
        [Company] => Array
            (
                [id] => 1
                [name] => Company 1
            )

        [Product] => Array
            (
                [id] => 1
                [feature] => true
            )

        [Price] => Array
            (
                [0] => Array
                    (
                        [id] => 1
                        [price] => 1.00
                    )
                [1] => Array
                    (       
                        [id] => 2
                        [price] => 2.00
                    )
            )
    )
)

I managed to create this by using the following settings:

    //configure search joins
    $options['joins'] = array(
        'INNER JOIN prices as Price ON Price.product_id = Product.id'
    );

    //configure recursion
    $options['recursive'] = 1;

The above continues to work returning only results that meet all conditions for Companies and Products, but in the Prices array it returns all prices for the specified Companies and Products rather than just those that meet the conditions.

For example: a condition of "max price of 1" with the above information would return all companies and products that have a price that meets the condition "max price of 1", the problem being it would just then list all prices even those that don't meet the condition, as follows:

Array
(
[0] => Array
    (
        [Company] => Array
            (
                [id] => 1
                [name] => Company 1
            )

        [Product] => Array
            (
                [id] => 1
                [feature] => true
            )

        [Price] => Array
            (
                [0] => Array
                    (
                        [id] => 1
                        [price] => 1.00
                    )
                //the below array result shouldn't be here as it doesn't meet the condition "max price of 1"
                [1] => Array
                    (       
                        [id] => 2
                        [price] => 2.00
                    )
            )
    )
)

Question: How can I modify the above code to return information with grouped results from the Price table to reduce duplicates, but only those that actually meet the specified conditions?

Bonus: If there is a more efficient way of performing the above search I'd be very interested to know. Even though the above takes 0ms with the limited number of results I have on my local machine CakePHP is still telling me "maybe slow" which I believe is as a result of the joins.

1
I don't know cakephp, but is this: 'Price.price <' => $price with the < correct? Obviously there is scope for < (instead of =) to return multiple prices.Turophile
Yes in cakephp this is the method used for including conditions, so all prices less than the $price variable.Alex

1 Answers

3
votes

Split the problem in two

What you describe is:

  • Find all products with at least one product matching the conditions
  • For those products, return the products with matching price data.

The associations you describe are:

Company hasMany Product 
Product hasMany Price

OR:

Product belongsTo Company
Price belongsTo Product

Expressed like that it may be obvious that a find on Product will join Company if recursive is 0 or greater. That removes one manual join.

Ensure the right products are returned

First ensure you obtain the list of products you want. from what is described there's a choice either set things up using joins:

$options['recursive'] = 0; // temporary

$options['conditions'] = array(
    'Company.name LIKE' => '%'.$search_term.'%',
    'Product.feature' => $product_feature,
    'Price.price <' => $price
);

//configure search fields
$options['fields'] = array(
    'Distinct Product.id',
    'Product.feature',
    'Company.id',
    'Company.name',
    #'Price.id', No
    #'Price.price', No
);

$options['joins'][] = 'INNER JOIN prices as PriceFilter ON Price.product_id = Product.id';

Or Conditions:

$options['recursive'] = 0; // temporary

$options['conditions'] = array(
    'Company.name LIKE' => '%'.$search_term.'%',
    'Product.feature' => $product_feature,
    "WHERE EXISTS (select * from prices where prices.product_id = Product.id AND prices.price < $price)"
);

$options['fields'] = array(
    'Product.id',
    'Product.feature',
    'Company.id',
    'Company.name',
    #'Price.id', No
    #'Price.price', No
);

Note that there are now no extra joins in the primary find/paginate call.

In both of these examples there should be one query (plus a count) executed, without any price data.

Use containable to get matching prices

Containable makes it easier to manage what queries are executed and what scope of results are returned. In this case all that's required is to add Price data to the resultset - and filter the prices. A complete example demonstrating the use of the contain option:

public $paginate = array(
    'contain' => array(
        'Company',
        'Price' => array()
    ),
    'fields' => array(
        'Product.id',
        'Product.feature',
        'Company.id',
        'Company.name'
    )
);

function whatever() {
    ...

    $this->paginate['contain']['Price']['conditions']['Price.price <'] = $price;

    $conditions = array(
        'Company.name LIKE' => '%'.$search_term.'%',
        'Product.feature' => $product_feature,
        "WHERE EXISTS (select * from prices where prices.product_id = Product.id AND prices.price < $price)"
    );

    $result = $this->paginate('Product', $conditions);

    ...
}

This should result in two queries (plus a count), and the data structure you're looking for; including price data.

Maybe slow

Even though the above takes 0ms, [...] CakePHP is still telling me "maybe slow"

Debug kit is not interogating the database, to determine if a query is "maybe slow" it is a simple test for:

  • Query took more than 0ms
  • Query took more than 1ms per result
  • Query took more than threshold (which defaults to 20ms)

From checking the code it should never flag a 0ms query as "maybe slow" - but it's not a problem if it does.

As with all database activity it's a good idea to run explain on the db, add any missing indexes and consider different query structures that would return the same data.