6
votes

I am trying to fetch all the categories and their count (no of products in that category) of those products where keyword matches. The query I tried doesn't give me the correct result. Also I want the parent categories till level 1 and their count as well.

e.g. I am trying with keyword watch, then category "watches" should be there with some count. Also the parent category "accessories" with the sum of its descendant categories count.

my table structures are:

tblProducts: There are 5 categories of a product, fldCategoryId1, fldCategoryId2, fldCategoryId3, fldCategoryId4 and fldCategoryId5. fldProductStatus should be 'A'

+-----------------------------+-------------------+
| Field                       | Type              |
+-----------------------------+-------------------+
| fldUniqueId                 | bigint(20)        |
| fldCategoryId1              | bigint(20)        |
| fldCategoryId2              | bigint(20)        |
| fldCategoryId3              | bigint(20)        |
| fldCategoryId4              | bigint(20)        |
| fldCategoryId5              | bigint(20)        |
| fldProductStatus            | enum('A','P','D') |
| fldForSearch                | longtext          |
+-----------------------------+-------------------+

tblCategory:

+------------------------------+-----------------------+
| Field                        | Type                  |
+------------------------------+-----------------------+
| fldCategoryId                | bigint(20)            |
| fldCategoryName              | varchar(128)          |
| fldCategoryParent            | int(11)               |
| fldCategoryLevel             | enum('0','1','2','3') |
| fldCategoryActive            | enum('Y','N')         |
+------------------------------+-----------------------+

Search Query:

SELECT count( c.fldCategoryId ) AS cnt, c.fldCategoryLevel, c.fldCategoryParent, c.fldCategoryId, c.fldCategoryName, p.fldForSearch, c.fldCategoryParent
FROM tblCategory c, tblProducts p
WHERE (
    c.fldCategoryId = p.fldCategoryId1
    OR c.fldCategoryId = p.fldCategoryId2
    OR c.fldCategoryId = p.fldCategoryId3
    OR c.fldCategoryId = p.fldCategoryId4
    OR c.fldCategoryId = p.fldCategoryId5
)
AND p.fldProductStatus = 'A'
AND (
    MATCH ( p.fldForSearch )
    AGAINST (
        '+(watches watch)'
        IN BOOLEAN MODE
    )
)
GROUP BY c.fldCategoryId

Note: The table is in the InnoDB engine and have FULLTEXT search index on 'fldForSearch' column.

EDIT: sample data can be found in sqlfiddle

3
Can you live with the rollup output? The sums would be listed below/above each group, not as part of each row (so you might have to do some preparations before displaying it). You might consider doing the summation in your app (e.g. use the treenode class in php). If you add the table structures and some example data as pasteable code (the create and insert statements), as well as the expected output, you make it easier to test the answer. Can you change tblCategory? It's not the best tree model for databases (but works here)Solarflare
After fighting this battle with millions of products and hundreds of categories in SQL server, I can tell you that the queries just get more complicated and no amount of hardware and fancy coding will make it fast. You may want to consider something like SOLR or ElasticSearchJen R
Thank Jen R, Elastic Search is the next stage for this. For now I want to solve this with scripting (PHP) and queries.ankitr
Is it possible, that a product is directly assigned to a categorie X and to an ancestor of X? E.g.: fldCategoryId1 is X and fldCategoryId2 is parent of X.Paul Spiegel
Paul - yes, its possibleankitr

3 Answers

2
votes

I'm not sure what you mean by:

Also I want the parent categories till level 1 and their count as well.

But the following query will show you a count for each category (including those with 0 found products), and a general rollup:

SELECT 
    c.fldCategoryId, 
    c.fldCategoryLevel, 
    c.fldCategoryName, 
    COUNT( * ) AS cnt
FROM tblCategory c
    LEFT JOIN tblProducts p ON
            (c.fldCategoryId = p.fldCategoryId1
        OR  c.fldCategoryId = p.fldCategoryId2
        OR  c.fldCategoryId = p.fldCategoryId3
        OR  c.fldCategoryId = p.fldCategoryId4
        OR  c.fldCategoryId = p.fldCategoryId5)
        AND p.fldProductStatus = 'A'
        AND MATCH ( p.fldForSearch )
            AGAINST (
                '+(watches watch)'
                IN BOOLEAN MODE
            )
GROUP BY 
    c.fldCategoryId
    c.fldCategoryLevel,  
    c.fldCategoryName
WITH ROLLUP;

Notes:

  • you cannot select p.fldForSearch if you expect a count of all the products in the category. fldForSearch is on a per product basis, it defeats the grouping purpose
  • I left joined with products so it returns the categories with 0 products matching your keywords. If you don't want this to happen just remove the LEFT keyword
  • I haven't checked the MATCH condition I assume it's correct.
1
votes

Start by not splaying an array (fldCategoryId...) across columns. Instead, add a new table.

Once you have done that, the queries change, such as getting rid of OR clauses.

Hopefully, any further issues will fall into place.

1
votes

Since your category tree has a fixed height (4 levels), you can create a transitive closure table on the fly with

SELECT c1.fldCategoryId AS descendantId, c.fldCategoryId AS ancestorId
FROM tblcategory c1
LEFT JOIN tblcategory c2 ON c2.fldCategoryId = c1.fldCategoryParent
LEFT JOIN tblcategory c3 ON c3.fldCategoryId = c2.fldCategoryParent
JOIN tblcategory c ON c.fldCategoryId IN (
    c1.fldCategoryId,
    c1.fldCategoryParent,
    c2.fldCategoryParent,
    c3.fldCategoryParent
)

The result will look like

| descendantId | ancestorId |
|--------------|------------|
|            1 |          1 |
|            2 |          1 |
|            2 |          2 |
|          ... |        ... |
|            5 |          1 |
|            5 |          2 |
|            5 |          5 |
|          ... |        ... |

You can now use it in a subquery (derived table) to join it with products using descendantId and with categories using ancestorId. That means that a product from category X will be indirectly associated with all ancestors of X (as well as with X). For example: Category 5 is a child of 2 - and 2 is a child of 1. So all products from category 5 must be counted for categories 5, 2 and 1.

Final query:

SELECT c.*, coalesce(sub.cnt, 0) as cnt
FROM tblCategory c
LEFT JOIN (
    SELECT tc.ancestorId, COUNT(DISTINCT p.fldUniqueId) AS cnt
    FROM tblProducts p
    JOIN (
        SELECT c1.fldCategoryId AS descendantId, c.fldCategoryId AS ancestorId
        FROM tblcategory c1
        LEFT JOIN tblcategory c2 ON c2.fldCategoryId = c1.fldCategoryParent
        LEFT JOIN tblcategory c3 ON c3.fldCategoryId = c2.fldCategoryParent
        JOIN tblcategory c ON c.fldCategoryId IN (
            c1.fldCategoryId,
            c1.fldCategoryParent,
            c2.fldCategoryParent,
            c3.fldCategoryParent
        )
    ) tc ON tc.descendantId IN (
        p.fldCategoryId1,
        p.fldCategoryId2,
        p.fldCategoryId3,
        p.fldCategoryId4,
        p.fldCategoryId5
    )
    WHERE p.fldProductStatus = 'A'
        AND MATCH ( p.fldForSearch )
            AGAINST ( '+(watches watch)' IN BOOLEAN MODE )
    GROUP BY tc.ancestorId
) sub ON c.fldCategoryId = sub.ancestorId

Result for your sample data (without level, since it seems to be wrong anyway):

| fldCategoryId | fldCategoryName | fldCategoryParent | fldCategoryActive | cnt |
|---------------|-----------------|-------------------|-------------------|-----|
|             1 |             Men |                 0 |                 Y |   5 |
|             2 |     Accessories |                 1 |                 Y |   5 |
|             3 |       Men Watch |                 1 |                 Y |   3 |
|             5 |           Watch |                 2 |                 Y |   5 |
|             6 |           Clock |                 2 |                 Y |   3 |
|             7 |     Wrist watch |                 1 |                 Y |   2 |
|             8 |           Watch |                 2 |                 Y |   4 |
|             9 |          watch2 |                 3 |                 Y |   2 |
|            10 |        fastrack |                 8 |                 Y |   3 |
|            11 |           swish |                 8 |                 Y |   2 |
|            12 |         digital |                 5 |                 Y |   2 |
|            13 |          analog |                 5 |                 Y |   2 |
|            14 |            dual |                 5 |                 Y |   1 |

Demos:

Note that the outer (left joined) subquery is logically not necessary. But from my experience MySQL doesn't perform well without it.

There are still ways for performance optimisation. One is to store the transitive closure table in an indexed temporary table. You can also persist it in a regular table, if categories do rarely change. You can also manage it with triggers.