0
votes

I have an existing populated MySQL table with three columns: product_id, category_id and catalog_id

Each catalog has multiple categories in it and each category has multiple products in it. Categories belonging to the same catalog should have exactly the same products in them, but unfortunately they don't in certain cases.

I need to identify the missing products in each category. Missing means that a product exists in at least one other category that belongs to the same catalog but doesn't exist in that particular category.

So the result I need to get out of this is a list of product_id/category_id pairs that are missing and need to be added.

How do I accomplish this in MySQL?

I tried creating a table populated by the distinct product_id and catalog_id pairs to get all the products for each catalog and then join that with the main table, but I am not sure what type of join to perform.

Any MySQL experts willing to help?

Update:

Based on a request, here is the create table SQL (this is a simplified version of the actual scenario):

create table product (
  product_id bigint not null,
  category_id bigint not null,
  catalog_id bigint not null
);

Update 2:

Clarification: Every category that belongs to the same catalog must have the same exact products in it as all the other categories that belong to the same catalog. If a product is in one category and not in another category that belongs to the same catalog, then it is missing and needs to be identified as a product_id/category_id pair.

Update 3:

Per another request, here is sample data:

insert into product (product_id, category_id, catalog_id) values (1, 1, 1);
insert into product (product_id, category_id, catalog_id) values (2, 1, 1);
insert into product (product_id, category_id, catalog_id) values (3, 1, 1);
insert into product (product_id, category_id, catalog_id) values (1, 2, 1);
insert into product (product_id, category_id, catalog_id) values (3, 2, 1);

In this case the pair of product_id 2 and category_id 2 would be identified as part of the result. This is because categories 1 and 2 belong to the same catalog (1) and category 2 has a missing product, namely product_id 2.

2
I think I missed your sqlFiddle?GuyT
@TimBiegeleisen the schema is only these three columns all of type bigint.stepanian
With SQL question on SO, a picture truly is worth a thousand words. If you had some sample data showing the problem it would be much easier for us to digest.Tim Biegeleisen
@GuyT: Not sure how sqlFiddle would have helped in this specific case. The requirement is VERY simple to understand, but much more difficult to solve as apparent by the accepted answer.stepanian
@GuyT: No worries :) I am grumpy because it's my almost my bed time (midnight in California). And it looks like you got your sqlFiddle too in the demo provided by the answer from Georgos. It must be your lucky day!stepanian

2 Answers

2
votes

You can do it using the following query:

SELECT s1.product_id, s1.category_id
FROM (
  SELECT t1.product_id, t2.category_id, t1.catalog_id
  FROM (
    SELECT DISTINCT product_id, catalog_id
    FROM product) AS t1
  CROSS JOIN (
    SELECT DISTINCT category_id, catalog_id
    FROM product) AS t2
  WHERE t1.catalog_id = t2.catalog_id ) AS s1
LEFT JOIN product AS s2 
ON s1.catalog_id = s2.catalog_id AND 
   s1.category_id = s2.category_id AND
   s1.product_id = s2.product_id
WHERE s2.product_id IS NULL

Demo here

Explanation:

This query:

SELECT DISTINCT product_id, catalog_id
FROM product

gives you a list of all distinct products per catalog:

product_id  catalog_id
-----------------------
1           1
2           1
3           1

If you perform a CROSS JOIN of the above to all distinct categories per catalog:

SELECT t1.product_id, t2.category_id, t2.catalog_id
FROM (
  SELECT DISTINCT product_id, catalog_id
  FROM product) AS t1
CROSS JOIN (
  SELECT DISTINCT category_id, catalog_id
  FROM product) AS t2
WHERE t1.catalog_id = t2.catalog_id  

you get:

product_id  category_id catalog_id
----------------------------------
1           1           1 
1           2           1
2           1           1
2           2           1
3           1           1
3           2           1

The above is a comprehensive set containing the full list of product_id per category_id per catalog_id.

All you have to do now is to find the missing product_id, category_id pairs from your table. You can do that with a use of a LEFT JOIN as in the initial query.

0
votes

You can also do by this in optimize way-

Hitesh> select * from product;                                                                                                        +------------+-------------+------------+
| product_id | category_id | catalog_id |
+------------+-------------+------------+
|          1 |           1 |          1 |
|          2 |           1 |          1 |
|          3 |           1 |          1 |
|          1 |           2 |          1 |
|          3 |           2 |          1 |
|          4 |           2 |          1 |
|          5 |           2 |          1 |
|          1 |           2 |          2 |
|          2 |           1 |          2 |
+------------+-------------+------------+
9 rows in set (0.00 sec)

Hitesh>
    SELECT product_id, category_id, catalog_id
    FROM
      (SELECT DISTINCT p1.product_id, p2.category_id, p1.catalog_id
       FROM product p1 JOIN product p2 ON p1.catalog_id=p2.catalog_id) tmp
    WHERE NOT EXISTS (SELECT 1 FROM product 
                      WHERE category_id = tmp.category_id AND   
                            product_id=tmp.product_id AND   
                            catalog_id=tmp.catalog_id);
+------------+-------------+------------+
| product_id | category_id | catalog_id |
+------------+-------------+------------+
|          4 |           1 |          1 |
|          5 |           1 |          1 |
|          2 |           2 |          1 |
|          2 |           2 |          2 |
|          1 |           1 |          2 |
+------------+-------------+------------+
5 rows in set (0.00 sec)