1
votes

I have an EAV model for my Products that stores their attributes in a MySQL database.

My question is, how can I retrieve specific counts for the attributes in an efficient manner (including when the products are filtered by said attributes)?

I've put the sample schema at the bottom or in the sql fiddles

UPDATED QUESTION (previous form was messy)

What I need specifically is when filtering over this result set and say , filter Google products with 4GB of RAM is to get this:

+--------------+-------------+-------+------------------------------------------+
| attributeId  |    value    | count |                                                                                                 |
+--------------+-------------+-------+------------------------------------------+
| 1            | Google      |     3 | Google with 4GB RAM                                                                             |
| 1            | Apple       |     2 | Apple with 4GB RAM (for active filters, show all options that match only the other filters)     |
| 1            | Microsoft   |     1 | Microsoft with 4GB RAM (for active filters, show all options that match only the other filters) |
| 1            | Sony        |     1 | Sony with 4GB RAM (for active filters, show all options that match only the other filters)      |
| 2            | Phone       |     1 | 1 Phone with 4GB RAM (matching both active filters)                                             |
| 3            | 4GB         |     3 | Even though it's an active filter, all Google products have 4GB RAM                             |
| 4            | Mali 200    |     1 |                                                                                                 |
| 4            | GMA 650     |     1 |                                                                                                 |
| 5            | Black       |     2 |                                                                                                 |
| 5            | Gray        |     1 |                                                                                                 |
| availability | unavailable |     3 |                                                                                                 |
| price        | > 250       |     2 |                                                                                                 |
| price        | 0-50        |     1 |                                                                                                 |
+--------------+-------------+-------+------------------------------------------+

To sum it up, if the attribute is filtered by , it needs to show all values with counts for that attribute that match all the other (N-1) filters .

If it's not filtered by, it needs to show the counts that match all N filters.

Also there's that couple of special attributes (availability and price) that rest in the Product table which need to be handled. At the moment doing a union for those.

Example:

Attributes: A, B, C, D ,E, F
Filters: A, C, D
Result:
A (Active Filter) - Values with counts that match filters C, D
B - Values with counts that match filters A, C, D
C (Active Filter) - Values with counts that match filters A, D
D (Active Filter) - Values with counts that match filters C, D
E - Values with counts that match filters A, C, D
F - Values with counts that match filters A, C, D

The current query I am using is this:

UPDATE Slightly altered the previous query and it gives correct results now, however it is still pretty slow and I'm not sure I want to go to production with this;

(
    SELECT 
        'availability' AS attributeId, 
        availability as value,
      count(1) as 'count'
   FROM (
        SELECT p.id, p.availability FROM Products p
        JOIN ProductAttributes pa ON
            pa.productId = p.id
        WHERE
            (pa.attributeId = 1 AND pa.value = 'A1-Value A')
        GROUP BY p.id, p.availability
        HAVING (
            COUNT(*) = 1
        )
    ) AS pf
    GROUP BY availability
)
UNION
(
    SELECT 
        'price' AS attributeId, 
        case 
        when (price) between 0 and 50 then '0-50'
        when (price) between 50 and 100 then '51-100'
        when (price) between 100 and 150 then '101-150'
        when (price) between 150 and 200 then '151-200'
        when (price) between 200 and 250 then '201-250'
      ELSE '> 250'
      end as value,
      count(1) as 'count'
   FROM (
        SELECT p.id, p.price FROM Products p
        JOIN ProductAttributes pa ON
            pa.productId = p.id
        WHERE
            (pa.attributeId = 1 AND pa.value = 'A1-Value A')
        GROUP BY p.id, p.price
        HAVING (
            COUNT(*) = 1
        )
    ) AS pf
    GROUP BY value
)   
UNION
(
    SELECT 
        pfa.attributeId, pfa.value, COUNT(*)
    FROM (
            SELECT p.id FROM Products p
            JOIN ProductAttributes pa ON
                pa.productId = p.id
            GROUP BY p.id
        ) AS pf
         JOIN ProductAttributes pfa ON
            pfa.productId = pf.id
         WHERE (
            NOT (pfa.attributeId = 1 AND pfa.value = 'A1-Value A')
         )
    GROUP BY pfa.attributeId, pfa.value
)
UNION
(
    SELECT 
        pfa.attributeId, pfa.value, COUNT(*)
    FROM (
            SELECT p.id FROM Products p
            JOIN ProductAttributes pa ON
                pa.productId = p.id
            WHERE
                (pa.attributeId = 1 AND pa.value = 'A1-Value A')
            GROUP BY p.id
            HAVING (
                COUNT(*) = 1
            )
        ) AS pf
         JOIN ProductAttributes pfa ON
            pfa.productId = pf.id
         WHERE
         (
            (pfa.attributeId = 1 AND pfa.value = 'A1-Value A')
         )

    GROUP BY pfa.attributeId, pfa.value
)

SQL Fiddle

However, it doesn't quite fully return the correct results and it's a bit bulky, am hoping someone knows how to do this properly.

Thanks a lot!

SAMPLE SCHEMA

CREATE TABLE `Products` (
    `id` INT(11) NOT NULL AUTO_INCREMENT,
    `categoryId` INT(11) NULL DEFAULT NULL,
    `name` VARCHAR(255) NOT NULL,
    `availability` VARCHAR(255) NOT NULL,
    `price` FLOAT NOT NULL,
    `discount` FLOAT NULL DEFAULT NULL,
    `active` INT(11) NOT NULL,
    PRIMARY KEY (`id`),
    INDEX `price` (`price`),
    INDEX `categoryId` (`categoryId`),
    INDEX `discount` (`discount`),
    FULLTEXT INDEX `name` (`name`)
);

INSERT INTO `Products` (`id`, `categoryId`, `name`, `availability`, `price`, `discount`, `active`)
VALUES ('1', '1', 'Stadia', 'unavailable', '50', '4.06', true),
('2', '1', 'Pixel 3', 'available', '500', '4.06', true),
('3', '1', 'Chromebook', 'unavailable', '300', '4.06', true),
('4', '1', 'Mac Pro', 'unavailable', '1200', '4.06', true),
('5', '1', 'iPhone', 'unavailable', '1000', '4.06', true),
('6', '1', 'iPad', 'available', '400', '4.06', true),
('7', '1', 'Playstation 4', 'unavailable', '400', '4.06', true),
('8', '1', 'Xperia Phone', 'unavailable', '300', '4.06', true),
('9', '1', 'Surface Pro', 'available', '800', '4.06', true),
('10', '1', 'Windows Phone 7 Phone', 'unavailable', '300', '4.06', true),
('11', '1', 'Windows 10 PC', 'unavailable', '2000', '4.06', true),
('12', '1', 'XBOX', 'available', '500', '4.06', true);



CREATE TABLE `Attributes` (
    `id` INT(11) NOT NULL AUTO_INCREMENT,
    `textId` VARCHAR(255) NOT NULL,
    `name` VARCHAR(255) NOT NULL,
    PRIMARY KEY (`id`)
);

INSERT INTO `Attributes` (`id`, `textId`, `name`)
VALUES ('1', 'Brand', 'brand'),
('2', 'Type','type'),
('3', 'RAM', 'ram'),
('4', 'GPU', 'gpu'),
('5', 'Color', 'color');

CREATE TABLE `ProductAttributes` (
    `attributeId` INT(11) NOT NULL,
    `productId` INT(11) NOT NULL,
    `value` VARCHAR(255) NULL DEFAULT NULL,
    PRIMARY KEY (`attributeId`, `productId`),
    INDEX `ValueIndex` (`value`),
    INDEX `AttrIndex` (`attributeId`)
);

INSERT INTO `ProductAttributes` (`attributeId`, `productId`, `value`)
VALUES 
('1', '1','Google'),
('1', '2', 'Google'),
('1', '3', 'Google'),
('1', '4', 'Apple'),
('1', '5', 'Apple'),
('1', '6', 'Apple'),
('1', '7', 'Sony'),
('1', '8', 'Sony'),
('1', '9', 'Microsoft'),
('1', '10', 'Microsoft'),
('1', '11', 'Microsoft'),
('1', '12', 'Microsoft'),
('2', '1','Game Console'),
('2', '2', 'Phone'),
('2', '3', 'Computer'),
('2', '4', 'Computer'),
('2', '5', 'Phone'),
('2', '6', 'Tablet'),
('2', '7', 'Game Console'),
('2', '8', 'Phone'),
('2', '9', 'Tablet'),
('2', '10', 'Phone'),
('2', '11', 'Computer'),
('2', '12', 'Game Console'),
('3', '1', '4GB'),
('3', '2', '4GB'),
('3', '3', '4GB'),
('3', '4', '6GB'),
('3', '5', '4GB'),
('3', '6', '4GB'),
('3', '7', '8GB'),
('3', '8', '4GB'),
('3', '9', '6GB'),
('3', '10', '4GB'),
('3', '11', '8GB'),
('3', '12', '8GB'),
('4', '2', 'Mali 200'),
('4', '3', 'GMA 650'),
('4', '4', 'GMA 650'),
('4', '5', 'Mali 400'),
('4', '6', 'Mali 200'),
('4', '7', 'Radeon'),
('4', '8', 'Mali 400'),
('4', '9', 'GMA 650'),
('4', '10', 'Mali 400'),
('4', '11', 'Nvidia'),
('4', '12', 'Radeon'),
('5', '1','Black'),
('5', '2', 'Black'),
('5', '3', 'Gray'),
('5', '4', 'Gray'),
('5', '5', 'Black'),
('5', '6', 'Black'),
('5', '7', 'Black'),
('5', '8', 'Blue'),
('5', '9', 'Gray'),
('5', '10', 'Blue'),
('5', '11', 'Black'),
('5', '12', 'Green');
1
Such a long question and I still don't understand what this means: "But I also need to extract attribute counts to show on the filter bar. " Sample data and desired results would really help. - Gordon Linoff
Tried clearing it up a bit, hope it makes more sense. Also, there's sample data in the fiddles and the results I hope to get are what the "Giant Union Query" returns, but I am looking to get that in a more efficient manner - overburn

1 Answers

0
votes

If I understand correctly, you want all attribute value pairs for products that meet the condition of having the two specified attributes.

If so, I would use exists to identify the products and then just aggregate:

select pa.attributeId, pa.value, count(*)
from ProductAttributes pa
where exists (select 1
              from ProductAttributes pa2
              where pa2.productId = pa.productId and
                    pa2.attributeId = 1 and
                    pa2.value = 'A1-Value A'
             ) and
      exists (select 1
              from ProductAttributes pa2
              where pa2.productId = pa.productId and
                    pa2.attributeId = 3 and
                    pa2.value = 'A3-Value A'
             ) 
group by pa.attributeId, pa.value;

Here is a db<>fiddle.