I have a TABLE
called Food with a few thousand records. Each record is a food consisting of a Description and 5 categories. Each succeeding category is more specific than the last. Description, cat1, and cat2 cannot accept NULL
values. Cat3, cat4, and cat5 can accept NULL
values.
For example:
description | cat1 | cat2 | cat3 | cat4 | cat5
Green apple | Food | produce | NULL | apple | NULL
Root beer | Bev | Non-Alc | NULL | Root beer | NULL
The issue I'm having is that there are lots of duplicate descriptions assigned different categories. And I need help writing a code that can delete these duplicates by certain conditions.
Example:
description | cat1 | cat2 | cat3 | cat4 | cat5
Mango Syrup | Food | Produce | NULL | Mango | NULL
Mango Syrup | Food | Dry Good | NULL | NULL | Syrup
Pepperoni Pizza| Food | Meat | Pepperoni| NULL | NULL
Pepperoni Pizza| Food | Bakery | NULL | Pizza | NULL
I have hundreds of these, and luckily, they're all set up the same way. Out of the three columns cat3, 4, and 5, only one can be populated. If a record's cat5 IS NOT NULL
, cat3 and cat4 will be NULL
. If a record's cat4 IS NOT NULL
, cat3 and cat5 will be NULL
. If a record's cat3 IS NOT NULL
, cat4 and cat5 will be NULL
.
The correct duplicate is the item whose record has the most specific category populated. All other duplicates should be deleted.
Things I've tried:
GROUP BY
and using MIN()
or MAX()
to aggregate different categories. This doesn't give me the result I want.
Anything will help. Appreciate it.
cat3
has been populated both times, and bothcat4
andcat5
have the valueNULL
? For the duplicate data, what are you wanting as the end resultset? Are you looking to update the table, or simply have a "clean" result set returned instead – Larnu