In a MySQL 5.6 database I have table tablename
which has (including others) three TEXT
columns: col_a, col_b, col_c
.
I want to extract all unique words (with words being separated by spaces) from these three columns that are at least 5 characters long. By "word" I mean any string of non-space characters, eg "foo-123" would be a word, as would "099423". The columns are all utf8 format InnoDB columns.
Is there a single query to do this?
EDIT: As requested, here's an example: (in the real data col_a, col_b and col_c are TEXT fields and could have a large number of words.)
select id, col_a, col_b, col_c from tablename;
id | col_a | col_b | col_c
----|--------------------|----------------|----------------------
1 | apple orange plum | red green blue | bill dave sue
2 | orange plum banana | yellow red | frank james
3 | kiwi fruit apple | green pink | bill sarah-jane frank
expected_result: ["apple", "orange", "banana", "fruit",
"green", "yellow", "frank", "james", "sarah-jane"]
I don't care about the order of results. thanks!
EDIT: in my example above, everything is in lowercase, as that's how I happen to store everything in my real-life table that this question relates to. But, for the sake of argument, if it did contain some capitalisation I would prefer the query to ignore capitalisation (this is the setting of my DB config as it happens).
EDIT2: in case it helps, all of the text columns have a FULLTEXT index on them.
EDIT3: here is the SQL to create the sample data:
DROP TABLE IF EXISTS `tablename`;
CREATE TABLE `tablename` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`col_a` text,
`col_b` text,
`col_c` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8;
LOCK TABLES `tablename` WRITE;
INSERT INTO `tablename` VALUES (1,'apple orange plum','red green blue','bill dave sue'),(2,'orange plum banana','yellow red','frank james'),(3,'kiwi fruit apple','green pink','bill sarah-jane frank');
UNLOCK TABLES;