I am trying to execute a query to select all rows that fall within ranges defined by start and end columns in another table. For example, with pseudo-code, if I had these (very small) tables:
ranges:
group_id = c("a", "b", "c", "d"),
start = c(1, 7, 2, 25),
end = c(5, 23, 7, 29)
positions:
position = 100 random numbers
annotation = 100 random strings
I would like to make a query that would return something like:
group_id position annotation
a 2 adfkjdas
a 3 sdlfkjasl;kdfj
b 9 sdlfkdj
c 5 wwlekrj
d 27 zxcvzx
Using MariaDB/MySQL, a BETWEEN query will operate row-wise over ranges, so this would work:
SELECT
ranges.group_id as group_id,
positions.position as position,
positions.annotation as annotation
FROM
(SELECT * FROM my_ranges) AS ranges, positions
WHERE
positions.position BETWEEN ranges.start AND ranges.end
That is, the query acts as if the WHERE clause is actually a series of WHERE clauses joined by "OR", one for each row of the ranges table (e.g. BETWEEN 1 AND 5 OR BETWEEN 7 and 23 OR BETWEEN 2 AND 7 OR BETWEEN 25 AND 29).
It seems like the BETWEEN operator behaves differently in presto, so the same query does not return any results.
In realty, my ranges table has ~20,000 ranges I'd like to query, so joining them by writing OR statements seems prohibitive...
Can anyone here suggest a way to modify this query (or my general approach!) to work with Presto?
(added in response to comment): For more SQL than pseudo-code, I'd like to
use tables like this:
CREATE TABLE IF NOT EXISTS `ranges` (
`group_id` char,
`start` int(3),
`end` int(3)
);
INSERT INTO `ranges` (`group_id`, `start`, `end`) VALUES
('a', '2', '5'),
('b', '7', '23'),
('c', '2', '7'),
('d', '25', '29');
CREATE TABLE IF NOT EXISTS `positions` (
`position` int(3),
`annotation` varchar(20)
);
INSERT INTO `positions` (`position`, `annotation`) VALUES
('2', 'adfkjdas'),
('3', 'sdlfkjasl;kdfj'),
('5', 'wwlekrj'),
('9', 'sdlfkdj'),
('27', 'zxcvzx');
And run queries like this:
SELECT
group_id,
position,
annotation
FROM
ranges, positions
WHERE
positions.position BETWEEN ranges.start AND ranges.end