JackRabbit - Removal of duplicate rows

Question

I have asked this question on jackrabbit-users list but I didn't get an answer.

JCR-SQL2 doesn't provide a SELECT DISTINCT (or similar, AFAIK). Neither do SQL or XPATH in JCR 1.0... How are people getting around this?. What's the best way of removing duplicate rows?

I read that someone was iterating over the results and putting them in a Set. In my case, because of the possible huge number of results, that approach may end up being too costly.

Does anyone here have a suggestion?

Randall Hauch Randall Hauch · Accepted Answer · 2011-03-04T16:18:16

None of the query languages defined in JCR 1.0 (e.g., JSR-170) or JCR 2.0 (e.g., JSR-283) have a notion of SELECT DISTINCT.

The only way to do this is to process the results manually and throwing out any rows (or nodes) that you've already seen. Using a set of paths or Node objects would work. This isn't too difficult, but it's unfortunately harder than it should be and, as you mention, can be expensive if there are a lot of rows and/or duplicates.

This is why ModeShape provides full support for JCR-SQL2 queries but also allows use of SELECT DISTINCT. In fact, ModeShape supports a number of other features such as

non-correlated subqueries in the WHERE clause
LIMIT n and OFFSET m
UNION, INTERSECT and EXCEPT
FULL OUTER JOIN and CROSS JOIN
BETWEEN criteria
set criteria, using IN and NOT IN
DEPTH and PATH dynamic operands

and a few others. For details, see the documentation.

JackRabbit - Removal of duplicate rows

1 Answers