2
votes

I have asked this question on jackrabbit-users list but I didn't get an answer.

JCR-SQL2 doesn't provide a SELECT DISTINCT (or similar, AFAIK). Neither do SQL or XPATH in JCR 1.0... How are people getting around this?. What's the best way of removing duplicate rows?

I read that someone was iterating over the results and putting them in a Set. In my case, because of the possible huge number of results, that approach may end up being too costly.

Does anyone here have a suggestion?

1

1 Answers

2
votes

None of the query languages defined in JCR 1.0 (e.g., JSR-170) or JCR 2.0 (e.g., JSR-283) have a notion of SELECT DISTINCT.

The only way to do this is to process the results manually and throwing out any rows (or nodes) that you've already seen. Using a set of paths or Node objects would work. This isn't too difficult, but it's unfortunately harder than it should be and, as you mention, can be expensive if there are a lot of rows and/or duplicates.

This is why ModeShape provides full support for JCR-SQL2 queries but also allows use of SELECT DISTINCT. In fact, ModeShape supports a number of other features such as

  • non-correlated subqueries in the WHERE clause
  • LIMIT n and OFFSET m
  • UNION, INTERSECT and EXCEPT
  • FULL OUTER JOIN and CROSS JOIN
  • BETWEEN criteria
  • set criteria, using IN and NOT IN
  • DEPTH and PATH dynamic operands

and a few others. For details, see the documentation.