Although this is an old question, it appears in Google search results pretty high. So I wanted to give an update.
Cassandra 2.2+ supports user defined function and user defined aggregates. WARNING: this does not mean that you don't have to do data modeling anymore (as it was pointed by @Theo) rather it just allows you to slightly preprocess your data upon retrieval.
SELECT DISTINCT (a2) FROM demo2 where b2='sea'
To implement DISTINCT
, you should define a function and an agreggate. I'll call both the function and the aggregate uniq
rather than distinct
to emphasize the fact that it is user defined.
CREATE OR REPLACE FUNCTION uniq(state set<text>, val text)
CALLED ON NULL INPUT RETURNS set<text> LANGUAGE java
AS 'state.add(val); return state;';
CREATE OR REPLACE AGGREGATE uniq(text)
SFUNC uniq STYPE set<text> INITCOND {};
Then you use it as follows:
SELECT uniq(a2) FROM demo2 where b2='sea';
SELECT sum(a3), sum(b3) from demo3 where c3='water' and d3='ocean'
SUM
is provided out of the box and works as you would expect. See system.sum
.
SELECT a1,MAX(b1) FROM demo1 group by a1
GROUP BY
is a tricky one. Actually, there is no way to group result rows by some column. But what you can do is to create a map<text, int>
and to group them manually in the map. Based on an example from Christopher Batey's blog, group-by and max:
CREATE OR REPLACE FUNCTION state_group_and_max(state map<text, int>, type text, amount int)
CALLED ON NULL INPUT
RETURNS map<text, int>
LANGUAGE java AS '
Integer val = (Integer) state.get(type);
if (val == null) val = amount; else val = Math.max(val, amount);
state.put(type, val);
return state;
' ;
CREATE OR REPLACE AGGREGATE state_group_and_max(text, int)
SFUNC state_group_and_max
STYPE map<text, int>
INITCOND {};
Then you use it as follows:
SELECT state_group_and_max(a1, b1) FROM demo1;
Notes
- As it was mentioned above, you still have to invest some time in data modeling, don't overuse these features
- You have to set
enable_user_defined_functions=true
in your cassandra.yaml
to enable the features
- You can overload the functions to support grouping by columns of different types.
References: