1
votes

What's the right way to model many-to-many relationships in Cassandra (using 3.10 at the moment)?

From what answers I was able to find, denormalization into two relationship tables is suggested (as in here, for example: Modeling many-to-many relations in Cassandra 2 with CQL3). But there are problems with that on deletes, and those answers are so sparse they do not mention any details on that.

Suppose we have the following tables:

CREATE TABLE foo (
  key UUID PRIMARY KEY,
  content TEXT
)
CREATE TABLE bar (
  key UUID PRIMARY KEY,
  content TEXT
)
CREATE TABLE foo_bar (
  foo UUID,
  bar UUID,
  PRIMARY KEY (foo, bar)
)
CREATE TABLE bar_foo (
  bar UUID,
  foo UUID,
  PRIMARY KEY (bar, foo)
)

This seems to be the suggested answer. However, what happens when we try deleting a bar record? Updating the bar_foo table is easy:

DELETE FROM bar_foo WHERE bar = <bar_key>

However, an attempt to update the foo_bar table fails:

DELETE FROM foo_bar WHERE bar = <bar_key>

with the following error:

InvalidRequest: Error from server: code=2200 [Invalid query] message="Some partition key parts are missing: foo"

This is because the primary key for the foo_bar table is (foo, bar), and we specify only the second part of the primary key in the WHERE clause of the DELETE statement. Cassandra, apparently, requires a prefix of the primary key, and bar without foo is not a prefix.

Now, changing the primary key to (bar, foo) won't help. After all, what would you do, then, if a foo record gets deleted? And, in any case, the entire purpose of the foo_bar table is to be able to find all bar records corresponding to a given foo record, and a SELECT statement also requires a prefix of the primary key in the WHERE clause (which must, by necessity, be foo).

Can't do SELECT and then DELETE, either, since a SELECT by bar won't work, it not being a prefix of the primary key.

So what to do with many-to-many relationships, when deletes are in the picture? Is this even possible to accomplish properly?

2
you can create materialized view and cassandra will take care of it... for bar_foo and foo_bar kind of scenarioundefined_variable
Oh, Cassandra 3 has materialized views. Good! Thanks.silverberry

2 Answers

0
votes
CREATE TABLE foo (
  key UUID PRIMARY KEY,
  content TEXT
)
CREATE TABLE bar (
  key UUID PRIMARY KEY,
  content TEXT
)
CREATE TABLE foo_bar (
  foo UUID,
  bar UUID,
  PRIMARY KEY (foo, bar)
)
CREATE MATERIALIZED VIEW bar_foo AS
  SELECT bar, foo FROM foo_bar
    WHERE foo IS NOT NULL AND bar IS NOT NULL
  PRIMARY KEY (bar, foo)
0
votes

Use sets. https://docs.datastax.com/en/cql/3.3/cql/cql_using/useSet.html

CREATE TABLE foo (
  key UUID PRIMARY KEY,
  content TEXT
)
CREATE TABLE bar (
  key UUID PRIMARY KEY,
  content TEXT
)
CREATE TABLE foo_jn_bar (
  foo UUID PRIMARY KEY,
  bar set<UUID>
)
CREATE TABLE bar_jn_jn (
  bar UUID PRIMARY KEY,
  foo set<UUID>
)

If you follow the relational habit, you will have huge duplicity in disk data.