I'm trying to move a RDBMS model over to Cassandra, and having a hard time creating the schema. Here is my data model:
CREATE TABLE Domain (
ID INT NOT NULL PRIMARY KEY,
DomainName NVARCHAR(74) NOT NULL,
HasBadWords BIT,
...
);
INSERT INTO Domain (DomainName, HasBadWords) VALUES ('domain1.com', 0);
INSERT INTO Domain (DomainName, HasBadWords) VALUES ('domain2.com', 0);
CREATE TABLE ZoneFile (
ID INT NOT NULL PRIMARY KEY,
DomainID INT NOT NULL,
Available BIT NOT NULL,
Nameservers NVARCHAR(MAX),
Timestamp DATETIME NOT NULL
);
INSERT INTO ZoneFile (DomainID, Available, Nameservers, Timestamp) VALUES (1, 0, "ns1", '2010-01-01');
INSERT INTO ZoneFile (DomainID, Available, Nameservers, Timestamp) VALUES (2, 0, "ns1", '2010-01-01');
INSERT INTO ZoneFile (DomainID, Available, Nameservers, Timestamp) VALUES (1, 1, "ns2", '2011-01-01');
INSERT INTO ZoneFile (DomainID, Available, Nameservers, Timestamp) VALUES (2, 1, "ns2", '2011-01-01');
CREATE TABLE Backlinks (
ID INT NOT NULL PRIMARY KEY,
DomainID INT NOT NULL,
Backlinks INT NOT NULL,
Indexed INT NOT NULL,
Timestamp DATETIME NOT NULL
);
INSERT INTO Backlinks (DomainID, Backlinks, Indexed, Timestamp) VALUES (1, 100, 200, '2010-01-01');
INSERT INTO Backlinks (DomainID, Backlinks, Indexed, Timestamp) VALUES (2, 300, 600, '2010-01-01');
INSERT INTO Backlinks (DomainID, Backlinks, Indexed, Timestamp) VALUES (1, 500, 1000, '2010-01-01');
INSERT INTO Backlinks (DomainID, Backlinks, Indexed, Timestamp) VALUES (2, 600, 1200, '2010-01-01');
From this, I've deduced that I can probably have one Keyspace: DomainData. In this keyspace, I can have a columnfamily called "Domain" which is like my Domain table in sql:
"Domain" : { //ColumnFamily
"domain1.com" : { "HasBadWords" : 0 }, //SuperColumn
"domain2.com" : { "HasBadWords" : 0 } //SuperColumn
}
The next tables are where I start getting confused. ZoneFile and Backlinks are essentially supposed to store a history of results from looking up these values for each domain. So, one Domain to Many ZoneFile records. For querying purposes, I want to be able to easily get the 'newest' ZoneFile record, or a given Domain. I will need to do the same for Backlinks.
I was considering something like this, and doing a range lookup on the key for the domain, and then getting the 'last' record which should be the newest timestamp...
"ZoneFiles" : { //ColumnFamily
"domain1.com:2010-01-01 12:00:00.000" : { "Available" : 0, "Nameservers" : "ns1" }, //SuperColumn
"domain1.com:2011-01-01 12:00:00.000" : { "Available" : 1, "Nameservers" : "ns2" }, //SuperColumn
"domain2.com:2010-01-01 12:00:00.000" : { "Available" : 0, "Nameservers" : "ns1" }, //SuperColumn
"domain2.com:2011-01-01 12:00:00.000" : { "Available" : 1, "Nameservers" : "ns2" } //SuperColumn
}
I'm not convinced this is the right answer, the combination of a string domain and string datetime in a key feels wrong. Could someone point me in the right direction?
EDIT:
Assuming I use:
"ZoneFiles" : {
"domain1.com" : {
timestamp1 : "{\"available\":1,\"nameservers\":\"ns1\"}",
timestamp2 : "{\"available\":1,\"nameservers\":\"ns1\"}",
}
}
How would I query a list of domain rows where the newest timestamp is older than a given date?