I'm trying to make a meeting/attendee program while learning HBase.
I'm debating about whether to denormalize a relationalship from 3rd Normal Form to either 1st Normal Form (atomocity/no sets of values within a column) to 0 Normal Form (not atomic/sets of values exist within one column).
In essence I'm trying to determine how to convert the following relational schema into HBase:
CREATE TABLE customer (
customer_id INT PRIMARY KEY
,capacity INT
);
CREATE TABLE attendee (
attendee_id INT PRIMARY KEY
customer_id INT REFERENCES customer (customer_id)
);
CREATE TABLE customer_dedicated_hosts (
customer_id INT REFERENCS customer (customer_id)
,dedicated_host_attendee_id INT REFERENCES attendee (attendee_id)
);
CREATE TABLE meeting (
meeting_id INT PRIMARY KEY
,host_attendee_id INT REFERENCES attendee (attendee_id)
);
CREATE TABLE meeting_attendee (
meeting_id INT
,attendee_id INT
,CONSTRAINT ... PRIAMRY KEY (meeting_id, attendee_id)
);
A "Customer" has 1:M Attendees.
An Attendee has M:N Meetings.
A Meeting is hosted by an attendee and thus links to Attendee via the host_attendee_id FK.
A Customer has a number of attendees who are allowed to host meetings -- listed in CustomerDedicatedHosts. If an attendee of a customer hosts meetings who is not a dedicated host, he should be fined.
Each meeting has a capacity of attendees for a given customer. If a customer goes over his capacity for a single meeting, he should be fined.
I'm curious if this should all be done in one or two tables with one column family -- denormalized table with tons of repetition. The equivalent of
CREATE TABLE hostapp (
customer_id INT
,capacity
,dedicated_host_attendee_id
--ROWKEY == customer_id, dedicated_host_attendee_id
);
CREATE TABLE meetingapp (
customer_id INT
,attendee_id INT
,meeting_id INT
,host_attendee_id INT
--ROWKEY == customer_id, meeting_id, attendee_id
);
I can't quite wrap my head around denormalization in this scenario. Why not split the "hostapp" into two tables, one with two columns (customer_id, capacity), and one with two columns (customer_id, dedicated_host_attendee_id). I guess I could live with the repeating host_attendee_id in the meetingapp table, but why not split the meeting app into two tables (customer_id, meeting_id, host_attendee_id) and (meeting_id, attendee_id)?
Is that the proper way to design this schema, or should it be done differently?
I'm also curious as to what extend I can abuse the columns in a column family, to use them like a nested table in oracle.
CREATE TABLE meetingapp (
customer_id INT
meeting_id INT
host_attendee_id INT
attendees VARRAY(<INT>)
);
In hbase terminology, one column family would always have the following three columns: customer_id, meeting_id, host_attendee_id. Either the same or another column family would have these columns: attendee1, attendee2, ... attendeeN; in other words, variable number of columns depending on the number of column families, similar to a nested table or varray in Oracle.
How is this best approached?