0
votes

I am currently working on a recommender application and I am using cassandra with hadoop and pig for map/reduce jobs. To take advantage of the column names properties our team has decided to store data using valueless columns and aggregate column names so for example all hits for a specific content are stored in a column family with a single row, and each column is a hit for the content using the following structure:

rowkey = 'single_row' {
    id_content:hit_date, -
    .
    .
    .
}

With this schema we obtain wide rows instead of skinny; the question is, how do i need to manipulate data in Pig in order to store data in cassandra with this schema?

1

1 Answers

0
votes

I'm not sure from your comment if you're using composite columns, or whether you're just concatenating id_content and hit_date.

For normal (i.e. non-composite) columns, the schema is:

(key, {(col_name, col_value), ...})

In the case of composite columns, I believe the schema is the following:

(key, {((col_name_part_1, col_name_part_2), col_value), ...})

This assessment (for composite columns) is based on reading the patch submitted on https://issues.apache.org/jira/browse/CASSANDRA-3684