I have a simple column family in my Cassandra keyspace and need to access it from Pig. Please help me understand how this works:

SD = LOAD 'cassandra://SampleData/Queries' USING CassandraStorage() as (f1,f2,f3);

If I perform

X  = foreach SD generate f1; dump X;

it gives me all the keys stored in the `Queries` table. I need to be able to generate a pair `(key, value)`, where `key` is a row key and `value` is the value of the column named `UpdateTimeStamp` in that row.

I figured out that if I do

Y = foreach SD generate f2.name; dump Y;

it goes through all the rows and prints the names of the columns present in each row.

If I do

Z = foreach SD generate f2.value; dump Z;

it gives me the same thing as above except instead of column names, it prints column values.

I need to be able to generate a relation (key, timestamp), something like this:

T = foreach SD generate (f1, f2.value(for f2.name == 'UpdateTimeStamp'));

Obviously, Pig won't accept the statement above.

1 Answer

Cassandra columns are loaded into Pig as an inner bag of tuples.

Try this:

data = LOAD 'cassandra://SampleData/Queries' USING CassandraStorage()
   AS (keycolumn, columns: bag {T: tuple(columnname, columnvalue)});

dump data; -- check what is in the data alias

data2 = FOREACH data GENERATE keycolumn, columns.columnname;

dump data2;
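
To get the (key, timestamp) pairs the question asks for, one option is to flatten the bag and then filter on the column name. This is only a sketch and has not been run against a live cluster. The aliases `flat`, `ts`, and `result` are names made up for illustration, and it assumes that the column name can be compared as a string and that the schema declared in the LOAD matches what CassandraStorage actually returns.

```pig
data = LOAD 'cassandra://SampleData/Queries' USING CassandraStorage()
   AS (keycolumn, columns: bag {T: tuple(columnname, columnvalue)});

-- Unnest the bag: one output row per (keycolumn, columnname, columnvalue)
flat = FOREACH data GENERATE keycolumn, FLATTEN(columns) AS (columnname, columnvalue);

-- Keep only the column of interest (assumes the name compares as a string)
ts = FILTER flat BY columnname == 'UpdateTimeStamp';

-- Project down to (key, timestamp)
result = FOREACH ts GENERATE keycolumn, columnvalue;

dump result;
```

Note that rows which have no `UpdateTimeStamp` column drop out of `result` entirely with this approach.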