I am using Cassandra 1.2 with patch 5234 ("Table created through CQL3 are not accessible to Pig"), Hadoop 1.1.2, and Pig 0.11.1.

I have a table in Cassandra

CREATE TABLE datatypetest (num int PRIMARY KEY, ascii ascii, blob blob, text text, varnum varint);

and the test data in datatypetest is

 num | ascii | blob   | text | varnum
-----+-------+--------+------+--------
  13 |   126 | 0x0003 | John |   null

I ran the following Pig script:

test1 = LOAD 'cassandra://keyspace1/datatypetest' USING CassandraStorage() AS 
(num:int, columns: bag {T: tuple(name, value)});

The output in the alias test1 is as follows:

(12,{((),),((ascii),125),((blob),��),((text),deepak)})

As you can see in the output, it is not in the following format

(<row_key>,{(<column_name1>,<value1>),(<column_name2>,<value2>)})

Each tuple in the inner bag contains another inner tuple in place of the column name, and the first of these inner tuples, which I presume is the key, is empty.

I cannot use columns.ascii, columns.blob, or columns.text to access the column tuples. An attempt like the one below throws an exception:

test2 = FOREACH test1 GENERATE num, columns.text;
2013-07-29 09:11:58,488 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
ERROR 1200: Pig script failed to parse: 
<line 3, column 8> pig script failed to validate:    
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1128: 
Cannot find field text in name:tuple(),value:bytearray

How can I access the column tuples? Thanks in advance.

1 Answer

You should not use CassandraStorage when referencing tables created using CQL3. CassandraStorage is analogous to the Thrift API. When accessing CQL3 tables, use CqlStorage:

test1 = LOAD 'cql://keyspace1/datatypetest' USING CqlStorage();

This should give you name/value tuples for the columns and their contents. The response should look something like this:

((num,13),(ascii,126),(blob,"blobvalue"),(text,John))

However, there does seem to be a mismatch between the returned collection and the schema that CqlStorage generates. (See this question.)
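
One way to check what CqlStorage actually returns, and whether it matches the schema it generates, is to inspect the alias in the Grunt shell. This is only a minimal sketch: it assumes the keyspace1 and datatypetest names from the question and that the Cassandra Pig jars are already registered. It does not try to work around the mismatch.

```
-- Load the CQL3 table through CqlStorage (keyspace and table names from the question).
test1 = LOAD 'cql://keyspace1/datatypetest' USING CqlStorage();

-- Print the schema Pig believes the alias has.
DESCRIBE test1;

-- Print the rows actually returned, to compare against the schema above.
DUMP test1;
```

If the two disagree, projecting fields by name may fail at parse time even though the data itself loads.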