1
votes

When trying to run Pig against a Cassandra schema created with CQL3,

-- This script simply gets a row count of the given column family  
rows = LOAD 'cassandra://Keyspace1/ColumnFamily/' USING CassandraStorage();
counted = foreach (group rows all) generate COUNT($1);
dump counted;

I get the following error:

Error: Column family 'ColumnFamily' not found in keyspace 'KeySpace1'

I understand that this is by design, but I have been having trouble finding the correct method to load CQL3 tables into Pig.

Can someone point me in the right direction? Is there a missing bit of documentation?

5
Updated original post - e90jimmy

5 Answers

1
votes

This is now supported in Cassandra 1.2.8.

0
votes

As you mention, this is by design: updating Thrift to allow for this would compromise backwards compatibility. Instead of creating keyspaces and column families using CQL (I'm guessing you used cqlsh), try using the Cassandra CLI.

Take a look at these issues as well:

0
votes

Per https://github.com/alexliu68/cassandra/pull/3, it appears that this fix is planned for the 1.2.6 release of Cassandra. It sounds like they're trying to get that out in the reasonably near future, but of course there's no certain ETA.

0
votes

As e90jimmy said, it's supported in Cassandra 1.2.8, but there is an issue when using the counter column type. This was fixed by Alex Liu, but due to a regression problem in 1.2.7 the patch did not go ahead:

https://issues.apache.org/jira/browse/CASSANDRA-5234

To work around this, wait until 2.0 becomes production ready, or download the source, apply the patch from the above link yourself, and rebuild the Cassandra .jar. This has worked for me so far.

0
votes

The best way to access CQL3 tables in Pig is by using the CqlStorage handler.

The syntax is similar to what you have above:

row = LOAD 'cql://Keyspace/ColumnFamily/' USING CqlStorage();

More info in the dev blog post.
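
Applied to the row-count script from the question, this might look like the sketch below. It is untested here and rests on a few assumptions: a Cassandra version that ships CqlStorage (1.2.8 or later, per the other answers), a launch environment that already has the Cassandra Pig classes registered in the same way the original CassandraStorage() call needed, and the question's placeholder keyspace and table names.

```
-- Sketch only: same row count as in the question, loaded through CqlStorage.
-- Keyspace1 and ColumnFamily are the question's placeholder names.
rows = LOAD 'cql://Keyspace1/ColumnFamily/' USING CqlStorage();
-- Group every row into a single bag, then count the tuples in that bag.
counted = FOREACH (GROUP rows ALL) GENERATE COUNT($1);
DUMP counted;
```

Only the URL scheme and the storage class change; the grouping and counting steps are the same as in the original script.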