I am trying to copy data from a file in HDFS into a Cassandra table using Pig, but the job fails with a NullPointerException while storing the data in Cassandra. Can someone help me with this?

Users table structure:

CREATE TABLE users ( user_id text PRIMARY KEY, age int, first text, last text )

My pig script

A = load '/user/hduser/user.txt' using PigStorage(',') as (id:chararray,age:int,fname:chararray,lname:chararray);
C = foreach A GENERATE TOTUPLE(TOTUPLE('user_id',id)), TOTUPLE('age',age),TOTUPLE('first',fname),TOTUPLE('last',lname);
STORE C into 'cql://ram_keyspace/users' USING CqlStorage();

Exception:

java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.<init>(CqlRecordWriter.java:123)
    at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.<init>(CqlRecordWriter.java:90)
    at org.apache.cassandra.hadoop.cql3.CqlOutputFormat.getRecordWriter(CqlOutputFormat.java:76)
    at org.apache.cassandra.hadoop.cql3.CqlOutputFormat.getRecordWriter(CqlOutputFormat.java:57)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:84)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:627)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:753)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.NullPointerException
    at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.<init>(CqlRecordWriter.java:109)
    ... 12 more

Can someone who has used Pig with Cassandra help me fix this?

What version of Cassandra? (psanford)
Cassandra version is 1.2.13. (user3207663)

1 Answer

You are using CqlStorage, which requires an output_query parameter in the STORE URL. Its value is a URL-encoded prepared statement that is used to insert the data into the column family. The DSE Pig documentation provides an example:

grunt> STORE insertformat INTO
   'cql://cql3ks/simple_table1?output_query=UPDATE+cql3ks.simple_table1+set+b+%3D+%3F'
   USING CqlStorage;
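
Applied to the users table from the question, the same pattern might look like the sketch below. It is untested against Cassandra 1.2.13 and rests on two assumptions that do not come from the question: that CqlStorage expects each record as the key tuple(s) of (name, value) pairs followed by one tuple of values bound to the `?` markers in order, and that the UPDATE statement shown is an acceptable output_query for this table.

```pig
-- Load the comma-separated user records from HDFS (same as in the question).
A = LOAD '/user/hduser/user.txt' USING PigStorage(',')
    AS (id:chararray, age:int, fname:chararray, lname:chararray);

-- Assumed CqlStorage layout: first the key tuple(s) as (name, value) pairs,
-- then one tuple holding the values bound to the '?' markers, in order.
C = FOREACH A GENERATE
    TOTUPLE(TOTUPLE('user_id', id)),
    TOTUPLE(age, fname, lname);

-- output_query is URL-encoded: ' ' -> '+', '=' -> %3D, '?' -> %3F, ',' -> %2C.
-- Decoded: UPDATE ram_keyspace.users SET age = ?, first = ?, last = ?
STORE C INTO 'cql://ram_keyspace/users?output_query=UPDATE+ram_keyspace.users+SET+age+%3D+%3F%2C+first+%3D+%3F%2C+last+%3D+%3F'
    USING CqlStorage();
```

Note that the second tuple carries only the values (age, fname, lname), not name/value pairs as in the original script. The primary key is supplied through the first tuple rather than in the statement text.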