0
votes

I am trying to run a test pig script in PIG to load data from cassandra at Datastax enterprise, but I am getting an error. Let me show de whole scenario:

Cassandra Schema: CREATE KEYSPACE libdata WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 };

CREATE TABLE libout ("STABR" TEXT, "FSCSKEY" TEXT, "FSCS_SEQ" TEXT, "LIBID" TEXT, "LIBNAME" TEXT, "ADDRESS" TEXT, "CITY" TEXT, "ZIP" TEXT, "ZIP4" TEXT, "CNTY" TEXT, "PHONE" TEXT, "C_OUT_TY" TEXT, "C_MSA" TEXT, "SQ_FEET" INT, "F_SQ_FT" TEXT, "L_NUM_BM" INT, "F_BKMOB" TEXT, "HOURS" INT, "F_HOURS" TEXT, "WKS_OPEN" INT, "F_WKSOPN" TEXT, "YR_SUB" INT, "STATSTRU" INT, "STATNAME" INT, "STATADDR" INT, "LONGITUD" FLOAT, "LATITUDE" FLOAT, "FIPSST" INT, "FIPSCO" INT, "FIPSPLAC" INT, "CNTYPOP" INT, "LOCALE" TEXT, "CENTRACT" FLOAT, "CENBLOCK" INT, "CDCODE" TEXT, "MAT_CENT" TEXT, "MAT_TYPE" INT, "CBSA" INT, "MICROF" TEXT, PRIMARY KEY ("FSCSKEY", "FSCS_SEQ"));

cqlsh:libdata> CREATE TABLE libsqft ( year INT, state TEXT, sqft BIGINT, PRIMARY KEY (year, state) ); The second table is going to be used to store data from pig to cassandra.

At PIG GRUNT: grunt> libdata = LOAD 'cql://libdata/libout' USING CqlStorage(); grunt> dump libdata;

This is my output:

2014-08-18 23:02:11,603 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN 2014-08-18 23:02:11,607 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false 2014-08-18 23:02:11,608 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2014-08-18 23:02:11,608 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 2014-08-18 23:02:11,613 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job 2014-08-18 23:02:11,613 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2014-08-18 23:02:11,613 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job5135328249315577655.jar 2014-08-18 23:02:14,378 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job5135328249315577655.jar created 2014-08-18 23:02:14,386 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job 2014-08-18 23:02:14,400 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission. 2014-08-18 23:02:14,783 [Thread-12] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1 2014-08-18 23:02:14,901 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete 2014-08-18 23:02:15,439 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201408182033_0011 2014-08-18 23:02:15,439 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://IP:50030/jobdetails.jsp?jobid=job_201408182033_0011 2014-08-18 23:03:00,167 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201408182033_0011 has failed! Stop running all dependent jobs 2014-08-18 23:03:00,167 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete 2014-08-18 23:03:00,169 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - There is no log file to write to. 2014-08-18 23:03:00,169 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Backend error message java.lang.RuntimeException at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:657) at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.(CqlPagingRecordReader.java:301) at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:167) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:181) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.Child$4.run(Child.java:266) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:260) Caused by: UnavailableException() at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultStandardScheme.read(Cassandra.java:53662) at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultStandardScheme.read(Cassandra.java:53630) at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result.read(Cassandra.java:53545) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78) at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_prepared_cql3_query(Cassandra.java:1820) at org.apache.cassandra.thrift.Cassandra$Client.execute_prepared_cql3_query(Cassandra.java:1805) at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:635) ... 11 more

2014-08-18 23:03:00,173 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: java.lang.RuntimeException at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:657) at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.(CqlPagingRecordReader.java:301) at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:167) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:181) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.Child$4.run(Child.java:266) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:260) Caused by: UnavailableException() at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultStandardScheme.read(Cassandra.java:53662) at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultStandardScheme.read(Cassandra.java:53630) at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result.read(Cassandra.java:53545) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78) at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_prepared_cql3_query(Cassandra.java:1820) at org.apache.cassandra.thrift.Cassandra$Client.execute_prepared_cql3_query(Cassandra.java:1805) at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:635) ... 11 more

2014-08-18 23:03:00,173 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed! 2014-08-18 23:03:00,174 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features 1.0.4.13 0.10.1 ubuntu 2014-08-18 23:02:11 2014-08-18 23:03:00 UNKNOWN

Failed!

Failed Jobs: JobId Alias Feature Message Outputs job_201408182033_0011 libdata MAP_ONLY Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201408182033_0011_m_000000 cfs://10.82.31.13/tmp/temp-1734707970/tmp1694465949,

Input(s): Failed to read data from "cql://libdata/libout"

Output(s): Failed to produce result in "cfs://10.82.31.13/tmp/temp-1734707970/tmp1694465949"

Counters: Total records written : 0 Total bytes written : 0 Spillable Memory Manager spill count : 0 Total bags proactively spilled: 0 Total records proactively spilled: 0

Job DAG: job_201408182033_0011

2014-08-18 23:03:00,174 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed! 2014-08-18 23:03:00,215 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: java.lang.RuntimeException at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:657) at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.(CqlPagingRecordReader.java:301) at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:167) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:181) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.Child$4.run(Child.java:266) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:260) Caused by: UnavailableException() at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultStandardScheme.read(Cassandra.java:53662) at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultStandardScheme.read(Cassandra.java:53630) at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result.read(Cassandra.java:53545) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78) at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_prepared_cql3_query(Cassandra.java:1820) at org.apache.cassandra.thrift.Cassandra$Client.execute_prepared_cql3_query(Cassandra.java:1805) at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:635) ... 11 more

2014-08-18 23:03:00,215 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to. 2014-08-18 23:03:00,215 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias libdata. Backend error : Unable to recreate exception from backed error: java.lang.RuntimeException at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:657) at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.(CqlPagingRecordReader.java:301) at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:167) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:181) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.Child$4.run(Child.java:266) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:260) Caused by: UnavailableException() at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultStandardScheme.read(Cassandra.java:53662) at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultStandardScheme.read(Cassandra.java:53630) at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result.read(Cassandra.java:53545) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78) at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_prepared_cql3_query(Cassandra.java:1820) at org.apache.cassandra.thrift.Cassandra$Client.execute_prepared_cql3_query(Cassandra.java:1805) at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:635) ... 11 more

    at org.apache.pig.PigServer.openIterator(PigServer.java:856)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:683)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:190)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:490)
    at org.apache.pig.Main.main(Main.java:111)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.RuntimeException at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:657) at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.(CqlPagingRecordReader.java:301) at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:167) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:181) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.Child$4.run(Child.java:266) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:260) Caused by: UnavailableException() at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultStandardScheme.read(Cassandra.java:53662) at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultStandardScheme.read(Cassandra.java:53630) at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result.read(Cassandra.java:53545) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78) at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_prepared_cql3_query(Cassandra.java:1820) at org.apache.cassandra.thrift.Cassandra$Client.execute_prepared_cql3_query(Cassandra.java:1805) at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:635) ... 11 more

    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:217)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:149)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:383)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1279)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1264)
    at org.apache.pig.PigServer.storeEx(PigServer.java:961)
    at org.apache.pig.PigServer.store(PigServer.java:928)
    at org.apache.pig.PigServer.openIterator(PigServer.java:841)
    ... 12 more

It seems like Pig can't read data from cassandra. Does anyone have idea of what's going on?

Thanks a lot.

Bruno

4
For people who found this post when looking for ERROR 1066: Unable to open iterator for alias here is a generic solution.Dennis Jaheruddin

4 Answers

3
votes

The good news is that the exact sequence of steps that you provided works correctly on a fresh install of a single node DSE 4.5.1 cluster.

The logs don't indicate any problem other than a "UnavailableException", which means that the query is making it to Cassandra, and Cassandra doesn't think that enough nodes are up fulfill your request.

With RF=1, the implication is that if any node in your cluster is down, then you will hit that exception for some portion of your data set.

My reccomendation would be to

  1. Try this again on a single node with DSE 4.5.1 installed.
  2. Assuming that works, double check the topology and status of your cluster "nodetool ring", or opscenter's ring view would both be quite helpful for this.

The Cassandra logs should also show this UnavailableEXception, and possibly point you more directly at the underlying source of the problem.

I don't see any indication that this problem is specific to the configuration or use of Pig itself.

1
votes

How are you running pig? I'm asking because of:

WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to. 2014-08-18 23:03:00,215 [main]

Usually there should be a log file in your dse install directory when you run:

/bin/dse pig

1
votes

Actually Sebastian, I have done a confusion. I was trying to read data from my cassandra cluster, but not using the right Rack name. I have a cluster with 3 racks: Solr Analytics (Hadoop) and Cassandra

The table that I created was using the rack name "Solr" and not Analytics. Now it is solved.

Thanks for all the help.