I have the following Cassandra table:

CREATE TABLE segments (
  b text,
  s int,
  c int,
  PRIMARY KEY (b)
)

and the following Pig relation:

data: {b: chararray,s: long,c: long}

which I am loading from a file using the default PigStorage loader:

data = LOAD 'some_file' as (b:chararray,s:long,c:long);

I am trying to store the Pig relation in the Cassandra table, without success. I tried:

to_cassandra = FOREACH (GROUP data ALL) 
  GENERATE 
    TOTUPLE(TOTUPLE('b',data.b)),
    TOTUPLE('s',data.s),
    TOTUPLE('c',data.c);
STORE to_cassandra INTO 
  'cql://pv/segments?
    output_query=UPDATE%20pv.segments%20SET%20s%3D%3F%2Cc%3D%3F'
  USING CqlStorage();

where the decoded output query is:

UPDATE pv.segments SET s=?,c=?

but I get the following:

[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - 
  ERROR: java.lang.ClassCastException: 
    org.apache.pig.data.DefaultDataBag cannot be cast to org.apache.pig.data.DataByteArray

which is kind of cryptic. Which one is the offending field? How do I fix this?

EDIT

I ran illustrate to_cassandra; and got:

-----------------------------------------------------------------------------------------------------
| data     | b:chararray                                                  | s:long     | c:long     | 
-----------------------------------------------------------------------------------------------------
|          | 03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB | 1          | 1          | 
|          | 0qadR3YpgVEwYsORBHFMfAh4OFk7IrROyCq7RDibchBpAKfSWAjOHDAyfzPG | 1          | 1          | 
-----------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1-3     | group:chararray     | data:bag{:tuple(b:chararray,s:long,c:long)}                                                                                                  | 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|         | all                 | {(03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB, 1, 1), (0qadR3YpgVEwYsORBHFMfAh4OFk7IrROyCq7RDibchBpAKfSWAjOHDAyfzPG, 1, 1)} | 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| to_cassandra     | org.apache.pig.builtin.totuple_org.apache.pig.builtin.totuple_29_30:tuple(org.apache.pig.builtin.totuple_29:tuple(:chararray,:bag{:tuple(b:chararray)}))                         | org.apache.pig.builtin.totuple_31:tuple(:chararray,:bag{:tuple(s:long)})                     | org.apache.pig.builtin.totuple_32:tuple(:chararray,:bag{:tuple(c:long)})                     | 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                  | ((b, {(03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB), (0qadR3YpgVEwYsORBHFMfAh4OFk7IrROyCq7RDibchBpAKfSWAjOHDAyfzPG)}))                                          | (s, {(1), (1)})                                                                              | (c, {(1), (1)})                                                                              | 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1 Answer

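The problem is your grouping. After GROUP data ALL, the expressions data.b, data.s and data.c are bags holding every value in the relation (your illustrate output shows this), not the individual values that the Cassandra storage function expects. That is where the DefaultDataBag cast error comes from. Your output should ultimately look like: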

((b, 03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB)), (s, 1), (c, 1)

... in order to match your schema. Since your output schema directly matches your input, the purpose of the grouping is unclear.
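
Dropping the GROUP and projecting each row directly might look like the sketch below. It reuses the relation names, file name and output query from the question and follows the tuple layout shown above; it has not been run against a cluster. The (int) casts are an assumption on my part, added because the table declares s and c as int while the Pig schema uses long.

```pig
-- Load the input exactly as in the question.
data = LOAD 'some_file' AS (b:chararray, s:long, c:long);

-- Emit one output tuple per input row, with no GROUP, so every field
-- holds a single value instead of a bag.
-- Assumption: the (int) casts match the int columns in the table.
to_cassandra = FOREACH data GENERATE
    TOTUPLE(TOTUPLE('b', b)),
    TOTUPLE('s', (int)s),
    TOTUPLE('c', (int)c);

STORE to_cassandra INTO
    'cql://pv/segments?output_query=UPDATE%20pv.segments%20SET%20s%3D%3F%2Cc%3D%3F'
    USING CqlStorage();
```

Depending on your CqlStorage version, the values bound to the ? placeholders may instead need to be passed as one tuple of bare values, in which case the last two fields would become TOTUPLE((int)s, (int)c). I have not checked which form your version wants, so run illustrate to_cassandra; again and compare the result against the format it documents.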