I'm trying to append one dataset to another in Apache Pig. I found several examples, but I think they differ from my problem.
Here is my Pig script:
line1 = load 'line1/points' using Table();
line20 = load 'line20/points' using Table();
DESCRIBE line1;
DUMP line1;
DESCRIBE line20;
DUMP line20;
X = UNION line1, line20;
DESCRIBE X;
DUMP X;
I get this:
line1: {key: bytearray,y: (name: chararray,value: long),x: (name: chararray,value: long),columns: {(name: chararray,value: bytearray)}}
(ab48a8567d58cfea52905db0e94d88d3,(y,3),(x,3))
(ab48a8567d58cfea52905db0e94d88d3,(y,1),(x,1))
(ab48a8567d58cfea52905db0e94d88d3,(y,2),(x,2))
line20: {key: bytearray,y: (name: chararray,value: long),x: (name: chararray,value: long),columns: {(name: chararray,value: bytearray)}}
(203146881b7ef0d26902ea440e734b79,(y,20),(x,20))
(203146881b7ef0d26902ea440e734b79,(y,21),(x,21))
(203146881b7ef0d26902ea440e734b79,(y,22),(x,22))
X: {key: bytearray,y: (name: chararray,value: long),x: (name: chararray,value: long),columns: {(name: chararray,value: bytearray)}}
(203146881b7ef0d26902ea440e734b79,(y,21),(x,21))
(203146881b7ef0d26902ea440e734b79,(y,22),(x,22))
(203146881b7ef0d26902ea440e734b79,(y,20),(x,20))
(203146881b7ef0d26902ea440e734b79,(y,20),(x,20))
(203146881b7ef0d26902ea440e734b79,(y,21),(x,21))
(203146881b7ef0d26902ea440e734b79,(y,22),(x,22))
The result is just a double copy of the 'line20' dataset. Why?
I would like to have values from 'line1' and then values from 'line20'.
BTW: "... using Table();" is just my own implementation of CassandraStorage, in which column types are provided automatically.
Thanks for your help!
Solution
The Configuration object is shared. I forgot about that and initialized both Table() instances with the same ID.
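
To illustrate why a shared configuration plus a reused ID can produce exactly this symptom, here is a minimal sketch. It does not use Pig or Hadoop: a plain Map stands in for the shared Configuration, and the Loader class, its methods, and the "table.location." key prefix are all hypothetical, not taken from the real Table() class.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for two loader instances that store their settings
// in one shared configuration object. Every name here is hypothetical.
class SharedConfigSketch {

    /** Hypothetical loader that keeps its location in the shared config under a key built from its ID. */
    static final class Loader {
        private final String id;
        private final Map<String, String> conf;

        Loader(String id, Map<String, String> conf) {
            this.id = id;
            this.conf = conf;
        }

        // Planning phase: record where this loader should read from.
        void setLocation(String location) {
            conf.put("table.location." + id, location);
        }

        // Execution phase: look the location up again from the shared config.
        String resolveLocation() {
            return conf.get("table.location." + id);
        }
    }

    /** Creates two loaders on one shared map and returns the locations they end up reading. */
    static String[] resolve(String firstId, String secondId) {
        Map<String, String> shared = new HashMap<>();
        Loader first = new Loader(firstId, shared);
        Loader second = new Loader(secondId, shared);
        first.setLocation("line1/points");
        second.setLocation("line20/points"); // overwrites the first entry if the IDs match
        return new String[] { first.resolveLocation(), second.resolveLocation() };
    }

    public static void main(String[] args) {
        String[] sameId = resolve("table", "table");
        System.out.println(sameId[0] + ", " + sameId[1]);     // line20/points, line20/points

        String[] distinctIds = resolve("table-1", "table-2");
        System.out.println(distinctIds[0] + ", " + distinctIds[1]); // line1/points, line20/points
    }
}
```

With the same ID, the second setLocation call overwrites the first, so both loaders read 'line20/points' and the UNION contains that dataset twice. In a real Pig loader, one candidate for a per-instance ID is the signature Pig passes to LoadFunc.setUDFContextSignature; whether that fits the Table() class is an assumption.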
UNION on two files loaded with ... PigStorage(','); works fine. I have just checked. No, these two Table() invocations do not overlap each other. However, I will search for a problem in my Table() class. Thanks. – ahypki
The Table() instances were indeed overlapping. @Pradeep Gollakota pointed out that the Configuration object is shared. That was my mistake. Thank you for your help. – ahypki