I have two files and want to generate output that combines columns from both of them. I want to do something like this:

Here is my problem, with an example:

I have two files, abc.txt (col1, col2) and xyz.txt (col3, col4). The number of records differs between them: say abc.txt has 1000 records and xyz.txt has 100. I want to write an output file that contains col1 and col2 from abc.txt and col3 from xyz.txt. Because xyz.txt has fewer records than abc.txt, the col3 values should repeat, either randomly or in the same sequence as in the input file; either is fine.

Input
abc.txt           xyz.txt
col1 col2        col3  col4
 1     A           4      X
 2     B           5      Y
 3     C           6      Z
 4     D
 5     D
 6     F
 7     A

A = LOAD '/user/abc.txt' Using PigStorage('|'); 
B = LOAD '/user/xyz.txt' Using PigStorage('|'); 
C = FOREACH A GENERATE A.$0,A.$1,B.$0;

Output
col1 col2 col3
 1     A    4
 2     B    5
 3     C    6
 4     D    5
 5     D    4
 6     F    4
 7     A    6

Is it possible to do this using Pig?

1 Answer

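GENERATE is not a standalone operator in Pig, so you cannot use it on its own to produce data. It is only valid inside FOREACH, which Pig provides for iterating over a relation, and FOREACH works on one relation only. A statement of the form FOREACH A GENERATE ... therefore cannot read B row by row. As far as I can tell, you cannot generate the data as specified in the question unless you perform some sort of JOIN on the two relations.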
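
One possible way to set up such a JOIN is sketched below. It is not tested against your data and rests on several assumptions: the field names and types in the AS clauses are guessed from the question, the output path '/user/output' is hypothetical, and the RANK operator requires Pig 0.11 or later. The idea is to number the rows of both relations, wrap the row numbers of A around the row count of B with a modulo, and join on the result, so col3 repeats in the same sequence as in the input.

```pig
-- Load both files; the schemas are assumptions based on the question.
A = LOAD '/user/abc.txt' USING PigStorage('|') AS (col1:int, col2:chararray);
B = LOAD '/user/xyz.txt' USING PigStorage('|') AS (col3:int, col4:chararray);

-- RANK without BY prepends a sequential, 1-based number (rank_A / rank_B).
A_ranked = RANK A;
B_ranked = RANK B;

-- Count the rows of B. B_count has exactly one row, so it can be used as a scalar.
B_grouped = GROUP B ALL;
B_count   = FOREACH B_grouped GENERATE COUNT_STAR(B) AS n;

-- Map each row of A onto a row number of B, wrapping around with modulo.
A_keyed = FOREACH A_ranked GENERATE ((rank_A - 1) % B_count.n) + 1 AS k, col1, col2;

-- Join on the computed row number and keep only the requested columns.
J = JOIN A_keyed BY k, B_ranked BY rank_B;
C = FOREACH J GENERATE A_keyed::col1 AS col1, A_keyed::col2 AS col2, B_ranked::col3 AS col3;

-- Hypothetical output location.
STORE C INTO '/user/output' USING PigStorage('|');
```

The JOIN does not preserve row order, so add an ORDER BY on col1 before STORE if the output order matters. If B is small, as in your example, a replicated join (JOIN ... USING 'replicated', with B_ranked listed last) may be worth trying.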