I have two files and I want to generate output that combines columns from both of them.
Here is my problem, with an example:
I have two files, abc.txt (col1, col2) and xyz.txt (col3, col4). The number of records differs between them: say abc.txt has 1000 records and xyz.txt has 100. I want to write an output file that contains col1 and col2 from abc.txt and col3 from xyz.txt. Because xyz.txt has fewer records than abc.txt, the col3 values should be repeated, either randomly or in the same sequence as in the input file; either is fine.
Input

abc.txt        xyz.txt
col1 col2      col3 col4
1    A         4    X
2    B         5    Y
3    C         6    Z
4    D
5    D
6    F
7    A
Here is what I have tried so far:

A = LOAD '/user/abc.txt' USING PigStorage('|');
B = LOAD '/user/xyz.txt' USING PigStorage('|');
C = FOREACH A GENERATE A.$0,A.$1,B.$0;

Desired output

col1 col2 col3
1    A    4
2    B    5
3    C    6
4    D    5
5    D    4
6    F    4
7    A    6

Is it possible to do this using Pig?
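
One possible approach, as a rough sketch rather than a definitive answer: number the rows of both files with RANK, map each row number of abc.txt onto a row number of xyz.txt with a modulo, and join on that key. This repeats the col3 values in input order instead of randomly. It assumes Pig 0.11 or later (for RANK), that the files really are '|'-delimited as in the LOAD statements above, and that xyz.txt is small enough for a replicated join. The column types, the intermediate alias names, and the output path '/user/output' are placeholders I made up for illustration.

```pig
-- Schemas are assumptions; adjust the types to match the real data.
A = LOAD '/user/abc.txt' USING PigStorage('|') AS (col1:int, col2:chararray);
B = LOAD '/user/xyz.txt' USING PigStorage('|') AS (col3:int, col4:chararray);

-- RANK with no BY clause prepends a sequential row number (starting at 1),
-- named rank_<alias>, to every tuple.
A_ranked = RANK A;
B_ranked = RANK B;

-- Count the rows of B so row numbers of A can wrap around.
B_grp = GROUP B ALL;
B_cnt = FOREACH B_grp GENERATE COUNT_STAR(B) AS n;

-- Key for A: (row number - 1) modulo the row count of B, i.e. 0 .. n-1, repeating.
-- B_cnt.n is a scalar projection, valid because B_cnt has exactly one row.
A_keyed = FOREACH A_ranked GENERATE col1, col2, ((rank_A - 1) % B_cnt.n) AS k;

-- Key for B: row number - 1, i.e. 0 .. n-1.
B_keyed = FOREACH B_ranked GENERATE (rank_B - 1) AS k, col3;

-- The smaller relation goes last in a replicated join.
J = JOIN A_keyed BY k, B_keyed BY k USING 'replicated';
C = FOREACH J GENERATE A_keyed::col1 AS col1, A_keyed::col2 AS col2, B_keyed::col3 AS col3;

-- Hypothetical output location.
STORE C INTO '/user/output' USING PigStorage('|');
```

With the sample input, rows 1 to 3 of abc.txt would get col3 values 4, 5, 6, rows 4 to 6 would get 4, 5, 6 again, and row 7 would get 4. The rows of C are not guaranteed to come out in any particular order, so an ORDER C BY col1 step may be needed before STORE if order matters.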