I have a file with two columns, Column1 and Column2, holding records as below:

File in HDFS

In record 1, A is the main record, and Column2 of record 2 holds the information linked with A. The same applies to B, C and D. I want to combine this information to get the following desired output.

Desired output

I can't modify the HDFS file; everything has to be done within the Hadoop environment. How can this be achieved? Any help is appreciated.

1 Answer

After loading the data,

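A = LOAD '' AS (col1:chararray, col2:chararray);

-- Assumes the key is the first character of col2; Pig's built-in is SUBSTRING(string, start, stop).
B = FOREACH A GENERATE (col1 IS NULL ? SUBSTRING(col2, 0, 1) : col1) AS col1, col2;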
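
To actually combine the linked rows, the filled-in key can then be grouped on. The following is only a sketch: the path 'input_path', the tab delimiter, the alias names, and the idea that the key is the first character of col2 are all assumptions, since the sample data is not shown here.

```pig
-- Hypothetical path and delimiter; PigStorage loads empty fields as null.
raw = LOAD 'input_path' USING PigStorage('\t') AS (col1:chararray, col2:chararray);

-- Fill in the missing key (assumption: the key is the first character of col2).
keyed = FOREACH raw GENERATE
            (col1 IS NULL ? SUBSTRING(col2, 0, 1) : col1) AS key,
            col2 AS info;

-- Collect every row that shares a key into one bag.
grouped = GROUP keyed BY key;

-- Flatten each bag into a single comma-separated string per key.
combined = FOREACH grouped GENERATE group AS key, BagToString(keyed.info, ',') AS infos;

DUMP combined;
```

Note that Pig does not guarantee the order of tuples inside a bag, so if the main record has to come first, an ORDER inside a nested FOREACH would be needed before calling BagToString.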