We have the following sample data, which has to be transformed into the output format shown below using a Pig script.

<< Sample TSV >>

Id    rank  Value
12324 1     1582
12324 2     1142
12324 4     1292
12324 5     1134
12325 1     1582
12325 2     1142
12325 3     1292
12325 4     1134
12325 5     1183
12326 1     1582
12326 2     1142
12326 3     1292
12326 4     1134
12326 5     1183

We need to compare the values (of the Value column) per rank across ids.

The output needs to be generated in the following format:

Id1                   Id2
value_rank1           value_rank1
value_rank2           value_rank2
value_rank3           value_rank3
...                   ........
value_rankn           value_rankn

For example:

12324      12325   ..
1582       1582
1142       1142
           1292
1292       1134
1134       1183

There has to be a blank value for any missing rank for a particular id.

Is there any way to achieve this using a Pig script?

1 Answer

Pig manipulates data record by record (row based). After an ETL operation, it produces row-based records in most cases.

For your requirement, I think it is possible to use a UDF (to generate the placeholder for a missing rank) to produce output like:

12324, 1582, 1142, , 1292, 1134
12325, 1582, 1142, 1292, 1134, 1183
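
A minimal sketch of such a UDF, written in Python so that it could be registered with Pig through Jython. The file name `pad_udf.py`, the function name `pad_ranks`, and the `max_rank` argument are all hypothetical, and the sketch assumes the header row has been removed from the input and that the highest rank is known in advance (5 in the sample). It returns only the padded value list; the id would come from the group key on the Pig side.

```python
# Hypothetical UDF file, e.g. pad_udf.py. All names are illustrative.
#
# Assumed Pig-side usage (a sketch, not tested against a cluster):
#   REGISTER 'pad_udf.py' USING jython AS padudf;
#   data    = LOAD 'sample.tsv' AS (id:long, rank:int, value:int);
#   grouped = GROUP data BY id;
#   padded  = FOREACH grouped GENERATE group AS id,
#                 padudf.pad_ranks(data.(rank, value), 5);

try:
    outputSchema  # Pig provides this decorator when the file is registered via Jython
except NameError:
    def outputSchema(schema):
        # No-op stand-in so the file can also be run outside Pig.
        return lambda func: func


@outputSchema("padded:chararray")
def pad_ranks(rank_value_bag, max_rank):
    """Join the values for ranks 1..max_rank, leaving a blank for any missing rank."""
    # A Pig bag arrives as a list of tuples; index the values by rank.
    by_rank = dict((int(rank), value) for rank, value in rank_value_bag)
    cells = []
    for rank in range(1, max_rank + 1):
        value = by_rank.get(rank)
        cells.append("" if value is None else str(value))
    return ", ".join(cells)
```

For id 12324 in the sample data, which has no rank 3, `pad_ranks([(1, 1582), (2, 1142), (4, 1292), (5, 1134)], 5)` yields the value part of the first row shown above, with an empty third cell.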

Then transpose the data from row-based to column-based in other software (for example, with "Paste Special -> Transpose" in Excel).
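
If a spreadsheet is not convenient, the transpose step can also be done with a few lines of Python once the Pig output has been collected. The sketch below is only an illustration: the function name `transpose_rows` is hypothetical, and it assumes every row has already been padded to the same length, since `zip` stops at the shortest row.

```python
def transpose_rows(rows):
    """Turn one-row-per-id records into one-column-per-id records."""
    # zip(*rows) pairs up the first cells, then the second cells, and so on.
    return [list(column) for column in zip(*rows)]


# The two padded rows from the example above, split into cells.
rows = [
    ["12324", "1582", "1142", "", "1292", "1134"],
    ["12325", "1582", "1142", "1292", "1134", "1183"],
]

for line in transpose_rows(rows):
    # Tab-separated output; the blank cell for the missing rank is kept.
    print("\t".join(line))
```

The first output line holds the ids and each following line holds one rank, so the missing rank 3 for id 12324 shows up as an empty first cell on its line.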