I'm new to Pig and I'm trying to perform a RANK operation within each group. My data looks like this:


    Name  address  Date
    A     addr1    20150101
    A     addr2    20150130
    B     addr1    20140325
    B     addr2    20140821
    B     addr3    20150102

I want my output to look like this:


    Name  address  Date      Rank
    A     addr1    20150101  1
    A     addr2    20150130  2
    B     addr1    20140325  1
    B     addr2    20140821  2
    B     addr3    20150102  3

I'm using Pig 0.12.1. Is there any way to get the output in the required format with Pig's built-in functions?

1 Answer

This is a bit difficult to solve with standard Pig alone, because the built-in RANK operator ranks tuples across the whole relation rather than within each group. With the help of the DataFu library, however, you can solve it easily.

Download the jar file (datafu-1.2.0.jar) from http://mvnrepository.com/artifact/com.linkedin.datafu/datafu/1.2.0, put it somewhere your Pig script can reach (the script below registers it from /tmp), and try the approach below.

Input:

A       addr1   20150101
A       addr2   20150130
B       addr1   20140325
B       addr2   20140821
B       addr3   20150102

Pig script:

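REGISTER /tmp/datafu-1.2.0.jar;
-- Enumerate appends an index to every tuple in a bag; '1' makes the index start at 1
DEFINE Enumerate datafu.pig.bags.Enumerate('1');

A = LOAD 'input' USING PigStorage() AS (Name:chararray, Address:chararray, Date:chararray);
B = GROUP A BY Name;
-- Bag order after GROUP is not guaranteed, so sort each group first
-- (assumption: the rank should follow Date, as in the expected output)
C = FOREACH B {
    sorted = ORDER A BY Date;
    GENERATE FLATTEN(Enumerate(sorted));
};
DUMP C;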

Output:

(A,addr1,20150101,1)
(A,addr2,20150130,2)
(B,addr1,20140325,1)
(B,addr2,20140821,2)
(B,addr3,20150102,3)
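
If you want to sanity-check the expected ranks outside Pig, the same group, sort, and enumerate logic can be sketched in plain Python. This is only an illustration of the idea, not part of the Pig solution: the function name rank_within_group is made up for this example, and it assumes the rank should follow Date within each Name, with dates stored as yyyymmdd strings so that string order matches date order.

```python
from itertools import groupby
from operator import itemgetter


def rank_within_group(rows):
    """Return rows with a 1-based rank appended, restarting for each name.

    rows: iterable of (name, address, date) tuples, date as a yyyymmdd string.
    (Hypothetical helper for illustration; it is not a Pig or DataFu API.)
    """
    # Sort by name, then date, so groupby sees each name as one contiguous run
    ordered = sorted(rows, key=itemgetter(0, 2))
    ranked = []
    for _, group in groupby(ordered, key=itemgetter(0)):
        # enumerate(..., start=1) plays the role of Enumerate('1') in the Pig script
        for rank, row in enumerate(group, start=1):
            ranked.append(row + (rank,))
    return ranked


rows = [
    ("A", "addr1", "20150101"),
    ("A", "addr2", "20150130"),
    ("B", "addr1", "20140325"),
    ("B", "addr2", "20140821"),
    ("B", "addr3", "20150102"),
]

for row in rank_within_group(rows):
    print(row)
```

For the sample input this yields the same five rows and ranks as the Pig output above, and it gives the same result if the input rows arrive in a different order.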