please help me out..its really urgent..deadline nearing, and im stuck with it since 2 weeks..breaking my head but no result. i am a newbie in piglatin. i have a scenario where i have to filter data from a csv file. the csv is on hdfs, and has two columns.
grunt>> fl = load '/user/hduser/file.csv' USING PigStorage(',') AS (conv:chararray, clnt:chararray);
grunt>> dump f1;
("first~584544fddf~dssfdf","2001")
("first~4332990~fgdfs4s","2001")
("second~232434334~fgvfd4","1000")
("second~786765~dgbhgdf","1000)
("second~345643~gfdgd43","1000")
what i need to do is i need to extract only the first word before the 1st '~' sign and concat that with the second column value of the csv file. Also i need to group the concatenated result returned and count the number of such similar rows, and create a new csv file as out put, where there would be 2 columns again. 1st column would be the concatenated value and the 2nd column would be the row count. i.e
("first 2001","2")
("second 1000","3")
and so on.
I have written the code here but its just not working. i have used STRSPLIT. it is splitting the values of the first column of input csv file. but i dont know how to extract the first split value. code is given below:
convData = LOAD '/user/hduser/file.csv' USING PigStorage(',') AS (conv:chararray, clnt:chararray);
fil = FILTER convData BY conv != '"-1"'; --im using this to filter out the rows that has 1st column as "-1".
data = FOREACH fil GENERATE STRSPLIT($0, '~');
X = FOREACH data GENERATE CONCAT(data.$0,' ',convData.clnt);
Y = FOREACH X GROUP BY X;
Z = FOREACH Y GENERATE COUNT(Y);
var = FOREACH Z GENERATE CONCAT(Y,',',Z);
STORE var INTO '/user/hduser/output.csv' USING PigStorage(',');
:). - halfer