1
votes

I am new to pig syntax and was wondering if someone can provide a hint for translating this SQL code into pig.

SELECT column1, column2, SUM(column3)
FROM table
WHERE column5 = 100
GROUP BY column2;

So far I have:

data = LOAD....etc.
filterColumn = FILTER data BY column5 = 100;
groupColumn = Group filterColumn By column2;
result = foreach groupColumn Generate group, column1, SUM(column3) as sumCol3; 
DUMP result; 

This does not work. The error message is "Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast."

2
Can you share sample input and expected output ? - Murali Rao

2 Answers

0
votes

SUM(): Computes the sum of the numeric values in a single-column bag. It expects bag as its input. So, the FOREACH ... GENERATE would be,

result = foreach groupColumn Generate group, filterColumn.column1, SUM(filterColumn.column3) as sumCol3; 

Also in the FILTER statement, to check for equality use ==

filterColumn = FILTER data BY column5 == 100;
0
votes

The below pig commands can be used :

test=LOAD '<testdata>' USING PigStorage(',') AS (column1:<datatype>, column2:<datatype>,column3:<datatype>, column5:<datatype>);

A =FILTER test BY column5==100;

B = GROUP A BY column2;

C = FOREACH B GENERATE group, test.column1,test.column2,SUM(test.column3);

dump C;

Note that the usage of'PigStorage' and 'AS' are optional.

Hope this helps.