
I'm new to Pig and trying to understand why I cannot count after a join and a group by:

A = LOAD 'mary' as (line);
B = LOAD 'mary' as (line);

wordsA = foreach A generate flatten(TOKENIZE(line)) as wordA;
grpdA = group wordsA by wordA;
cntdA = foreach grpdA generate group, COUNT(wordsA);

wordsB = foreach B generate flatten(TOKENIZE(line)) as wordB;
grpdB = group wordsB by wordB;
cntdB = foreach grpdB generate group, COUNT(wordsB), 'some text';

fltB = FILTER cntdB BY $1>1;

jnd = join cntdA by $1, fltB by $1;
jnd_n = foreach jnd generate $0;
grp = group jnd by $0;
out = foreach grp generate group, count(jnd_n);

dump jnd_n;
dump grp;

dump jnd_n:

(was)
(was)
(was)
(lamb)
(lamb)
(lamb)
(Mary)
(Mary)
(Mary)

dump grp:

(was,{(was,2,was,2,some text),(was,2,Mary,2,some text),(was,2,lamb,2,some text)})
(Mary,{(Mary,2,was,2,some text),(Mary,2,Mary,2,some text),(Mary,2,lamb,2,some text)})
(lamb,{(lamb,2,was,2,some text),(lamb,2,Mary,2,some text),(lamb,2,lamb,2,some text)})

But I'm getting this error:

Invalid scalar projection: jnd_n : A column needs to be projected from a relation for it to be used as a scalar

If I change the code to:

out = foreach grp generate group, count(jnd_n.$0);

then I get a different error:

Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve count using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

I know that I can do it another way, but I want to get a result like this using exactly these two Pig operations, JOIN and GROUP BY:

dump out:

(was,3)
(lamb,3)
(Mary,3)

1 Answer


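COUNT needs to be in uppercase. Function names in Pig are case-sensitive, and COUNT is a built-in function rather than a keyword, which is why the lowercase count could not be resolved.

out = foreach grp generate group, COUNT(jnd_n.$0);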
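
One further point may be worth checking. In the question, grp is built from jnd, not from jnd_n, so the bag inside each grp tuple is named jnd (the dump of grp shows the full joined tuples). Referring to jnd_n inside that foreach makes Pig treat it as a scalar from another relation, which is what the first error message describes, so the line above may still fail even with the uppercase name. A minimal sketch, assuming the same aliases as in the question and that the goal is the number of joined rows per word:

```pig
-- Sketch only: reuses the aliases from the question.
-- jnd holds the joined tuples: cntdA's fields followed by fltB's fields.
grp = group jnd by $0;

-- grp has two fields: 'group' (the word) and a bag named 'jnd'
-- containing every joined tuple for that word.
-- COUNT is applied to that bag, so the name must match the grouped relation.
out = foreach grp generate group, COUNT(jnd);

dump out;
```

Given the dump of grp shown in the question, each bag holds three tuples, so this should give a count of 3 per word. If the projected relation jnd_n is preferred, the same pattern would apply by grouping jnd_n instead and calling COUNT(jnd_n) on the resulting bag.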