3
votes

I'm trying the quickstart from here: http://datafu.incubator.apache.org/docs/datafu/getting-started.html I have tried nearly everything, but I'm sure the mistake is mine somewhere. So far I have tried:

  • exporting PIG_HOME, CLASSPATH, PIG_CLASSPATH
  • starting pig with -cp datafu-pig-incubating-1.3.0.jar
  • registering datafu-pig-incubating-1.3.0.jar locally and in hdfs => both successful (at least no error shown)

Nothing helped.

Trying this on pig:

register datafu-pig-incubating-1.3.0.jar
DEFINE Median datafu.pig.stats.StreamingMedian();
data = load '/user/hduser/numbers.txt' using PigStorage() as (val:int);
data2 = FOREACH (GROUP data ALL) GENERATE Median(data);

or directly

data2 = FOREACH (GROUP data ALL) GENERATE datafu.pig.stats.StreamingMedian(data);

I get this name-resolve error:

2016-06-04 17:22:22,734 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve datafu.pig.stats.StreamingMedian using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] Details at logfile: /home/hadoop/pig_1465053680252.log

When I look into the datafu-pig-incubating-1.3.0.jar it looks OK, everything is in place. I also tried some Bag functions and got the same error. I think it's some kind of noob error which I just don't see (I did not find any answers specific to datafu on SO or Google), so thanks in advance for shedding some light on this.

1
Please consider editing your question title and body. Try to simplify things - Pmpr
Corrected the formatting now, sorry for that - Christof Kälin
Just to confirm: if you use basic pig functions (like SUM) everything works, and if you use any datafu function nothing works? - Dennis Jaheruddin
A long shot, but could you try starting with org.apache.datafu or org.apache.pig.datafu . Also, does it help if you run pig in local mode? And obviously: what is in the referred log file? - Dennis Jaheruddin
Have you tried an absolute path to your jar file? Also, check the jar locally, for the folder structure (i.e. rename it to .zip and decompress it), see if it matches your path to the StreamingMedian class. - Robin Trietsch

1 Answer

0
votes

The Pig script looks correct. The only thing that could break is that some class dependencies couldn't be met while registering datafu.

Try running in local mode (pig -x local) and check the detailed log.

Also check the version of pig - it should be newer than 0.14.0.
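
To rule out a packaging problem, the jar check suggested in the comments (does the folder structure inside the jar match the class path?) can be done without unzipping by hand. This is only a minimal sketch: the helper name jar_has_class is made up for illustration, and it assumes the jar is an ordinary zip archive in which a class a.b.C is stored as a/b/C.class.

```python
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the jar holds a .class entry for the fully qualified class name.

    jar_has_class is a hypothetical helper for this check, not part of Pig or DataFu.
    """
    # A class such as datafu.pig.stats.StreamingMedian is expected at
    # datafu/pig/stats/StreamingMedian.class inside the archive.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

Calling jar_has_class("datafu-pig-incubating-1.3.0.jar", "datafu.pig.stats.StreamingMedian") from the directory that holds the jar should return True if the class is packaged where Pig expects it. If it does, the jar contents are fine and the problem is more likely in how the jar is registered or in the Pig version.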