2 votes

I am trying to install and run Apache Pig 0.15.0 on a Windows system, but without success. I intend to use it with my Apache Hadoop 2.7.1 installation.

Context
I have followed the "Download Pig" section of the basic Getting Started tutorial. I downloaded "pig-0.15.0" and set up the path to Pig.

I am able to enter the "grunt" shell, but when I try to run a simple script, such as:

logs = LOAD 'PigInput/logs' USING PigStorage(';');
STORE logs INTO 'logs-output.txt';
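As a first sanity check, the same two lines can be run in local mode, which bypasses the cluster entirely (a sketch, not a confirmed fix: "test.pig" is a hypothetical file name for the script above, and the input path is assumed to exist on the local filesystem):

```shell
REM Sketch: run the script in Pig's local mode to separate
REM Pig/jar problems from HDFS/YARN problems ("test.pig" is
REM a hypothetical file holding the two lines above).
pig -x local test.pig
```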

It gives me the following error:
Error

2015-07-15 12:54:27,157 [main] WARN  org.apache.pig.backend.hadoop20.PigJobControl - falling back to default JobControl (not using hadoop 0.20 ?)
java.lang.NoSuchFieldException: runnerState
        at java.lang.Class.getDeclaredField(Class.java:1953)
        at org.apache.pig.backend.hadoop20.PigJobControl.<clinit>(PigJobControl.java:51)
        at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.newJobControl(HadoopShims.java:109)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:314)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:196)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:304)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
        at org.apache.pig.PigServer.execute(PigServer.java:1364)
        at org.apache.pig.PigServer.access$500(PigServer.java:113)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1689)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1082)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:505)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
        at org.apache.pig.Main.run(Main.java:565)
        at org.apache.pig.Main.main(Main.java:177)
2015-07-15 12:54:27,165 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2015-07-15 12:54:27,186 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2015-07-15 12:54:27,187 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-07-15 12:54:27,188 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2015-07-15 12:54:27,190 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2015-07-15 12:54:27,585 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/C:/pig-0.15.0/pig-0.15.0-core-h1.jar to DistributedCache through /tmp/temp27293389/tmp1227477167/pig-0.15.0-core-h1.jar
2015-07-15 12:54:27,627 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/C:/pig-0.15.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp27293389/tmp-1342585295/automaton-1.11-8.jar
2015-07-15 12:54:27,664 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/C:/pig-0.15.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp27293389/tmp-510663803/antlr-runtime-3.4.jar
2015-07-15 12:54:27,769 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/C:/hadoop-2.7.1/share/hadoop/common/lib/guava-11.0.2.jar to DistributedCache through /tmp/temp27293389/tmp-1466437686/guava-11.0.2.jar
2015-07-15 12:54:27,817 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/C:/pig-0.15.0/lib/joda-time-2.5.jar to DistributedCache through /tmp/temp27293389/tmp672491704/joda-time-2.5.jar
2015-07-15 12:54:27,905 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-07-15 12:54:27,959 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-07-15 12:54:27,969 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2015-07-15 12:54:27,979 [JobControl] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-07-15 12:54:27,989 [JobControl] ERROR org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl - Error while trying to run jobs.
java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:235)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:183)
        at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
        at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
        at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:240)
        at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:121)
        at java.lang.Thread.run(Thread.java:745)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
2015-07-15 12:54:28,005 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2015-07-15 12:54:28,014 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-07-15 12:54:28,016 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job null has failed! Stop running all dependent jobs
2015-07-15 12:54:28,017 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-07-15 12:54:28,025 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2015-07-15 12:54:28,027 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
2.7.1   0.15.0  Administrator   2015-07-15 12:54:27     2015-07-15 12:54:28     UNKNOWN

Failed!

Failed Jobs:
JobId   Alias   Feature Message Outputs
N/A     logs    MAP_ONLY        Message: Unexpected System Error Occured: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:235)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:183)
        at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
        at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
        at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:240)
        at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:121)
        at java.lang.Thread.run(Thread.java:745)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
        hdfs://localhost:9000/user/Administrator/logs-output.txt,

What I have tried
1. I downloaded the "pig-0.15.0-src" package and tried to build it with:

ant -Dhadoopversion=23

I receive the following error (along the way, I also added proxy settings to my "build.xml"):

C:\pig-0.15.0-src>ant -Dhadoopversion=23
Buildfile: C:\pig-0.15.0-src\build.xml

ivy-download:
      [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.2.0/ivy-2.2.0.jar
      [get] To: C:\pig-0.15.0-src\ivy\ivy-2.2.0.jar
      [get] Not modified - so not downloaded

ivy-init-dirs:

ivy-probe-antlib:

ivy-init-antlib:

ivy-init:
[ivy:configure] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
[ivy:configure] :: loading settings :: file = C:\pig-0.15.0-src\ivy\ivysettings.xml

ivy-resolve:
[ivy:resolve]
[ivy:resolve] :: problems summary ::
[ivy:resolve] :::: WARNINGS
[ivy:resolve]           ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve]           ::          UNRESOLVED DEPENDENCIES         ::
[ivy:resolve]           ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve]           :: org.antlr#antlr;3.4: configuration not found in org.antlr#antlr;3.4: 'master'. It was required from org.apache.pig#pig;0.15.0-SNAPSHOT compile
[ivy:resolve]           ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve]
[ivy:resolve]
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS

BUILD FAILED
C:\pig-0.15.0-src\build.xml:1662: impossible to resolve dependencies:
        resolve failed - see output for details
2. I downloaded from Maven the jar: org.apache.pig pig 0.15.0 h2.jar. I do not know whether this is helpful or not. I also do not know where to put it.
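For the Ivy "configuration not found" failure in attempt 1, one thing that sometimes helps (a sketch, not a confirmed fix: stale cached dependency metadata can cause spurious Ivy resolution errors, and the cache path below assumes Ivy's default location) is clearing the local Ivy cache before rebuilding:

```shell
REM Sketch: remove the local Ivy cache (default location assumed),
REM then rebuild; stale cached metadata can produce spurious
REM "configuration not found" resolution errors.
rmdir /s /q "%USERPROFILE%\.ivy2\cache"
ant clean jar -Dhadoopversion=23
```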

One more detail: at the beginning, when I ran "pig", it gave me a "path not found" error for "hadoop-config.cmd". To make it work, I changed the following line in "pig/bin/pig.cmd":

set hadoop-config-script=C:\hadoop-2.7.1\libexec\hadoop-config.cmd
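An alternative to hard-coding that path would be to set the Hadoop environment variables before starting Pig (a sketch; whether this version of pig.cmd resolves hadoop-config.cmd from HADOOP_HOME is an assumption):

```shell
REM Sketch: point Pig at the Hadoop installation via environment
REM variables instead of editing pig.cmd (assumes pig.cmd looks up
REM hadoop-config.cmd relative to HADOOP_HOME).
set HADOOP_HOME=C:\hadoop-2.7.1
set PATH=%PATH%;%HADOOP_HOME%\bin
pig
```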

Other people, similar problem
I saw this similar problem: here, and others. Most of them suggest running something like:

ant clean jar-withouthadoop -Dhadoopversion=23

...which, at best, leads me to other errors.

Help
I need help getting my Apache Pig installation to run commands and MapReduce jobs. What should I do?

UPDATE 1: As recommended by @Fred in the comments, I tried Pig 0.12.0 and succeeded in running jobs with no hassle (I only set the path; no builds, etc.), apart from having to find this version on the Internet: cloudera pig 0.12.0.

Still, I would like to find a solution to have the latest version of Apache Pig.

1
Although Pig might work somehow on Windows, I suggest using a VM. Is that possible, or do you have a requirement to stay on Windows? - Frederic
I have a requirement to be on Windows. Everything should be on Windows :) - Cristian Ciobotea
In that case I would try Pig 0.12, which is explicitly said to run on Windows: issues.apache.org/jira/browse/PIG-2793 and hortonworks.com/blog/announcing-apache-pig-0-12 - Frederic
I would strongly recommend, as @Fred has stated in his comments, doing this on a VM in a Linux environment. It will save you a ton of headache (Windows is a disaster), and if you are trying to learn the Hadoop ecosystem, it gives you experience of how you would use it in the "real world". - gobrewers14

1 Answer

0 votes

There are two Pig jars built from the same source. If you are using Hadoop 2.x, include

<dependency>
    <groupId>org.apache.pig</groupId>
    <artifactId>pig</artifactId>
    <classifier>h2</classifier>
    <version>0.15.0</version>
</dependency>

instead of

<dependency>
    <groupId>org.apache.pig</groupId>
    <artifactId>pig</artifactId>
    <version>0.15.0</version>
</dependency>

as the dependency, and try again.

The 0.15.0 h2 jar is built using "ant -Dhadoopversion=23 jar".
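For the binary distribution (as opposed to the Maven route above), the same idea applies: the question's log shows grunt registering pig-0.15.0-core-h1.jar, the Hadoop 1 build. A sketch of the workaround (assumptions: a pig-0.15.0-core-h2.jar exists alongside it, e.g. from the ant build above, and Pig falls back to it once the h1 jar is out of the way):

```shell
REM Sketch: move the Hadoop 1 core jar aside so Pig registers the
REM Hadoop 2 build instead (assumes pig-0.15.0-core-h2.jar is
REM present in the same directory).
cd C:\pig-0.15.0
ren pig-0.15.0-core-h1.jar pig-0.15.0-core-h1.jar.bak
```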