13
votes

I know there is a very similar post to this one (Failed to locate the winutils binary in the hadoop binary path); however, I have tried every step suggested there and the same error still appears.

I'm trying to use Apache Spark 1.6.0 on Windows 7 to follow the tutorial at http://spark.apache.org/docs/latest/streaming-programming-guide.html, specifically this command:

./bin/run-example streaming.JavaNetworkWordCount localhost 9999

However, this error keeps appearing:

[screenshot: "Failed to locate the winutils binary in the hadoop binary path" error]

After reading the post Failed to locate the winutils binary in the hadoop binary path,

I realized I needed the winutils.exe file, so I downloaded a Hadoop 2.6.0 binary that includes it and defined an environment variable called HADOOP_HOME:

with value C:\Users\GERAL\Desktop\hadoop-2.6.0\bin

and added %HADOOP_HOME% to the Path variable.

Yet the same error still appears when I run the command. Does anyone know how to solve this?

6
Shouldn't you be setting HADOOP_HOME=C:\Users\GERAL\Desktop\hadoop-2.6.0 and adding %HADOOP_HOME%\bin; to the PATH variable? – jdprasad
@JD_247 That didn't work, thanks anyway. – manuel mourato
@JD_247 Your comment worked like a charm for me. :) – Pragyaditya Das

6 Answers

22
votes

If you are running Spark on Windows with Hadoop, you need to make sure your Windows Hadoop installation is set up properly. To run Spark, you need winutils.exe and winutils.dll in the bin folder of your Hadoop home directory.

I would ask you to try this first:

1) You can download the .dll and .exe files from the bundle at the link below:

https://codeload.github.com/sardetushar/hadooponwindows/zip/master

2) Copy winutils.exe and winutils.dll from that folder to your %HADOOP_HOME%\bin.

3) Set HADOOP_HOME either in your spark-env.sh (spark-env.cmd on Windows) or at the command line, and add %HADOOP_HOME%\bin to PATH.

Then try running it again.
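If it still fails, a quick way to confirm the layout is right is to check that winutils.exe actually exists where Hadoop will look for it. This is just a small sanity-check sketch of my own (not part of the original steps); it only assumes HADOOP_HOME is set:

import java.io.File;

public class WinutilsCheck {

    public static void main(String[] args) {
        // Hadoop resolves winutils as %HADOOP_HOME%\bin\winutils.exe
        String hadoopHome = System.getenv("HADOOP_HOME");
        if (hadoopHome == null) {
            System.out.println("HADOOP_HOME is not set");
            return;
        }
        File winutils = new File(hadoopHome, "bin" + File.separator + "winutils.exe");
        System.out.println(winutils.getAbsolutePath()
                + (winutils.exists() ? " -> found" : " -> NOT found"));
    }
}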

If you need help with the Hadoop installation itself, there is a nice guide you can try:

http://toodey.com/2015/08/10/hadoop-installation-on-windows-without-cygwin-in-10-mints/

But that can wait; try the first few steps above first.

3
votes

Download the bin folder from here: Hadoop Bin. Then set System.setProperty("hadoop.home.dir", "C:\\Users\\GERAL\\Desktop"); (note the doubled backslashes in the Java string, and that the property points at the parent of the bin folder, not at bin itself).
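For context, a minimal sketch of where that call goes (assuming, hypothetically, the downloaded bin folder sits at C:\Users\GERAL\Desktop\bin); the property must be set before the Spark context is created:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class HomeDirSetup {

    public static void main(String[] args) {
        // hadoop.home.dir must be the PARENT of the bin folder holding winutils.exe
        System.setProperty("hadoop.home.dir", "C:\\Users\\GERAL\\Desktop");

        SparkConf conf = new SparkConf().setAppName("Test").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ...use sc here...
        sc.stop();
    }
}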

2
votes

Install JDK 1.8, download the Spark binary from Apache Spark, and winutils from the Git repo.

Set the user environment variables for the JDK, the Spark binary, and winutils:

JAVA_HOME = C:\Program Files\Java\jdk1.8.0_73
HADOOP_HOME = C:\Hadoop
SPARK_HOME = C:\spark-2.3.1-bin-hadoop2.7
PATH = C:\Program Files\Java\jdk1.8.0_73\bin;%HADOOP_HOME%\bin;%SPARK_HOME%\bin;

Then open a command prompt and run spark-shell.

[screenshot: spark-shell starting successfully]
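If spark-shell still fails, a quick sanity check (a small sketch of my own, not part of the original steps) is to print the variables and confirm they resolved:

public class EnvCheck {

    public static void main(String[] args) {
        // Print each variable so a typo or a missing entry is obvious
        for (String var : new String[] {"JAVA_HOME", "HADOOP_HOME", "SPARK_HOME"}) {
            System.out.println(var + " = " + System.getenv(var));
        }
    }
}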

1
votes

You can try setting the HADOOP_HOME environment variable to:

C:\Users\GERAL\Desktop\hadoop-2.6.0

instead of

C:\Users\GERAL\Desktop\hadoop-2.6.0\bin

Hadoop looks for winutils.exe at %HADOOP_HOME%\bin\winutils.exe, so HADOOP_HOME must point to the folder that contains bin, not to bin itself.
1
votes

The following error is caused by a missing winutils binary while running a Spark application. Winutils is part of the Hadoop ecosystem and is not included with Spark. The actual functionality of your application may run correctly even after the exception is thrown, but it is better to have winutils in place to avoid unnecessary problems. To avoid the error, download the winutils.exe binary and point hadoop.home.dir at the folder that contains its bin directory.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SparkTestApp {

    public static void main(String[] args) {

        // Point hadoop.home.dir at the folder whose bin subfolder holds winutils.exe.
        // Example: if winutils.exe is copied to C:\winutil\bin\, use
        // System.setProperty("hadoop.home.dir", "C:\\winutil\\");
        System.setProperty("hadoop.home.dir", "ANY_DIRECTORY");

        String logFile = "C:\\sample_log.log";
        SparkConf conf = new SparkConf().setAppName("Simple Application").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> logData = sc.textFile(logFile).cache();

        // Count the lines containing the letter "a"
        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("a");
            }
        }).count();

        System.out.println("Lines with a: " + numAs);

        sc.stop();
    }
}

For example, if winutils.exe is copied to C:\winutil\bin\, set the property like this:

System.setProperty("hadoop.home.dir", "C:\\winutil\\");
0
votes

I too faced this issue when trying to launch spark-shell from my Windows laptop. Here is how I solved it; I hope it helps. It was a very small mistake: I saved the winutils executable as "winutils.exe" instead of just winutils, so the saved file ended up named winutils.exe.exe.

So when the variable was resolved, it resolved to winutils.exe.exe, which is nowhere in the Hadoop binaries. I removed the extra ".exe" and triggered the shell again, and it worked. I suggest you take a look at the exact name the file was saved under.
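To catch this kind of naming mistake, you can list the real file names from code (a small sketch of my own, assuming HADOOP_HOME is set); Windows Explorer may hide the .exe extension, which is exactly how winutils.exe.exe goes unnoticed:

import java.io.File;

public class ListBin {

    public static void main(String[] args) {
        // Print the actual file names in %HADOOP_HOME%\bin
        File bin = new File(System.getenv("HADOOP_HOME"), "bin");
        File[] files = bin.listFiles();
        if (files == null) {
            System.out.println("No such folder: " + bin.getAbsolutePath());
            return;
        }
        for (File f : files) {
            System.out.println(f.getName());
        }
    }
}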