0 votes

Community wizards,

I am really frustrated. When it comes to Spark, Hadoop et al., nothing seems to be straightforward.

For the past few hours, I have been trying to find a solution to the following issue:

ERROR Executor: Exception in task 0.0 in stage 13.0 (TID 823)
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileWithMode0(Ljava/lang/String;JJJI)Ljava/io/FileDescriptor;

Versions:

  1. OS: Windows 10
  2. Spark version: 2.4.6
  3. Scala version: 2.11.12
  4. Hadoop version: 2.7.1
  5. Java version: 1.8.0_202 (64-bit)

Variables:

  1. SPARK_HOME: C:\Spark
  2. HADOOP_HOME: C:\Hadoop\hadoop-2.7.1
  3. SCALA_HOME: C:\Program Files (x86)\scala
  4. JRE_HOME: C:\Program Files\Java\jre1.8.0_202
  5. JAVA_HOME: C:\Program Files\Java\jdk1.8.0_202

Paths:

  1. %SPARK_HOME%\bin
  2. %HADOOP_HOME%\bin
  3. %SCALA_HOME%\bin
  4. %JRE_HOME%\bin
  5. %JAVA_HOME%\bin

The command that throws the error is:

df.coalesce(1).write.format("csv").save("result")

The folder (result) seems to be created, but it's empty.

I have literally no idea how to solve this issue.

Any help would be warmly welcomed.

Comment from Sivakumar (2 votes): The following post might help you: stackoverflow.com/questions/50344874/…

2 Answers

2 votes

I believe your HADOOP_HOME=C:\Hadoop\hadoop-2.7.1 points at the Hadoop binaries/libraries; on Windows you additionally need a tool called WINUTILS.EXE.

You can download the winutils build that matches your Hadoop version from GitHub and point HADOOP_HOME at the root directory of that winutils distribution: https://github.com/steveloughran/winutils

Source:

From Hadoop's Confluence wiki: "Hadoop requires native libraries on Windows to work properly; that includes accessing the file:// filesystem, where Hadoop uses some Windows APIs to implement POSIX-like file access permissions."

https://cwiki.apache.org/confluence/display/HADOOP2/WindowsProblems
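
For example, once winutils for Hadoop 2.7.1 is extracted, you can either point HADOOP_HOME at its root directory or set the hadoop.home.dir system property before the SparkSession is created. Below is a minimal Scala sketch; the extraction path C:\Hadoop\winutils\hadoop-2.7.1 is only an assumed location, and it must contain a bin folder with winutils.exe and hadoop.dll:

import org.apache.spark.sql.SparkSession

object CsvWriteOnWindows {
  def main(args: Array[String]): Unit = {
    // Tell Hadoop where winutils.exe lives; this must run before any
    // Hadoop/Spark code touches the local filesystem. The path is an
    // assumed extraction location for the hadoop-2.7.1 folder of winutils.
    System.setProperty("hadoop.home.dir", "C:\\Hadoop\\winutils\\hadoop-2.7.1")

    val spark = SparkSession.builder()
      .appName("csv-write-test")
      .master("local[*]")
      .getOrCreate()

    // The same write that failed in the question.
    val df = spark.range(10).toDF("id")
    df.coalesce(1).write.format("csv").save("result")

    spark.stop()
  }
}

Alternatively, leave the code untouched and instead set HADOOP_HOME to that winutils folder (the one containing bin\winutils.exe and bin\hadoop.dll), keep %HADOOP_HOME%\bin on the PATH, and restart the shell or IDE so the new values are picked up.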

0 votes

It seems that you don't have the Hadoop binaries for Windows installed in the HADOOP_HOME directory, or that their dependencies (such as the Visual C++ Runtime) are missing.

You might also need to load the native library directly, depending on how you start your Spark application:

System.load(System.getenv("HADOOP_HOME") + "/bin/hadoop.dll"); // hadoop.dll (not .ddl); it ships in the bin directory of the winutils distribution
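
As a sketch of where that call would go (assuming HADOOP_HOME is set and hadoop.dll sits in its bin directory, as in the winutils distribution), load the library once at the start of the driver, before the SparkSession is created:

import java.nio.file.Paths

object NativeIoBootstrap {
  // Load Hadoop's native Windows library explicitly. System.load requires
  // an absolute path, so it is resolved from HADOOP_HOME here.
  def loadHadoopNative(): Unit = {
    val hadoopHome = sys.env.getOrElse("HADOOP_HOME", sys.error("HADOOP_HOME is not set"))
    System.load(Paths.get(hadoopHome, "bin", "hadoop.dll").toAbsolutePath.toString)
  }
}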