0 votes

Community wizards,

I am really frustrated. When it comes to Spark, Hadoop et al., nothing seems to be straightforward.

For the past few hours, I have been trying to find a solution to the following issue:

ERROR Executor: Exception in task 0.0 in stage 13.0 (TID 823)
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileWithMode0(Ljava/lang/String;JJJI)Ljava/io/FileDescriptor;

Versions:

  1. OS: Windows 10
  2. Spark version: 2.4.6
  3. Scala version: 2.11.12
  4. Hadoop version: 2.7.1
  5. Java version: 1.8.0_202 (64-bit)

Variables:

  1. SPARK_HOME: C:\Spark
  2. HADOOP_HOME: C:\Hadoop\hadoop-2.7.1
  3. SCALA_HOME: C:\Program Files (x86)\scala
  4. JRE_HOME: C:\Program Files\Java\jre1.8.0_202
  5. JAVA_HOME: C:\Program Files\Java\jdk1.8.0_202

Paths:

  1. %SPARK_HOME%\bin
  2. %HADOOP_HOME%\bin
  3. %SCALA_HOME%\bin
  4. %JRE_HOME%\bin
  5. %JAVA_HOME%\bin

The command that throws the error is:

df.coalesce(1).write.format("csv").save("result")

The folder (result) seems to be created, but it's empty.

I have literally no idea how to solve this issue.

Any help would be warmly welcomed.

Comment from Sivakumar (2 votes): The following post might help you: stackoverflow.com/questions/50344874/…

2 Answers

2 votes

I believe your HADOOP_HOME=C:\Hadoop\hadoop-2.7.1 points at the Hadoop binaries/libraries; on Windows you additionally need a tool called WINUTILS.EXE.

You can download the winutils build that matches your Hadoop version from GitHub and point HADOOP_HOME at the root directory of that winutils distribution: https://github.com/steveloughran/winutils

Source:

From Hadoop's Confluence wiki: "Hadoop requires native libraries on Windows to work properly; that includes accessing the file:// filesystem, where Hadoop uses some Windows APIs to implement POSIX-like file access permissions."

https://cwiki.apache.org/confluence/display/HADOOP2/WindowsProblems
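
For example, once winutils for Hadoop 2.7.1 is extracted, you can either point HADOOP_HOME at its root directory or set the hadoop.home.dir system property before the SparkSession is created. Below is a minimal Scala sketch; the extraction path C:\Hadoop\winutils\hadoop-2.7.1 is only an assumed location, and it must contain a bin folder with winutils.exe and hadoop.dll:

import org.apache.spark.sql.SparkSession

object CsvWriteOnWindows {
  def main(args: Array[String]): Unit = {
    // Tell Hadoop where winutils.exe lives; this must run before any
    // Hadoop/Spark code touches the local filesystem. The path is an
    // assumed extraction location for the hadoop-2.7.1 folder of winutils.
    System.setProperty("hadoop.home.dir", "C:\\Hadoop\\winutils\\hadoop-2.7.1")

    val spark = SparkSession.builder()
      .appName("csv-write-test")
      .master("local[*]")
      .getOrCreate()

    // The same write that failed in the question.
    val df = spark.range(10).toDF("id")
    df.coalesce(1).write.format("csv").save("result")

    spark.stop()
  }
}

Alternatively, leave the code untouched and instead set HADOOP_HOME to that winutils folder (the one containing bin\winutils.exe and bin\hadoop.dll), keep %HADOOP_HOME%\bin on the PATH, and restart the shell or IDE so the new values are picked up.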

0 votes

It seems that you don't have the Hadoop binaries for Windows installed in the HADOOP_HOME directory, or that their dependencies (such as the Visual C++ Runtime) are missing.

You might also need to load the native library directly, depending on how you start your Spark application:

System.load(System.getenv("HADOOP_HOME") + "/bin/hadoop.dll"); // hadoop.dll (not .ddl); it ships in the bin directory of the winutils distribution
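
As a sketch of where that call would go (assuming HADOOP_HOME is set and hadoop.dll sits in its bin directory, as in the winutils distribution), load the library once at the start of the driver, before the SparkSession is created:

import java.nio.file.Paths

object NativeIoBootstrap {
  // Load Hadoop's native Windows library explicitly. System.load requires
  // an absolute path, so it is resolved from HADOOP_HOME here.
  def loadHadoopNative(): Unit = {
    val hadoopHome = sys.env.getOrElse("HADOOP_HOME", sys.error("HADOOP_HOME is not set"))
    System.load(Paths.get(hadoopHome, "bin", "hadoop.dll").toAbsolutePath.toString)
  }
}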