This is really frustrating. I have spent several days going through all the related issues here on Stack Overflow and on the web, following every instruction step by step, but I can't figure it out and I'm about to give up. This is my error output:
Spark package found in SPARK_HOME: C:/spark/spark_3_0_1_bin_hadoop3_2
Launching java with spark-submit command
C:/spark/spark_3_0_1_bin_hadoop3_2/bin/spark-submit2.cmd
--driver-memory "2g" sparkr-shell C:\Users\user\AppData\Local\Temp\RtmpgT8rjY\backend_port11e45fad26cf
Error in sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap: JVM is not ready after 10 seconds
(Incidentally, why does it launch “spark-submit2.cmd” and not “spark-submit”?)
After running this code:
> Sys.setenv(SPARK_HOME = "C:/spark/spark_3_0_1_bin_hadoop3_2")
> library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
> sparkR.session(master = "local[*]", sparkConfig = list(spark.driver.memory = "2g"))
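For reference, here is a small diagnostic sketch I run before calling sparkR.session (the checks themselves are my own idea of what to verify, not an official procedure; the paths are the ones used throughout this question). It prints what the R session actually sees when it is about to launch the JVM:

```r
# Diagnostic only: print the paths SparkR will rely on when launching the JVM.
# The SPARK_HOME value is the one used throughout this question.
Sys.setenv(SPARK_HOME = "C:/spark/spark_3_0_1_bin_hadoop3_2")

spark_home <- Sys.getenv("SPARK_HOME")
submit_cmd <- file.path(spark_home, "bin", "spark-submit2.cmd")

cat("SPARK_HOME:", spark_home, "\n")
cat("JAVA_HOME: ", Sys.getenv("JAVA_HOME"), "\n")
cat("spark-submit2.cmd exists:", file.exists(submit_cmd), "\n")
```

An empty JAVA_HOME or a missing spark-submit2.cmd here would explain a JVM that never comes up.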
What I have done so far:
- checked that the latest JRE version (JRE 8u271) is installed, and verified folder permissions and the environment path: all OK
- installed rtools40-x86_64 and set its path; RStudio then finds C:\rtools40\usr\bin\make.exe
- downloaded the latest pre-built version, spark-3.0.1-bin-hadoop3.2.tgz, and decompressed it with owner permission into C:\spark (no spaces in folder names!); to be safe I also replaced all punctuation in the folder name with underscores, as you can see in my script above, then set the environment path
- checked that all permissions for all users were set for C:\spark\spark_3_0_1_bin_hadoop3_2: ok
- Manually unzipped sparkr.zip (contained in C:\spark\spark_3_0_1_bin_hadoop3_2\R\lib) into my R library C:\Program Files\R\R-4.0.3\library
- downloaded winutils for Hadoop 3.0.0, unpacked it into C:\winutils\bin, and set the path
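The manual checks in the steps above can also be replicated from within R (the paths are the ones from my setup; treat this as a sketch, not a fix):

```r
# Sanity checks mirroring the manual steps above; all paths come from my setup.
spark_home  <- "C:/spark/spark_3_0_1_bin_hadoop3_2"
hadoop_home <- "C:/winutils"

checks <- c(
  spark_home_dir = dir.exists(spark_home),
  sparkr_lib_dir = dir.exists(file.path(spark_home, "R", "lib")),
  winutils_exe   = file.exists(file.path(hadoop_home, "bin", "winutils.exe"))
)
print(checks)  # any FALSE here points at a broken path
```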
Launching sparkR from the Windows command prompt works. I have also launched spark-submit, and everything was OK.
My environment variables:
- JAVA_HOME: C:\Java
- R_HOME: C:\Program Files\R\R-4.0.3\bin\x64
- RTools: C:\rtools40
- SPARK_HOME: C:\spark\spark_3_0_1_bin_hadoop3_2
- HADOOP_HOME: C:\winutils
- Path: C:\Program Files\R\R-4.0.3\bin\x64;C:\rtools40;C:\rtools40\mingw64\bin;C:\Java; [...]
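To confirm that RStudio actually inherits these variables (a session started from the desktop can see a different environment than the command prompt does), I print them from R; the variable names are exactly the ones listed above:

```r
# Dump the variables this setup relies on, as seen by the R process.
# An empty value means the variable was not inherited by RStudio.
vars <- c("JAVA_HOME", "R_HOME", "SPARK_HOME", "HADOOP_HOME")
for (v in vars) cat(sprintf("%-12s= %s\n", v, Sys.getenv(v)))
```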
I also use sparklyr, and it works very well, connecting in RStudio without any problems! But not SparkR...
What else can I do to initialize SparkR in RStudio and work with its functions?
RStudio Version 1.3.1093
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252
[3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C
[5] LC_TIME=Italian_Italy.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] SparkR_3.0.1
loaded via a namespace (and not attached):
[1] compiler_4.0.3 tools_4.0.3
Thanks, Gabriel