
I have an offline pyspark cluster (no internet access) where I need to install graphframes library.

I have manually downloaded the jar from here, added it to $SPARK_HOME/jars/, and when I try to use it I get the following error:

error: missing or invalid dependency detected while loading class file 'Logging.class'.
Could not access term typesafe in package com,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'Logging.class' was compiled against an incompatible version of com.
error: missing or invalid dependency detected while loading class file 'Logging.class'.
Could not access term scalalogging in value com.typesafe,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'Logging.class' was compiled against an incompatible version of com.typesafe.
error: missing or invalid dependency detected while loading class file 'Logging.class'.
Could not access type LazyLogging in value com.slf4j,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'Logging.class' was compiled against an incompatible version of com.slf4j.

What is the correct way to install it offline with all of its dependencies?


1 Answer


I managed to install the graphframes library. First of all, I found the graphframes dependencies, which were:

scala-logging-api_xx-xx.jar
scala-logging-slf4j_xx-xx.jar

where xx stands for the proper Scala version and jar version. Then I installed them in the proper path. Because I work on a Cloudera machine, the proper path is:

/opt/cloudera/parcels/SPARK2/lib/spark2/jars/
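
As an illustration, copying them in could look like this (a sketch: the jar names below are the ones from the spark-submit example further down, and must match your Scala and Spark versions):

# Copy the two logging dependencies next to the other Spark jars
# (jar names are examples; use the versions that match your setup)
cp scala-logging-api_2.10-2.1.2.jar \
   scala-logging-slf4j_2.10-2.1.2.jar \
   /opt/cloudera/parcels/SPARK2/lib/spark2/jars/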

If you cannot place them in this directory on your cluster (because you have no root rights and your admin is super lazy), you can simply add them to your spark-submit/spark-shell call:

spark-submit ..... --driver-class-path /path-for-jar/  \
                   --jars /../graphframes-0.5.0-spark2.1-s_2.11.jar,/../scala-logging-slf4j_2.10-2.1.2.jar,/../scala-logging-api_2.10-2.1.2.jar

This works for Scala. In order to use graphframes with Python, you need to download the graphframes jar and then, through the shell:

# Extract the JAR content (this unpacks the graphframes Python package)
jar xf graphframes_graphframes-0.3.0-spark2.0-s_2.11.jar
# Zip the package directory itself, so that graphframes/ sits at the
# root of the archive and Python can import it straight from the zip
zip -r graphframes.zip graphframes
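
As a quick sanity check (an extra step, not part of the original recipe): import graphframes only works from the zip if the package directory sits at the archive root, which you can confirm with:

# The listing should show entries such as graphframes/__init__.py
unzip -l graphframes.zip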

Then add the zipped file to your Python path in spark-env.sh or your .bash_profile with:

export PYTHONPATH=$PYTHONPATH:/..proper path/graphframes.zip:.
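
If editing spark-env.sh is not an option either, shipping the zip per job should also work (an alternative sketch; the path is a placeholder):

# Distribute the zipped package with the job instead of exporting PYTHONPATH
spark-submit --py-files /path-for-jar/graphframes.zip ...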

Then, when you open the shell or submit a job (again with the same arguments as with Scala), importing graphframes works normally.
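
For example, a pyspark launch along these lines (paths are placeholders; jar names are taken from the Scala example above) should let the import go through:

# Same jars as in the Scala case; PYTHONPATH (set above) provides the Python package
pyspark --driver-class-path /path-for-jar/ \
        --jars /path-for-jar/graphframes-0.5.0-spark2.1-s_2.11.jar,/path-for-jar/scala-logging-slf4j_2.10-2.1.2.jar,/path-for-jar/scala-logging-api_2.10-2.1.2.jar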

This link was extremely useful for this solution.