I have set up a Spark cluster on HDInsight and am trying to use GraphFrames by following this tutorial.

I already used the custom scripts during cluster creation to enable GraphX on the Spark cluster, as described here.

When I run the following in the Jupyter notebook:

import org.apache.spark.sql._
import org.apache.spark.sql.functions._

import org.graphframes._

I get the following error:

<console>:45: error: object graphframes is not a member of package org
       import org.graphframes._
                  ^

I tried to install GraphFrames from the Spark terminal via Jupyter using the following command:

$SPARK_HOME/bin/spark-shell --packages graphframes:graphframes:0.1.0-spark1.5

but I still cannot get it working. I am new to Spark and HDInsight, so can someone please point out what else I need to install on this cluster to get this working?

It looks like your GraphX link is broken... - Andrew Moll
How can I verify this? - Kiran
Did you try it on a non-HDInsight cluster? - eliasah

2 Answers

At the moment, this works in spark-shell but not in a Jupyter notebook. When you run:

$SPARK_HOME/bin/spark-shell --packages graphframes:graphframes:0.1.0-spark1.5

the package is available (at least on a Spark 1.6 cluster) within that spark-shell session only. Jupyter currently has no way to load packages; this feature is expected to be added to the Jupyter notebooks on the clusters soon. In the meantime, you can use spark-shell, spark-submit, etc.
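Because `--packages` only affects the session it was passed to, a quick way to check which session actually has GraphFrames is to test whether one of its classes can be loaded. This is a minimal sketch: `isOnClasspath` is a hypothetical helper (not part of Spark or GraphFrames), and it assumes `org.graphframes.GraphFrame` is the class the package ships. Pasted into spark-shell or a notebook cell, it should print `true` only where the package was loaded; `false` corresponds to the "object graphframes is not a member of package org" error.

```scala
// Hypothetical helper: returns true if the named class can be found by the
// current thread's context class loader, without initializing the class.
def isOnClasspath(className: String): Boolean =
  try {
    Class.forName(className, false, Thread.currentThread.getContextClassLoader)
    true
  } catch {
    case _: ClassNotFoundException => false
  }

// Expected to print true only in a session started with the GraphFrames package.
println(isOnClasspath("org.graphframes.GraphFrame"))
```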

After you upload or import the GraphFrames library from the Maven repository, you need to restart the cluster to attach the library.

That worked for me.