
I'm new to Spark. I tried to create a GraphFrame and run some queries on it. This is my code:

import pyspark
from pyspark.sql import SQLContext
from graphframes import GraphFrame
sc = pyspark.SparkContext()
sqlContext = SQLContext(sc)
# Vertex DataFrame: GraphFrames expects an "id" column
vertices = sqlContext.createDataFrame([
    ("1", "Alex", 28, "M", "MIPT"),
    ("2", "Emeli", 28, "F", "MIPT"),
    ("7", "Ilya", 29, "M", "MSU")], ["id", "name", "age", "gender", "university"])
# Edge DataFrame: GraphFrames expects "src" and "dst" columns
edges = sqlContext.createDataFrame([
    ("1", "2", "friend")
], ["src", "dst", "type"])
g = GraphFrame(vertices, edges)
result = g.connectedComponents()

but it fails with the following error:

Traceback (most recent call last):
  File "", line 1, in <module>
  File "C:\Users\ALI_PC\AppData\Local\Temp\spark-73d7bc01-3873-4423-ac2b-527e39608ece\userFiles-b2dd0ea9-9556-4bea-9931-915608bad9b0\graphframes_graphframes-0.5.0-spark2.1-s_2.11.jar\graphframes\graphframe.py", line 279, in connectedComponents
  File "C:\Spark\spark-2.2.1-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 1133, in __call__
  File "C:\Spark\spark-2.2.1-bin-hadoop2.7\python\pyspark\sql\utils.py", line 63, in deco
    return f(*a, **kw)
  File "C:\Spark\spark-2.2.1-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o249.run.
: java.io.IOException: Checkpoint directory is not set. Please set it first using sc.setCheckpointDir().
    at org.graphframes.lib.ConnectedComponents$$anonfun$2.apply(ConnectedComponents.scala:280)
    at org.graphframes.lib.ConnectedComponents$$anonfun$2.apply(ConnectedComponents.scala:280)
    at scala.Option.getOrElse(Option.scala:121)
    at org.graphframes.lib.ConnectedComponents$.org$graphframes$lib$ConnectedComponents$$run(ConnectedComponents.scala:279)
    at org.graphframes.lib.ConnectedComponents.run(ConnectedComponents.scala:139)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)

How can I fix this problem? Thank you!


Answer


Exactly as stated in the exception message:

Checkpoint directory is not set. Please set it first using sc.setCheckpointDir().

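You have to set a checkpoint directory before calling g.connectedComponents():

sc.setCheckpointDir(path_to_checkpoint_directory)

where path_to_checkpoint_directory is a directory that Spark can write to.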