0
votes

I loaded a CSV file into a Spark DataFrame. When I then call the approxQuantile method on it, I get an error. I have tried different data sets, columns, probabilities, and relativeError values, and the result is the same. Please help me understand what's going on.

df.approxQuantile("column_name", [0.2,0.3,0.6,1.0], 0)

I am getting the following error:

py4j.protocol.Py4JError: An error occurred while calling o30.approxQuantile. Trace:
py4j.Py4JException: Method approxQuantile([class scala.collection.immutable.$colon$colon, class scala.collection.immutable.$colon$colon, class java.lang.Double]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:272)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)

1
What's your data type (df.printSchema())? - MaFF

All columns are of type "integer":
root
 |-- j: integer (nullable = true)
 |-- b: integer (nullable = true)
 |-- f: integer (nullable = true)
 |-- l: integer (nullable = true)
 |-- e: integer (nullable = true)
 |-- c: integer (nullable = true)
 |-- g: integer (nullable = true)
 |-- h: integer (nullable = true)
 |-- m: integer (nullable = true)
 |-- a: integer (nullable = true)
 |-- k: integer (nullable = true)
 |-- d: integer (nullable = true)
 |-- i: integer (nullable = true)
- Sunil Rao

1 Answer

1
votes

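This can happen if your PySpark driver is using Spark 2.2.0 while your Spark cluster is running 2.1.1 (or earlier). The trace is consistent with that: the first argument arrived on the JVM side as a Scala list (scala.collection.immutable.$colon$colon) rather than a single string, and the JVM it reached has no approxQuantile method with that signature. Make sure your driver and cluster versions match and you should be good to go!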

See the note in the docs about a change to the interface for approxQuantile in 2.2:

Changed in version 2.2: Added support for multiple columns.
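
To check whether the driver and cluster versions match, as suggested above, you can compare the two version strings. The following is only a minimal sketch: the helper names major_minor and versions_match are made up for illustration, and it assumes that agreeing on the major.minor release is enough, which may not hold for every deployment.

```python
def major_minor(version):
    """Return (major, minor) as integers from a version string such as '2.2.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def versions_match(client_version, cluster_version):
    """Return True when both version strings share the same major.minor release."""
    return major_minor(client_version) == major_minor(cluster_version)


# Hypothetical usage inside a PySpark session, where `spark` is an
# existing SparkSession (not created here):
#   import pyspark
#   versions_match(pyspark.__version__, spark.version)
```

Here pyspark.__version__ is the version of the Python package on the driver, and spark.version is the version reported by the Spark session you are connected to. If the function returns False for those two values, a mismatch like the one described in this answer is a likely cause.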