I am getting the following error when trying to cast a column (read from a comma-separated CSV file with headers).
Here is the code I am using:
var df = spark.read.option("header","true").option("delimiter",",").csv("/user/sample/data")
df.withColumn("columnCast", expr("CAST(SaleAmount) AS LONG")).count
This causes the following exception to be thrown every time. I've tried casting different columns; some throw while others do not. I've also tried the following, which throws the same exception.
df.withColumn("columnCast", expr("CAST(NULL) AS LONG")).count
java.lang.UnsupportedOperationException: empty.init
  at scala.collection.TraversableLike$class.init(TraversableLike.scala:451)
  at scala.collection.mutable.ArrayOps$ofInt.scala$collection$IndexedSeqOptimized$$super$init(ArrayOps.scala:234)
  at scala.collection.IndexedSeqOptimized$class.init(IndexedSeqOptimized.scala:135)
  at scala.collection.mutable.ArrayOps$ofInt.init(ArrayOps.scala:234)
  at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$7$$anonfun$11.apply(FunctionRegistry.scala:565)
  at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$7$$anonfun$11.apply(FunctionRegistry.scala:558)
  at scala.Option.getOrElse(Option.scala:121)
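For comparison, I would expect the typed Column API to express the same cast; as far as I understand it builds the Cast expression directly rather than going through the SQL expression parser, though I haven't verified this on the Cloudera build. col here is org.apache.spark.sql.functions.col:

import org.apache.spark.sql.functions.col

// Same cast via the typed Column API instead of a SQL expression string
df.withColumn("columnCast", col("SaleAmount").cast("long")).count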
I have tried running this in both spark-shell and Zeppelin. The Spark version is 2.4.0.cloudera2, managed by Cloudera.
What is causing this behaviour? Is this intended? How do I handle this?
Can you run spark-shell --version and include the output in your question? Also, can you run spark-shell and execute spark.catalog.listFunctions.count? What's the output? I think there's something wrong with the Spark environment and any query would simply fail. – Jacek Laskowski
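For reference, a minimal sketch of the checks the comment asks for, assuming a standard spark-shell session where spark is the implicit SparkSession:

spark-shell --version

// then, inside the spark-shell session:
spark.version                      // runtime Spark version string
spark.catalog.listFunctions.count  // count of functions registered in the session catalog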