I'm running Spark in a Jupyter notebook (using the jupyter-scala kernel). I have a DataFrame whose columns are of type String, and I want a new DataFrame with those values cast to Int. I've tried all the answers from this post:
How to change column types in Spark SQL's DataFrame?
But I keep getting an error:
org.apache.spark.SparkException: Job aborted due to stage failure
In particular, I am getting this error message:
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 43, Column 44: Decimal
So I went and looked at line 43:
/* 043 */ Decimal tmpDecimal6 = Decimal.apply(new java.math.BigDecimal(primitive5.toString()));
So the cast from String to Int is apparently being routed through java.math.BigDecimal and Spark's internal Decimal type, and it is this generated code that fails to compile. So far nothing that I've tried has worked.
Here is a simple example:
val dF = sqlContext.load("com.databricks.spark.csv", Map("path" -> "../P80001571-ALL.csv", "header" -> "true"))
val dF2 = castColumnTo( dF, "contbr_zip", IntegerType )
dF2.show
where castColumnTo is defined as suggested by Martin Senne in the post mentioned above:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.DataType

object DFHelper {
  def castColumnTo(df: DataFrame, cn: String, tpe: DataType): DataFrame = {
    df.withColumn(cn, df(cn).cast(tpe))
  }
}
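For reference, here is a minimal sketch of the same cast on a tiny hardcoded DataFrame instead of the CSV (the sample values are made up, since I can't share the file; I'd expect this to exercise the same code path):

import org.apache.spark.sql.types.IntegerType
import sqlContext.implicits._

// Made-up sample values standing in for the contbr_zip column.
val sample = Seq("02139", "90210", "12345-6789").toDF("contbr_zip")
val sampleCast = DFHelper.castColumnTo(sample, "contbr_zip", IntegerType)
sampleCast.show()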
Running the CSV example produces this error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 3, localhost): java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 97, Column 45: Decimal
Line 97 looks like this:
Decimal tmpDecimal18 = Decimal.apply(new java.math.BigDecimal(primitive17.toString()));
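As a possible workaround, I've been considering side-stepping the cast entirely and parsing the strings myself with a UDF. This is only a sketch of the idea; parseZip is my own hypothetical helper, and I'm assuming rows that don't parse should become null:

import org.apache.spark.sql.functions.udf

// Hypothetical helper: keep the leading digits (dropping e.g. a ZIP+4
// suffix such as "-6789") and return None when nothing is left; Spark
// turns the None into a null in the resulting column.
val parseZip = udf { (s: String) =>
  Option(s).map(_.takeWhile(_.isDigit)).filter(_.nonEmpty).map(_.toInt)
}

val dF3 = dF.withColumn("contbr_zip", parseZip(dF("contbr_zip")))

Is something like that the right direction, or is there a way to make the plain cast work?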