16 votes

I got this exception while playing with Spark.

Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot up cast `price` from string to int as it may truncate
The type path of the target object is:
- field (class: "scala.Int", name: "price")
- root class: "org.spark.code.executable.Main.Record"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;

How can this exception be solved? Here is the code:

import java.sql.Timestamp
import org.apache.spark.sql.Encoders

object Main {

  case class Record(transactionDate: Timestamp, product: String, price: Int, paymentType: String, name: String, city: String, state: String, country: String,
                    accountCreated: Timestamp, lastLogin: Timestamp, latitude: String, longitude: String)

  def main(args: Array[String]) {

    System.setProperty("hadoop.home.dir", "C:\\winutils\\")

    val schema = Encoders.product[Record].schema

    val df = SparkConfig.sparkSession.read
      .option("header", "true")
      .csv("SalesJan2009.csv")

    import SparkConfig.sparkSession.implicits._
    val ds = df.as[Record]

    //ds.groupByKey(body => body.state).count().show()

    import org.apache.spark.sql.expressions.scalalang.typed.{
      count => typedCount,
      sum => typedSum
    }

    ds.groupByKey(body => body.state)
      .agg(typedSum[Record](_.price).name("sum(price)"))
      .withColumnRenamed("value", "group")
      .alias("Summary by state")
      .show()
  }
}
Can you try moving the case class Record outside of main? And can you post the sample data? – koiralo
@Shankar Koirala Here is the link to the data that I am using: github.com/JuliaData/CSV.jl/blob/master/test/test_files/… – shams
Yes, I also tried moving case class Record outside of main. – shams
@Shankar Koirala Sorry, I now see that this issue comes from converting the DataFrame to Dataset[Record]. Please help with that. – shams

2 Answers

40 votes

You read the CSV file first and then tried to convert it to a Dataset that has a different schema. It is better to pass the schema created from the case class while reading the CSV file, as below:

val spark = SparkSession.builder()
  .master("local")
  .appName("test")
  .getOrCreate()

import org.apache.spark.sql.Encoders
val schema = Encoders.product[Record].schema

val ds = spark.read
  .option("header", "true")
  .schema(schema)  // passing schema 
  .option("timestampFormat", "MM/dd/yyyy HH:mm") // passing timestamp format
  .csv(path)// csv path
  .as[Record] // convert to DS

The default timestampFormat is yyyy-MM-dd'T'HH:mm:ss.SSSXXX, so you also need to pass your custom timestampFormat.
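
If you would rather not pass a schema, the error message also offers the option of adding an explicit cast to the input data. Here is a minimal sketch of that alternative; it reuses spark and path from the code above, assumes the CSV header names match the Record fields, and uses to_timestamp (available since Spark 2.2) to parse the timestamp columns:

import org.apache.spark.sql.functions.{col, to_timestamp}
import spark.implicits._

// Sketch only: cast the columns the encoder complains about before converting to Dataset[Record]
val dsCast = spark.read
  .option("header", "true")
  .csv(path)
  .withColumn("price", col("price").cast("int"))  // explicit cast suggested by the error message
  .withColumn("transactionDate", to_timestamp(col("transactionDate"), "MM/dd/yyyy HH:mm"))
  .withColumn("accountCreated", to_timestamp(col("accountCreated"), "MM/dd/yyyy HH:mm"))
  .withColumn("lastLogin", to_timestamp(col("lastLogin"), "MM/dd/yyyy HH:mm"))
  .as[Record]

Passing the schema, as above, is usually the cleaner option because it covers every column at once.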

Hope this helps

0 votes

In my case, the problem was that I was using this:

case class OriginalData(ORDER_ID: Int, USER_ID: Int, ORDER_NUMBER: Int, ORDER_DOW: Int, ORDER_HOUR_OF_DAY: Int, DAYS_SINCE_PRIOR_ORDER: Double, ORDER_DETAIL: String)

However, the CSV file contained rows like this:

[screenshot of the CSV, showing the string "Friday" in the ORDER_DOW column]

Yes, having "Friday" where only integers representing days of the week should appear means that I need to clean the data. However, to be able to read my CSV file using spark.read.csv("data/jaimemontoya/01.csv"), I used the following code, where ORDER_DOW is now a String instead of an Int:

case class OriginalData(ORDER_ID: Int, USER_ID: Int, ORDER_NUMBER: Int, ORDER_DOW: String, ORDER_HOUR_OF_DAY: Int, DAYS_SINCE_PRIOR_ORDER: Double, ORDER_DETAIL: String)
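
A minimal sketch of how the relaxed case class can then be read into a typed Dataset; the SparkSession setup and the match between CSV header names and the OriginalData fields are assumptions, mirroring the accepted answer:

import org.apache.spark.sql.{Encoders, SparkSession}

val spark = SparkSession.builder()
  .master("local")
  .appName("order-data-cleanup")
  .getOrCreate()
import spark.implicits._

// Build the schema from the relaxed case class so ORDER_DOW is read as String
val schema = Encoders.product[OriginalData].schema

val orders = spark.read
  .option("header", "true")
  .schema(schema)
  .csv("data/jaimemontoya/01.csv")
  .as[OriginalData]  // no up-cast error, because every CSV value fits its declared type

After cleaning the data, ORDER_DOW can be switched back to Int if needed.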