2 votes

How can I create a Spark Dataset with a BigDecimal at a given precision? See the following example in the spark shell. You will see that I can create a DataFrame with my desired BigDecimal precision, but cannot then convert it to a Dataset.

scala> import scala.collection.JavaConverters._
scala> import org.apache.spark.sql.Row
scala> import org.apache.spark.sql.types.{DecimalType, StructField, StructType}
scala> case class BD(dec: BigDecimal)
scala> val schema = StructType(Seq(StructField("dec", DecimalType(38, 0))))
scala> val highPrecisionDf = spark.createDataFrame(List(Seq(BigDecimal("12345678901122334455667788990011122233"))).map(a => Row.fromSeq(a)).asJava, schema)
highPrecisionDf: org.apache.spark.sql.DataFrame = [dec: decimal(38,0)]
scala> highPrecisionDf.as[BD]
org.apache.spark.sql.AnalysisException: Cannot up cast `dec` from decimal(38,0) to decimal(38,18) as it may truncate
The type path of the target object is:
- field (class: "scala.math.BigDecimal", name: "dec")
- root class: "BD"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;

Similarly, I am unable to create a Dataset from a case class where I've used a higher-precision BigDecimal.

scala> List(BD(BigDecimal("12345678901122334455667788990011122233"))).toDS.show()
+----+
| dec|
+----+
|null|
+----+

Is there any way to create a Dataset containing a BigDecimal field with precision different to the default decimal(38,18)?


2 Answers

3 votes

By default, Spark infers the schema for a Decimal type (or BigDecimal) field in a case class to be DecimalType(38, 18) (see org.apache.spark.sql.types.DecimalType.SYSTEM_DEFAULT).
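
For reference, you can confirm this default in the spark-shell (the exact output shape may vary by Spark version):

scala> org.apache.spark.sql.types.DecimalType.SYSTEM_DEFAULT
res0: org.apache.spark.sql.types.DecimalType = DecimalType(38,18)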

The workaround is to convert the Dataset to a DataFrame and cast the column, as shown below.

case class TestClass(id: String, money: BigDecimal)

import spark.implicits._  // needed for the case class encoder outside the shell
val testDs = spark.createDataset(Seq(
  TestClass("1", BigDecimal("22.50")),
  TestClass("2", BigDecimal("500.66"))
))

testDs.printSchema()

root
 |-- id: string (nullable = true)
 |-- money: decimal(38,18) (nullable = true)

Workaround

import org.apache.spark.sql.types.DecimalType
val testDf = testDs.toDF()

testDf
  .withColumn("money", testDf("money").cast(DecimalType(10,2)))
  .printSchema()

root
 |-- id: string (nullable = true)
 |-- money: decimal(10,2) (nullable = true)

You can check this link for finer details: https://issues.apache.org/jira/browse/SPARK-18484
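
If the end goal is a Dataset rather than a DataFrame, the following sketch (building on testDf above; not something the answer itself states) may also work, since decimal(10,2) is strictly narrower than the encoder's default decimal(38,18) and so should pass Spark's upcast check:

// Sketch: re-apply the case class encoder after casting to the narrower decimal
val castDs = testDf
  .withColumn("money", testDf("money").cast(DecimalType(10, 2)))
  .as[TestClass]

castDs.printSchema()  // money should now show as decimal(10,2)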

0 votes

One workaround I found is to use a String in the Dataset instead, to maintain precision. This works provided you don't need to use the values as numbers (e.g. for ordering or maths). If you do need that, you can turn the Dataset back into a DataFrame, cast to the appropriate high-precision type, and convert back to your Dataset afterwards (see the sketch after the example below).

// schema and imports are those defined in the question (DecimalType(38, 0))
val highPrecisionDf = spark.createDataFrame(
  List(Seq(BigDecimal("12345678901122334455667788990011122233"))).map(a => Row.fromSeq(a)).asJava,
  schema)
case class StringDecimal(dec: String)
highPrecisionDf.as[StringDecimal]
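
And a minimal sketch of the round trip mentioned above, assuming you want to order by the numeric value and then return to the String-backed Dataset (the col helper and the DecimalType(38, 0) cast are my choices for illustration, not part of the original answer; toDS relies on spark.implicits._ as in the shell):

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DecimalType

// A Dataset built from the string-backed case class
val stringDs = List(StringDecimal("12345678901122334455667788990011122233")).toDS()

val ordered = stringDs.toDF()
  .withColumn("dec", col("dec").cast(DecimalType(38, 0)))  // high-precision numeric type for the ordering
  .orderBy(col("dec"))
  .withColumn("dec", col("dec").cast("string"))            // back to String for the Dataset
  .as[StringDecimal]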