7 votes

I'm doing some testing of Spark decimal types for currency measures and am seeing some odd precision results when I set the scale and precision as shown below. I want to be sure that I won't have any data loss during calculations, but the example below does not reassure me of that. Can anyone tell me why this is happening with Spark SQL? I'm currently on version 2.3.0.

val sql = """select cast(cast(3 as decimal(38,14)) / cast(9 as decimal(38,14)) as decimal(38,14)) val"""
spark.sql(sql).show

This returns

+----------------+
|             val|
+----------------+
|0.33333300000000|
+----------------+
Your SQL works fine on Spark 2.2.1, producing 0.3333... with 14 3s. – Leo C
Looks like someone created a JIRA ticket for this with this post referenced. Thanks. – Jared

2 Answers

6 votes

This is a currently open issue; see SPARK-27089. The suggested workaround is to adjust the setting below. I validated that the SQL statement works as expected with this setting set to false.

spark.sql.decimalOperations.allowPrecisionLoss=false
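
For example, the flag can be set on an existing session before re-running the query. A minimal sketch, assuming a SparkSession named spark; the expected output (the full 14 threes) is based on the 2.2.1 behaviour reported in the comments and on the workaround behaving as described:

// Turn off precision-loss handling for decimal arithmetic (available from Spark 2.3)
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "false")

// Re-run the original query; the division should now keep all 14 decimal places
val sql = """select cast(cast(3 as decimal(38,14)) / cast(9 as decimal(38,14)) as decimal(38,14)) val"""
spark.sql(sql).show
// expected: 0.33333333333333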
2 votes

Use BigDecimal to avoid precision loss. See Double vs. BigDecimal?

Example:

scala> val df = Seq(BigDecimal("0.03"),BigDecimal("8.20"),BigDecimal("0.02")).toDS
df: org.apache.spark.sql.Dataset[scala.math.BigDecimal] = [value: decimal(38,18)]

scala> df.select($"value").show
+--------------------+
|               value|
+--------------------+
|0.030000000000000000|
|8.200000000000000000|
|0.020000000000000000|
+--------------------+

Using BigDecimal:

scala> df.select($"value" + BigDecimal("0.1")).show
+-------------------+
|      (value + 0.1)|
+-------------------+
|0.13000000000000000|
|8.30000000000000000|
|0.12000000000000000|
+-------------------+

If you don't use BigDecimal, there will be a loss of precision. In this case, 0.1 is a Double:

scala> df.select($"value" +  lit(0.1)).show
+-------------------+
|      (value + 0.1)|
+-------------------+
|               0.13|
|  8.299999999999999|
|0.12000000000000001|
+-------------------+
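
To see where the precision goes, compare the inferred result types; a minimal sketch reusing the df above (the comments describe the types I would expect, not captured output):

scala> df.select($"value" + lit(0.1)).printSchema
// (value + 0.1) comes back as a double, hence the rounding artifacts above

scala> df.select($"value" + BigDecimal("0.1")).printSchema
// (value + 0.1) stays a DecimalType, so no precision is lost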