I have CSV data:
"id","price"
"1","79.07"
"2","91.27"
"3","85.6"
Reading it using SparkSession:
def readToDs(resource: String, schema: StructType): Dataset[ItemPrice] = {
  // assumes an implicit Encoder[ItemPrice] is in scope, e.g. via import sparkSession.implicits._
  sparkSession.read
    .option("header", "true")
    .schema(schema)
    .csv(resource)
    .as[ItemPrice]
}
Case class:
case class ItemPrice(id: Long, price: BigDecimal)
Printing the Dataset:
def main(args: Array[String]): Unit = {
  val prices: Dataset[ItemPrice] =
    readToDs("src/main/resources/app/data.csv", Encoders.product[ItemPrice].schema)
  prices.show()
}
Output:
+----------+--------------------+
|        id|               price|
+----------+--------------------+
|         1|79.07000000000000...|
|         2|91.27000000000000...|
|         3|85.60000000000000...|
+----------+--------------------+
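As far as I understand, this happens because the encoder derived from the case class maps BigDecimal to Spark's default DecimalType(38, 18), and that is exactly the schema I pass into readToDs above. A quick way to see it (using the same ItemPrice case class):
// the schema Spark derives from the case class encoder:
// BigDecimal comes out as decimal(38,18) by default
Encoders.product[ItemPrice].schema.printTreeString()
// root
//  |-- id: long (nullable = false)
//  |-- price: decimal(38,18) (nullable = true)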
Desired output:
+----------+--------+
|        id|   price|
+----------+--------+
|         1|   79.07|
|         2|   91.27|
|         3|    85.6|
+----------+--------+
The option I already know: define the schema manually, with a hard-coded column order and data types, like:
def defineSchema(): StructType =
  StructType(Seq(
    StructField("id", LongType, nullable = false),
    StructField("price", DecimalType(4, 2), nullable = false)
  ))
And use it like:
val prices: Dataset[ItemPrice] = readToDs("src/main/resources/app/data.csv", defineSchema())
How can I set the decimal precision and scale, e.g. (4, 2), without manually defining the whole structure?
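The closest workaround I can think of (an untested sketch; withDecimalPrecision is just a helper name I made up) is to start from the encoder-derived schema and patch only the decimal columns:

import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.{DecimalType, StructField, StructType}

// keeps the field names and order from the derived schema and only swaps
// the data type of DecimalType columns for the given precision/scale
def withDecimalPrecision(schema: StructType, precision: Int, scale: Int): StructType =
  StructType(schema.fields.map {
    case f @ StructField(_, _: DecimalType, _, _) => f.copy(dataType = DecimalType(precision, scale))
    case f                                        => f
  })

val prices = readToDs(
  "src/main/resources/app/data.csv",
  withDecimalPrecision(Encoders.product[ItemPrice].schema, 4, 2))

But I am not sure this is idiomatic, so a cleaner or built-in way would be welcome.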