An assignment I have asks me to find the total of a column, but it wants the answer as a decimal, rounded to two digits. I got the total pretty easily. It took a bit to get it rounded, but now that I have that, the value is returned as a float object. I can't figure out how to convert this to a decimal. I've tried converting the float into a new DataFrame, casting the float to DoubleType, and casting as part of the sum formula; I've even tried multiple ways of summing the column. I can't get this!

The DataFrame has a column called "tolls_amount". I tried two ways of summing it, neither of which seemed to return a decimal type (though both return the correct answer):

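from pyspark.sql.types import DoubleType

# Approach 1: groupBy().sum(), then take the first field of the first row
df2 = df.withColumn("tolls_amount", df["tolls_amount"].cast(DoubleType()))
sum_number = df2[["tolls_amount"]].groupBy().sum().collect()[0][0]

# Approach 2: agg() with a dict, then round the collected Python float
df2 = df.withColumn("tolls_amount", df["tolls_amount"].cast(DoubleType()))
sum_number = df2.agg({"tolls_amount": "sum"}).collect()[0]
just_num = round(sum_number["sum(tolls_amount)"], 2)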

Here's some of what I tried to get it as a decimal:

sum_number = df2[["tolls_amount"]].groupBy().sum().collect()[0][0]
sum_df = sum_number.withColumn("Total", sum_number.cast(DoubleType()))
sum_2 = sum_number.cast(DoubleType())
sum_number = df2[["tolls_amount"]].groupBy().sum().cast(DoubleType()).collect()[0][0]

I had some others, but have been removing them as they didn't work. Is there any way to accomplish this? The total of the column comes out to 2430066.7000020845 - rounded to 2430066.7 with round(sum_number, 2), but both have a type of float instead of decimal.

1 Answer

You can round the number to 2 decimal places during the aggregation, and cast to decimal type, before collecting to Python:

import pyspark.sql.functions as F

sum_number = df[["tolls_amount"]].groupBy().agg(
    F.round(
        F.sum('tolls_amount'), 2
    ).cast('decimal(20,2)')
).collect()[0][0]
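
A value collected from a Spark decimal column arrives in Python as a decimal.Decimal. If the sum has already been collected as a plain Python float, a rough alternative is to convert it on the Python side with the standard decimal module. This is only a sketch: the helper name to_decimal_2dp is made up for illustration, and the input is the total quoted in the question.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_decimal_2dp(value: float) -> Decimal:
    """Convert a float to a Decimal with exactly two decimal places.

    Hypothetical helper, not part of PySpark. Going through str() uses
    the float's short repr rather than its full binary expansion.
    """
    return Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# The float total reported in the question
total = to_decimal_2dp(2430066.7000020845)

print(type(total).__name__)  # Decimal
print(total)                 # 2430066.70
```

Doing the rounding and cast inside the aggregation, as above, keeps everything in Spark. The Python-side conversion is mainly useful when only the collected float is available.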