I am reading a CSV file in Spark using the statement below.
df = spark.read.csv('<CSV FILE>', header=True, inferSchema=True)
When I check the resulting DataFrame, some of the integer and double columns are stored as string columns, although this is not the case for all of them.
I have checked the values of one such column and they are all doubles, yet Spark still infers it as StringType.
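For reference, this is how I am checking the inferred types (a minimal sketch; df is the DataFrame from the read above):

# Print the full inferred schema
df.printSchema()
# List just the columns that came back as strings
print([name for name, dtype in df.dtypes if dtype == 'string'])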
Since I am loading a CSV file with around 1000 columns, specifying the schema explicitly is not feasible either.
Any suggestions/help would be highly appreciated.
Regards,
Neeraj
df.withColumn("a", col("a").cast(DecimalType(10, 2)))
or whatever. – philantrovert
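A minimal sketch of applying that cast programmatically across all string columns, assuming (as a heuristic of my own, not Spark's inference logic) that a column should be numeric when every non-null value survives a cast to double:

from pyspark.sql.functions import col

for name, dtype in df.dtypes:
    if dtype != 'string':
        continue
    casted = col(name).cast('double')
    # Keep the cast only if it parses every non-null value in the column
    failed = df.filter(col(name).isNotNull() & casted.isNull()).limit(1).count()
    if failed == 0:
        df = df.withColumn(name, casted)

Note this runs one small job per string column, so it will be slow on a very wide DataFrame; running the check on a sampled subset first would cut the cost.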