
Using a JDBC data source in Spark SQL, we try to run the query below:

    select nvl(columnName, 1.0) from tablename

which fails with the error:

    cannot resolve 'nvl(tablename.`columnname`, 1.0BD)' due to data type mismatch: input to function coalesce should all be the same type, but it's [decimal(38,10), decimal(2,1)]

I know we can work around this with

    select nvl(columnname, CAST(1.0 as decimal(38,10))) from tablename

but it looks like I would need to find the data type of each and every column and cast the literal to match it.

  1. Is there any other way to handle this?
  2. Can I give a schema definition upfront while loading the DataFrame, as is possible for the CSV format (see the sketch after this list)? [https://issues.apache.org/jira/browse/SPARK-16848]
  3. How can I convert the data types of each column of a loaded DataFrame?
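
For context on point 2, this is the kind of schema-on-read I mean for CSV; the schema, column names and file path below are made up purely for illustration:

    import org.apache.spark.sql.types.{StructType, StructField, IntegerType, DecimalType}

    // Hypothetical schema and file path, only to illustrate supplying a schema upfront for CSV.
    val schema = StructType(Seq(
      StructField("id", IntegerType),
      StructField("columnname", DecimalType(38, 10))
    ))

    val csvDF = spark.read
      .schema(schema)            // schema supplied upfront
      .option("header", "true")
      .csv("/path/to/file.csv")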

1 Answer

  1. You can use coalesce instead of nvl. The inputs to coalesce are cast to the 'best' common data type.
  2. The JDBC connection uses the database schema as its schema, so it is not possible to supply a schema upfront.
  3. You can cast all columns to a different data type by adding another select; this is easy in the DataFrame/Dataset API (toy example below, and a JDBC-flavoured sketch after it):

    // These imports are needed outside spark-shell (where they are pre-imported).
    import org.apache.spark.sql.functions.{col, rand}
    import spark.implicits._ // for the $"colname" syntax

    // Create some toy data.
    val df = spark.range(100).select($"id", (rand(2) * 10).as("a"))
    df.printSchema

    // Define the casts.
    val casts = Seq(col("id").cast("int"), col("a").cast("int"))

    // Apply the casts.
    df.select(casts: _*).printSchema
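
To connect point 3 back to the JDBC case in the question, here is a minimal sketch that casts every decimal column to double right after the load; the table name, column name and connection URL are placeholders, not a definitive setup:

    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types.DecimalType

    // Hypothetical JDBC load; url and dbtable are placeholders.
    val jdbcDF = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://host:5432/db")
      .option("dbtable", "tablename")
      .load()

    // Cast every decimal column to double so a literal default like 1.0 no longer mismatches.
    val casts = jdbcDF.schema.fields.map { f =>
      f.dataType match {
        case _: DecimalType => col(f.name).cast("double").as(f.name)
        case _              => col(f.name)
      }
    }
    val casted = jdbcDF.select(casts: _*)

    // The original query now runs against a temp view of the casted data.
    casted.createOrReplaceTempView("tablename_casted")
    spark.sql("select nvl(columnname, 1.0) from tablename_casted").show()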