2 votes

Below is my program to connect to Oracle using Spark Scala JDBC:

    /usr/hdp/current/spark2-client/bin/spark-shell --driver-memory 5g --executor-memory 5g --jars /usr/hdp/current/sqoop-client/lib/ojdbc7.jar

    import org.apache.spark.sql.SQLContext
    val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
    val dataframe_mysql = sqlcontext.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//Host:1521/QAM")
      .option("driver", "oracle.jdbc.driver.OracleDriver")
      .option("dbtable", "(select * from MGPH.APPLICATION where APPLICATION_ID in (11,12))")
      .option("user", "XXXXXXXXXX")
      .option("password", "xxxxxxxxx")
      .option("fetchsize", "100")
      .load()
    dataframe_mysql.show()

Spark output:

    scala> dataframe_mysql.show()
    +---+---+--------------------+--------------------+----+----+-----+
    | c1| c2|                  c3|                  c4|  c5|  c6|   c7|
    +---+---+--------------------+--------------------+----+----+-----+
    | 11|  1|Safire.Accumulato...|Safire Accumulato...|true|3346|false|
    +---+---+--------------------+--------------------+----+----+-----+

Oracle table structure:

    Name  Null?    Type
    ----- -------- -------------
    c1    NOT NULL NUMBER(3)
    c2    NOT NULL NUMBER(2)
    c3    NOT NULL VARCHAR2(50)
    c4    NOT NULL VARCHAR2(500)
    c5    NOT NULL NUMBER(1)
    c6             NUMBER(10)
    c7             NUMBER(1)

Question:

Column c7 in Oracle is NUMBER(1), but Spark JDBC converts it to a Boolean type on import.

Please suggest how to avoid true/false and get 0/1 in the DataFrame instead.

Comments:

Try casting it to string in the JDBC query, if that's something you have the flexibility for. – roh

I have 45 different tables, so I need some common approach in the option property. – NS Saravanan

2 Answers

1 vote

Please refer to these articles: http://www.ericlin.me/2017/05/oracle-number10-field-maps-to-boolean-in-spark/ and https://github.com/apache/spark/pull/14377. Spark's BooleanType is mapped to Oracle's NUMBER(1), since Oracle has no native boolean type. So you will have to handle this after reading the table into Spark, either by casting the column or by using the boolean value directly in your transformations.
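Since you mention 45 different tables, one generic way to do the casting after the load is to walk the DataFrame's schema and cast every BooleanType column back to an integer. This is a minimal sketch, not from the linked articles; `booleansToInts` is a hypothetical helper name:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{BooleanType, IntegerType}

// Hypothetical helper: casts every BooleanType column to IntegerType,
// so values come out as 0/1 instead of false/true. Because it walks the
// schema, the same call works for any table without listing columns by hand.
def booleansToInts(df: DataFrame): DataFrame =
  df.schema.fields.foldLeft(df) { (acc, field) =>
    if (field.dataType == BooleanType)
      acc.withColumn(field.name, col(field.name).cast(IntegerType))
    else acc
  }
```

Usage after the JDBC load: `val fixed = booleansToInts(dataframe_mysql); fixed.show()` — c7 should then print as 0/1.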

0 votes

I solved this issue by applying Oracle's to_char function, in the JDBC select query, to every column that gets mapped to Boolean.

Below is the code used (note to_char(HEARTBEATS_ENABLED)). I tried to_number as well, but it produced results like 1.0000, so I used to_char to get the desired result.

    val result = sqlcontext.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//Host:1521/QAM")
      .option("driver", "oracle.jdbc.driver.OracleDriver")
      .option("dbtable", "(select to_char(HEARTBEATS_ENABLED) as HEARTBEATS_ENABLED, APPLICATION_ID, APPLICATION_TYPE_ID, NAME, DESCR, to_char(ACTIVE_STAT) as ACTIVE_STAT, PROGRAM_ID from MGPH.APPLICATION where APPLICATION_ID in (11,12))")
      .option("user", "myuser")
      .option("password", "my password")
      .option("fetchsize", "100")
      .load()
    result.show()
    result.printSchema
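If you are on Spark 2.3 or later, the JDBC reader also supports a customSchema option that overrides the inferred type per column, which avoids rewriting the select for each table. A sketch, assuming Spark 2.3+ and reusing the connection details from the question (the column list here is illustrative):

```scala
// Sketch: override the inferred BooleanType columns at read time.
// Assumes Spark 2.3+ (customSchema was added to the JDBC source then).
val result = sqlcontext.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//Host:1521/QAM")
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .option("dbtable", "MGPH.APPLICATION")
  // Columns listed here keep the declared type; everything else is inferred.
  .option("customSchema", "HEARTBEATS_ENABLED INT, ACTIVE_STAT INT")
  .option("user", "myuser")
  .option("password", "my password")
  .load()
```

You only need to name the NUMBER(1) columns per table, so this can be driven from a small config map when looping over many tables.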