2 votes

Below is my program to connect to Oracle using Spark Scala JDBC:

    /usr/hdp/current/spark2-client/bin/spark-shell --driver-memory 5g --executor-memory 5g --jars /usr/hdp/current/sqoop-client/lib/ojdbc7.jar

    import org.apache.spark.sql.SQLContext
    val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
    val dataframe_mysql = sqlcontext.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//Host:1521/QAM")
      .option("driver", "oracle.jdbc.driver.OracleDriver")
      .option("dbtable", "(select * from MGPH.APPLICATION where APPLICATION_ID in (11,12))")
      .option("user", "XXXXXXXXXX")
      .option("password", "xxxxxxxxx")
      .option("fetchsize", "100")
      .load()
    dataframe_mysql.show()

Spark output:

    scala> dataframe_mysql.show()
    +---+---+--------------------+--------------------+----+----+-----+
    | c1| c2|                  c3|                  c4|  c5|  c6|   c7|
    +---+---+--------------------+--------------------+----+----+-----+
    | 11|  1|Safire.Accumulato...|Safire Accumulato...|true|3346|false|
    +---+---+--------------------+--------------------+----+----+-----+

Oracle table structure:

    Name  Null?    Type
    ----- -------- -------------
    c1    NOT NULL NUMBER(3)
    c2    NOT NULL NUMBER(2)
    c3    NOT NULL VARCHAR2(50)
    c4    NOT NULL VARCHAR2(500)
    c5    NOT NULL NUMBER(1)
    c6             NUMBER(10)
    c7             NUMBER(1)

Question:

Column c7 in Oracle is NUMBER(1), but Spark JDBC converts it to a Boolean type on import.

Please suggest how to avoid true/false and get 0/1 in the DataFrame instead.

Comments:

Try casting it to string in the JDBC query, if that's something you have the flexibility for. – roh

I have 45 different tables, so I need some common approach in the option property. – NS Saravanan

2 Answers

1 vote

Please refer to these articles: http://www.ericlin.me/2017/05/oracle-number10-field-maps-to-boolean-in-spark/ and https://github.com/apache/spark/pull/14377. Spark's BooleanType is mapped to Oracle's NUMBER(1), since Oracle has no native boolean type. So you will have to handle this after reading the table into Spark, either by casting the column or by using the boolean value directly in your transformations.
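Since you mention 45 different tables, one generic way to do the casting after the load is to walk the DataFrame's schema and cast every BooleanType column back to an integer. This is a minimal sketch, not from the linked articles; `booleansToInts` is a hypothetical helper name:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{BooleanType, IntegerType}

// Hypothetical helper: casts every BooleanType column to IntegerType,
// so values come out as 0/1 instead of false/true. Because it walks the
// schema, the same call works for any table without listing columns by hand.
def booleansToInts(df: DataFrame): DataFrame =
  df.schema.fields.foldLeft(df) { (acc, field) =>
    if (field.dataType == BooleanType)
      acc.withColumn(field.name, col(field.name).cast(IntegerType))
    else acc
  }
```

Usage after the JDBC load: `val fixed = booleansToInts(dataframe_mysql); fixed.show()` — c7 should then print as 0/1.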

0 votes

I solved this issue by applying Oracle's to_char function, in the JDBC select query, to every column that gets mapped to Boolean.

Below is the code used (note to_char(HEARTBEATS_ENABLED)). I tried to_number as well, but it produced results like 1.0000, so I used to_char to get the desired result.

    val result = sqlcontext.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//Host:1521/QAM")
      .option("driver", "oracle.jdbc.driver.OracleDriver")
      .option("dbtable", "(select to_char(HEARTBEATS_ENABLED) as HEARTBEATS_ENABLED, APPLICATION_ID, APPLICATION_TYPE_ID, NAME, DESCR, to_char(ACTIVE_STAT) as ACTIVE_STAT, PROGRAM_ID from MGPH.APPLICATION where APPLICATION_ID in (11,12))")
      .option("user", "myuser")
      .option("password", "my password")
      .option("fetchsize", "100")
      .load()
    result.show()
    result.printSchema
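If you are on Spark 2.3 or later, the JDBC reader also supports a customSchema option that overrides the inferred type per column, which avoids rewriting the select for each table. A sketch, assuming Spark 2.3+ and reusing the connection details from the question (the column list here is illustrative):

```scala
// Sketch: override the inferred BooleanType columns at read time.
// Assumes Spark 2.3+ (customSchema was added to the JDBC source then).
val result = sqlcontext.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//Host:1521/QAM")
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .option("dbtable", "MGPH.APPLICATION")
  // Columns listed here keep the declared type; everything else is inferred.
  .option("customSchema", "HEARTBEATS_ENABLED INT, ACTIVE_STAT INT")
  .option("user", "myuser")
  .option("password", "my password")
  .load()
```

You only need to name the NUMBER(1) columns per table, so this can be driven from a small config map when looping over many tables.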