Below is my Spark Scala program that connects to Oracle over JDBC:
/usr/hdp/current/spark2-client/bin/spark-shell --driver-memory 5g --executor-memory 5g --jars /usr/hdp/current/sqoop-client/lib/ojdbc7.jar
import org.apache.spark.sql.SQLContext
val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
val dataframe_mysql = sqlcontext.read.format("jdbc").option("url", "jdbc:oracle:thin:@//Host:1521/QAM").option("driver", "oracle.jdbc.driver.OracleDriver").option("dbtable", "(select * from MGPH.APPLICATION where APPLICATION_ID in (11,12))").option("user", "XXXXXXXXXX").option("password", "xxxxxxxxx").option("fetchsize", "100").load()
dataframe_mysql.show()
Spark output:
scala> dataframe_mysql.show()
+--------------+-------------------+--------------------+--------------------+-----------+----------+------------------+
|            c1|                 c2|                  c3|                  c4|         c5|        c6|                c7|
+--------------+-------------------+--------------------+--------------------+-----------+----------+------------------+
|            11|                  1|Safire.Accumulato...|Safire Accumulato...|       true|      3346|             false|
+--------------+-------------------+--------------------+--------------------+-----------+----------+------------------+
Oracle table structure:
Name                Null?    Type
------------------- -------- -------------
c1                  NOT NULL NUMBER(3)
c2                  NOT NULL NUMBER(2)
c3                  NOT NULL VARCHAR2(50)
c4                  NOT NULL VARCHAR2(500)
c5                  NOT NULL NUMBER(1)
c6                           NUMBER(10)
c7                           NUMBER(1)
Question:
Column c7 in Oracle is NUMBER(1), but the Spark JDBC reader converts it to a boolean type on import (c5, also NUMBER(1), shows the same behaviour in the output above).
How can I avoid the true/false conversion and get 0/1 in the DataFrame instead?
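For reference, here are two workarounds I am considering, written as untested sketches against the same spark-shell session and connection options as above. They use the anonymised column names c1..c7 from the table structure, so the real column names would have to be substituted, and the multi-line statements need :paste in the shell. The idea behind the first one is that widening the precision should keep Spark's Oracle dialect from mapping the column to boolean.

// Option A: widen the NUMBER(1) columns inside the pushdown query, so the JDBC
// metadata no longer reports precision 1 and Spark keeps the columns numeric.
// NOTE: c1..c7 are the anonymised names from this question; substitute the real ones.
val castQuery =
  """(select c1, c2, c3, c4,
    |        CAST(c5 AS NUMBER(3)) AS c5,
    |        c6,
    |        CAST(c7 AS NUMBER(3)) AS c7
    |   from MGPH.APPLICATION
    |  where APPLICATION_ID in (11, 12))""".stripMargin

val df_cast = sqlcontext.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//Host:1521/QAM")
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .option("dbtable", castQuery)
  .option("user", "XXXXXXXXXX")
  .option("password", "xxxxxxxxx")
  .option("fetchsize", "100")
  .load()
df_cast.show()

// Option B: keep the original load and cast the boolean columns back to
// integers afterwards; casting a boolean to int in Spark yields 1/0.
import org.apache.spark.sql.functions.col
val df_int = dataframe_mysql
  .withColumn("c5", col("c5").cast("int"))
  .withColumn("c7", col("c7").cast("int"))
df_int.show()

If the cluster's Spark is 2.3 or newer, the JDBC reader's customSchema option would be another way to override the inferred types, but I am not sure of the exact version on this HDP install, so I have not assumed it here.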