I'm trying to connect to Oracle DB from Spark SQL with the following code:

    import org.apache.spark.storage.StorageLevel

    val dataTarget = sqlContext.read.
      format("jdbc").
      option("driver", config.getString("oracledriver")).
      option("url", config.getString("jdbcUrl")).
      option("user", config.getString("usernameDH")).
      option("password", config.getString("passwordDH")).
      option("dbtable", targetQuery).
      option("partitionColumn", "ID").
      option("lowerBound", "5").
      option("upperBound", "499999").
      option("numPartitions", "10").
      load().persist(StorageLevel.DISK_ONLY)
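For context on why `numPartitions` maps to connections: Spark's JDBC source splits the `[lowerBound, upperBound]` range of `partitionColumn` into `numPartitions` strides and issues one query (over one connection) per stride. A rough sketch of that stride arithmetic with a hypothetical helper, for illustration only (Spark's actual logic lives in its `JDBCRelation` internals and differs in edge cases):

```scala
// Hypothetical helper sketching how per-partition WHERE clauses could be
// derived from partitionColumn / lowerBound / upperBound / numPartitions.
// Each clause becomes one query, hence one JDBC connection to Oracle.
object PartitionSketch {
  def strides(column: String, lower: Long, upper: Long, numPartitions: Int): Seq[String] = {
    val stride = (upper - lower) / numPartitions
    (0 until numPartitions).map { i =>
      val start = lower + i * stride
      if (i == 0)
        // first partition also picks up rows below the lower bound and NULLs
        s"$column < ${start + stride} OR $column IS NULL"
      else if (i == numPartitions - 1)
        // last partition is open-ended so rows above upperBound are not lost
        s"$column >= $start"
      else
        s"$column >= $start AND $column < ${start + stride}"
    }
  }

  def main(args: Array[String]): Unit =
    strides("ID", 5L, 499999L, 10).foreach(println)
}
```

With the options above, this yields 10 clauses, i.e. 10 concurrent sessions against Oracle.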
By default, when Spark SQL connects to Oracle it creates a single connection and a single partition for the entire RDD. That way I lose parallelism, and performance problems appear when the table holds huge amounts of data. In my code I pass `option("numPartitions", "10")`, which should create 10 connections. Please correct me if I'm wrong: as far as I know, the number of connections to Oracle equals the number of partitions we pass.
I get the error below if I use more connections, presumably because Oracle enforces a limit on concurrent sessions:
java.sql.SQLException: ORA-02391: exceeded simultaneous SESSIONS_PER_USER limit
So if I use more partitions for parallelism, I hit this error, but with fewer partitions I face performance issues. Is there any other way, such as opening a single connection and loading the data into multiple partitions? That would save my life. Please suggest.