I am reading 30M records from an Oracle table that has no primary key column. The Spark JDBC read hangs and never fetches any data, while the same query returns results in Oracle SQL Developer within a few seconds.
oracleDf = hiveContext.read().format("jdbc")
    .option("url", url)
    .option("dbtable", queryToExecute)
    .option("numPartitions", "5")
    .option("fetchSize", "1000000")
    .option("user", use)
    .option("password", pwd)
    .option("driver", driver)
    .load()
    .repartition(5);
I cannot use the partitionColumn option because I do not have a primary key column to partition on. Can anyone advise how to improve performance?
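For reference, this is the kind of workaround I have seen suggested: bucketing rows with Oracle's ORA_HASH over ROWID so that no primary key is needed, and passing the buckets as JDBC predicates. dbTableOrQuery below is just a placeholder for my actual table name (or an aliased subquery), and I have not confirmed this is the right approach:

import java.util.Properties;
import org.apache.spark.sql.DataFrame;

int numPartitions = 5;
String[] predicates = new String[numPartitions];
for (int i = 0; i < numPartitions; i++) {
    // ORA_HASH(ROWID, 4) returns bucket values 0..4, so these 5 predicates
    // are non-overlapping and together cover every row of the table
    predicates[i] = "ORA_HASH(ROWID, " + (numPartitions - 1) + ") = " + i;
}

Properties connProps = new Properties();
connProps.setProperty("user", use);
connProps.setProperty("password", pwd);
connProps.setProperty("driver", driver);
connProps.setProperty("fetchsize", "10000");

// DataFrameReader.jdbc(url, table, predicates, properties) issues one query
// per predicate, so each hash bucket becomes its own Spark partition
DataFrame oracleDf = hiveContext.read().jdbc(url, dbTableOrQuery, predicates, connProps);

Would something like this work, or is there a better way to parallelize the read without a primary key?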
Thanks