I am reading 30M records from an Oracle table that has no primary key column. The Spark JDBC read hangs and never fetches any data, while the same query returns results in Oracle SQL Developer within a few seconds.
oracleDf = hiveContext.read().format("jdbc")
    .option("url", url)
    .option("dbtable", queryToExecute)
    .option("numPartitions", "5")
    .option("fetchSize", "1000000")
    .option("user", use)
    .option("password", pwd)
    .option("driver", driver)
    .load()
    .repartition(5);
I cannot use the partitionColumn option because I do not have a primary key column to partition on. Can anyone advise how to improve performance?
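For reference, this is the kind of workaround I have seen suggested: bucketing rows with Oracle's ORA_HASH over ROWID so that no primary key is needed, and passing the buckets as JDBC predicates. dbTableOrQuery below is just a placeholder for my actual table name (or an aliased subquery), and I have not confirmed this is the right approach:

import java.util.Properties;
import org.apache.spark.sql.DataFrame;

int numPartitions = 5;
String[] predicates = new String[numPartitions];
for (int i = 0; i < numPartitions; i++) {
    // ORA_HASH(ROWID, 4) returns bucket values 0..4, so these 5 predicates
    // are non-overlapping and together cover every row of the table
    predicates[i] = "ORA_HASH(ROWID, " + (numPartitions - 1) + ") = " + i;
}

Properties connProps = new Properties();
connProps.setProperty("user", use);
connProps.setProperty("password", pwd);
connProps.setProperty("driver", driver);
connProps.setProperty("fetchsize", "10000");

// DataFrameReader.jdbc(url, table, predicates, properties) issues one query
// per predicate, so each hash bucket becomes its own Spark partition
DataFrame oracleDf = hiveContext.read().jdbc(url, dbTableOrQuery, predicates, connProps);

Would something like this work, or is there a better way to parallelize the read without a primary key?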
Thanks