I am using the JDBC driver from Simba Technologies Inc. to connect to Google Cloud Spanner. It works as expected with plain java.sql. However, when I use the Simba JDBC driver with Spark's JDBC reader to read query output as a DataFrame, it gives wrong output.
Here is my Spanner table:

UserID  UserName
1       Vaijnath
2       Ganesh
3       Rahul

Metadata: UserID (String), UserName (String)
I am executing the query: SELECT * FROM users
This query fetches the correct data when I use the Simba JDBC driver with plain java.sql, but it fails to fetch data when I use it with Spark SQL's JDBC reader.
It returns the DataFrame as:
+------+--------+
|UserID|UserName|
+------+--------+
|UserID|UserName|
|UserID|UserName|
|UserID|UserName|
+------+--------+
As we can see, it returns the correct metadata and number of rows, but each row contains the column names instead of the actual data.
Here is the code I am using:
import java.util.Properties

import org.apache.spark.sql.{DataFrame, SparkSession}

object spannerIn {
  val sparkSession = SparkSession
    .builder()
    .appName("Spark SQL basic example")
    .master("local")
    .config("spark.sql.warehouse.dir", "file:///tmp")
    .config("spark.sql.shuffle.partitions", 1)
    .getOrCreate()

  val properties = new Properties()
  properties.setProperty("user", "")
  properties.setProperty("password", "")
  properties.setProperty("driver", "com.simba.cloudspanner.core.jdbc42.CloudSpanner42Driver")

  val connectionURL = "jdbc:cloudspanner://localhost;Project=abc;Instance=pqr;Database=xyz;PvtKeyPath=FilePath"
  val selectQuery = "(select * from users)"

  def main(args: Array[String]): Unit = {
    val df = createJdbcDataframe()
    df.show()
  }

  def createJdbcDataframe(): DataFrame = {
    sparkSession.read.jdbc(connectionURL, selectQuery, properties)
  }
}
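My suspicion is that Spark's default JDBC dialect quotes column names with double quotes (generating something like SELECT "UserID","UserName" FROM ...), and Cloud Spanner parses double-quoted tokens as string literals rather than identifiers, so every row comes back as the literal column names. A sketch of a custom JdbcDialect I could register before calling the reader (assuming Spanner accepts backtick-quoted identifiers; the object name SpannerDialect is my own):

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hypothetical workaround: override identifier quoting for Cloud Spanner URLs,
// since Spark's default dialect wraps column names in double quotes, which
// Spanner appears to treat as string literals.
object SpannerDialect extends JdbcDialect {
  // Claim any connection URL that targets Cloud Spanner
  override def canHandle(url: String): Boolean =
    url.toLowerCase.startsWith("jdbc:cloudspanner:")

  // Quote column names with backticks instead of double quotes
  override def quoteIdentifier(colName: String): String =
    s"`$colName`"
}

// Register the dialect once, before sparkSession.read.jdbc(...) is called
JdbcDialects.registerDialect(SpannerDialect)
```

With this registered, Spark should generate SELECT `UserID`,`UserName` FROM (select * from users) instead, but I have not confirmed whether anything else in the default dialect also needs overriding.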
My question is: can I use the Simba JDBC driver with Spark? If yes, what extra things do I need to add? Any help appreciated.