0 votes

I've set up Spark 2.2.0 on my Windows machine using Scala 2.11.8 on IntelliJ IDE. I'm trying to make Spark connect to Netezza using JDBC drivers.

I've read through this link and added the com.ibm.spark.netezza jars to my project through Maven. I'm attempting to run the Scala script below just to test the connection:

package jdbc
object SimpleScalaSpark {
  def main(args: Array[String]) {
    import org.apache.spark.sql.{SparkSession, SQLContext}
    import com.ibm.spark.netezza

    val spark = SparkSession.builder
      .master("local")
      .appName("SimpleScalaSpark")
      .getOrCreate()

    val sqlContext = SparkSession.builder()
      .appName("SimpleScalaSpark")
      .master("local")
      .getOrCreate()

    val nzoptions = Map(
      "url" -> "jdbc:netezza://SERVER:5480/DATABASE",
      "user" -> "USER",
      "password" -> "PASSWORD",
      "dbtable" -> "ADMIN.TABLENAME")

    val df = sqlContext.read.format("com.ibm.spark.netezza").options(nzoptions).load()
  }
}

However I get the following error:

17/07/27 16:28:17 ERROR NetezzaJdbcUtils$: Couldn't find class org.netezza.Driver
java.lang.ClassNotFoundException: org.netezza.Driver
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:38)
    at com.ibm.spark.netezza.NetezzaJdbcUtils$$anonfun$getConnector$1.apply(NetezzaJdbcUtils.scala:49)
    at com.ibm.spark.netezza.NetezzaJdbcUtils$$anonfun$getConnector$1.apply(NetezzaJdbcUtils.scala:46)
    at com.ibm.spark.netezza.DefaultSource.createRelation(DefaultSource.scala:50)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
    at jdbc.SimpleScalaSpark$.main(SimpleScalaSpark.scala:20)
    at jdbc.SimpleScalaSpark.main(SimpleScalaSpark.scala)
Exception in thread "main" java.sql.SQLException: No suitable driver found for jdbc:netezza://SERVER:5480/DATABASE
    at java.sql.DriverManager.getConnection(DriverManager.java:689)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at com.ibm.spark.netezza.NetezzaJdbcUtils$$anonfun$getConnector$1.apply(NetezzaJdbcUtils.scala:54)
    at com.ibm.spark.netezza.NetezzaJdbcUtils$$anonfun$getConnector$1.apply(NetezzaJdbcUtils.scala:46)
    at com.ibm.spark.netezza.DefaultSource.createRelation(DefaultSource.scala:50)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
    at jdbc.SimpleScalaSpark$.main(SimpleScalaSpark.scala:20)
    at jdbc.SimpleScalaSpark.main(SimpleScalaSpark.scala)

I have two ideas:

1) I don't believe I actually installed any Netezza JDBC driver, though I thought the jars I brought into my project from the link above were sufficient. Am I just missing a driver, or am I missing something in my Scala script?

2) In the same link, the author mentions starting the Netezza Spark package:

For example, to use the Spark Netezza package with Spark’s interactive shell, start it as shown below:

$SPARK_HOME/bin/spark-shell --packages com.ibm.SparkTC:spark-netezza_2.10:0.1.1 --driver-class-path ~/nzjdbc.jar

I don't believe I'm invoking any package apart from jdbc in my script. Do I have to add that to my script?

Thanks!

1
What is your spark-submit command? Note that it needs both --jars and --driver-class-path to load your JDBC driver. (philantrovert)
@philantrovert I believe I set up IntelliJ to start Spark when I run my Scala script, but I'm not certain what the spark-submit command defaults to. Regardless, you're saying my spark-submit command should be updated to include --packages and --driver-class-path like in (2) above? (gtnbz2nyt)
I also tried running this on the command line before executing the script, but unfortunately I still got the same error: spark-shell --packages com.ibm.SparkTC:spark-netezza_2.10:0.1.1 --driver-class-path /path/to/nzjdbc.jar (gtnbz2nyt)

1 Answer

2 votes

Your first idea is right, I think: you almost certainly need to install the Netezza JDBC driver if you have not done so already.
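Since you are resolving dependencies through Maven in IntelliJ, one way to make a locally downloaded nzjdbc.jar visible to your build is to install it into your local Maven repository. A sketch, where the file path and the groupId/artifactId/version coordinates are placeholders I made up, not official ones:

```shell
# Install the downloaded Netezza JDBC jar into the local Maven repository.
# The jar path and the Maven coordinates below are illustrative only;
# use whatever convention suits your project.
mvn install:install-file \
  -Dfile=/path/to/nzjdbc.jar \
  -DgroupId=org.netezza \
  -DartifactId=nzjdbc \
  -Dversion=1.0 \
  -Dpackaging=jar
```

After that you can reference it as an ordinary dependency in your pom.xml using the same coordinates.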

From the link you posted:

This package can be deployed as part of an application program or from Spark tools such as spark-shell, spark-sql. To use the package in the application, you have to specify it in your application's build dependency. When using from Spark tools, add the package using the --packages command line option. Netezza JDBC driver also should be added to the application dependencies.
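For the interactive shell case, that means passing both options together. A sketch along these lines (the package version comes from your question; the jar path is a placeholder):

```shell
# --packages pulls the Spark Netezza connector from its Maven coordinates;
# --jars ships the separately downloaded JDBC driver jar to the executors;
# --driver-class-path makes it visible to DriverManager on the driver.
$SPARK_HOME/bin/spark-shell \
  --packages com.ibm.SparkTC:spark-netezza_2.10:0.1.1 \
  --jars /path/to/nzjdbc.jar \
  --driver-class-path /path/to/nzjdbc.jar
```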

The Netezza driver is something you have to download yourself, and you need a support entitlement to access it (via IBM's Fix Central or Passport Advantage). It is included in either the Windows driver/client support package or the Linux driver package.
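For a packaged application rather than the shell, the equivalent (as philantrovert's comment suggests) is a spark-submit invocation carrying the same options. A sketch, with the application jar path as a placeholder and the main class taken from your script:

```shell
# Hypothetical spark-submit for the packaged application; adjust the
# application jar and the driver jar path to your environment.
$SPARK_HOME/bin/spark-submit \
  --class jdbc.SimpleScalaSpark \
  --master local \
  --packages com.ibm.SparkTC:spark-netezza_2.10:0.1.1 \
  --jars /path/to/nzjdbc.jar \
  --driver-class-path /path/to/nzjdbc.jar \
  target/myapp.jar
```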