
Update: This has been resolved. It was a typo in the URL.

--
I'm trying to read data from Netezza using pyspark on Windows 10 1909.

I can read from it using DbVisualizer with no problem. Then I started pyspark on the same machine (VPN connection, JDBC driver jar, and all):

pyspark --driver-class-path <path to nzjdbc.jar> --jars <path to nzjdbc.jar> --master local[*]

I used this code from the pyspark shell:

dataframe = spark.read.format("jdbc").options(
    url="jdbc:netezza://<server>:5480/<database>",
    dbtable="ADMIN.<table>",
    user="***",
    password="***",
    driver="org.netezza.Driver",
).load()

but this fails for me after about 10-20 seconds, with the stack trace below (I also tried adding queryTimeout="300", but that didn't make a difference):

"...\AppData\Local\Continuum\miniconda3\envs\spark\lib\site-packages\pyspark\python\lib\py4j-0.10.9-src.zip\py4j\protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o41.load.
: org.netezza.error.NzSQLException: Connection timed out: connect
        at org.netezza.sql.NzConnection.initSocket(NzConnection.java:2859)
        at org.netezza.sql.NzConnection.open(NzConnection.java:293)
        at org.netezza.datasource.NzDatasource.getConnection(NzDatasource.java:675)
        at org.netezza.datasource.NzDatasource.getConnection(NzDatasource.java:662)
        at org.netezza.Driver.connect(Driver.java:155)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$createConnectionFactory$1(JdbcUtils.scala:64)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:56)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:279)
        at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:268)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:268)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:203)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)

A coworker is able to run the same code from his Mac with no issues (also on VPN).

Is there something in Windows or in Netezza itself that could affect which clients are able to connect to Netezza? Or could I be missing something in the pyspark command?

Try disabling the firewall on your computer (temporarily) and see if that helps? – Lars G Olsen
Thanks @Lars G Olsen. It was me the whole time; I made a typo in the URL. – Arseny
Lol, I'm the cause of 'error 40' all the time (40 cm is reputedly the distance of the user from the screen). – Lars G Olsen
@LarsGOlsen Yeah, you'd think one would learn after making this kind of mistake a bunch of times. In my defense, the typo wasn't obvious right away because the word I misspelled seems to have other spellings, and typos in strings are notoriously hard to catch as it is. – Arseny

2 Answers


Can you try increasing the LoginTimeout value? FYI, queryTimeout refers to the timeout for a single query, not to the initial connection.
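
For example, a minimal sketch of the same read with a login timeout added; this assumes the Netezza JDBC driver accepts a loginTimeout connection property in seconds (Spark passes unrecognized options through to the driver as connection properties):

dataframe = spark.read.format("jdbc").options(
    url="jdbc:netezza://<server>:5480/<database>",
    dbtable="ADMIN.<table>",
    user="***",
    password="***",
    driver="org.netezza.Driver",
    loginTimeout="300",  # assumed property name; seconds to wait for the initial connection
).load()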


I found a typo in my URL. That was a really sneaky one.
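
For anyone debugging a similar "Connection timed out: connect" from NzConnection.initSocket: before blaming Spark or the JDBC driver, a plain socket test can rule out the URL itself. A minimal sketch, where "<server>" stands for the host from the JDBC URL:

import socket

# Resolve the hostname and attempt a TCP connection to the Netezza port.
# A misspelled host typically fails fast with socket.gaierror (DNS lookup),
# while a firewall or VPN problem usually surfaces as a timeout instead.
socket.create_connection(("<server>", 5480), timeout=10)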