My problem is about connecting from Apache Spark to MongoDB using the official connector.
Stack versions are as follows:
- Apache Spark 2.2.0 (HDP build:
2.2.0.2.6.3.0-235
) - MongoDB 3.4.10 (2x-node replica set with authentification)
I use these jars:
- mongo-spark-connector-assembly-2.2.0.jar which i tried both to download from Maven repo and to build by myself with a proper Mongo Driver version
- mongo-java-driver.jar downloaded from Maven Repo
The issue is about version correspondence, as mentioned here and here.
They all say, that the method was renamed since Spark 2.2.0 so i need to use the connector for 2.2.0 version - and yes it is, here is the method in spark connector 2.1.1, and here is renamed one in 2.2.0
But i am sure, that i use the proper one. I did these steps:
git clone https://github.com/mongodb/mongo-spark.git
cd mongo-spark
git checkout tags/2.2.0
sbt check
sbt assembly
scp target/scala-2.11/mongo-spark-connector_2.11-2.2.0.jar user@remote-spark-server:/opt/jars
All tests was OK. After that i am using pyspark and Zeppelin (so deploy-mode is client) to read some data from MongoDB:
df = sqlc.read.format("com.mongodb.spark.sql.DefaultSource") \
.option('spark.mongodb.input.uri', 'mongodb://user:[email protected]:27017,172.22.100.234:27017/dmp?authMechanism=SCRAM-SHA-1&authSource=admin&replicaSet=rs0&connectTimeoutMS=300&readPreference=nearest') \
.option('collection', 'verification') \
.option('sampleSize', '10') \
.load()
df.show()
And got this error:
Py4JJavaError: An error occurred while calling o86.load.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure:
Lost task 0.3 in stage 0.0 (TID 3, worker01, executor 1): java.lang.NoSuchMethodError:org.apache.spark.sql.catalyst.analysis.TypeCoercion$.findTightestCommonTypeOfTwo()Lscala/Function2;
I think am sure this method is not in the jar: TypeCoercion$.findTightestCommonTypeOfTwo()
I go to the SparkUI and look at the environment:
spark.repl.local.jars file:file:/opt/jars/mongo-spark-connector-assembly-2.2.0.jar,file:/opt/jars/mongo-java-driver-3.6.1.jar
And no different MongoDB-related files anywhere. Please help, what am i doing wrong? Thanks in advance