I'm using the Spring Framework to create an API to query some of my tables in Hadoop. This is the code I use to create the Spark session:
println("-----------------------------------------------------------------before )
val spark = SparkSession
.builder()
.appName("API")
.master("local[*])
.enableHiveSupport()
.getOrCreate()
println("--------------------------------------------------------------------Session was created")
I'm using Spark 2.2.0 with Scala 2.11.6. When I use spark-shell I can connect to the remote cluster.
In the log I don't get any errors, but I can see that a local Hive metastore is being created:
[ main] o.a.h.hive.metastore.MetaStoreDirectSql : Using direct SQL, underlying DB is DERBY
[ main] o.a.hadoop.hive.ql.session.SessionState : Created local directory: C:/Users/..../.../Local/Temp/..._resources
2018-05-10 16:32:32.556 INFO 16148 --- [ main] o.a.hadoop.hive.ql.session.SessionState : Created HDFS directory: /tmp/hive/myuser/....
I'm trying to connect to a remote Cloudera cluster. I copied the XML files (hive-site.xml, hdfs-site.xml, core-site.xml, yarn-site.xml) to the conf directory in my project and to the $SPARK_CONF_DIR directory. I added the SPARK_HOME path to the PATH variable and set the HADOOP_HOME variable to point to the winutils location.
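One sanity check I can run (just a sketch; it assumes the cluster XMLs should be picked up from the classpath) is to print what the session's Hadoop configuration actually resolves:

// If the cluster config files are being picked up, fs.defaultFS should point at the
// remote NameNode (hdfs://...) rather than the local file system, and the warehouse
// directory should not be a local spark-warehouse folder.
println(spark.sparkContext.hadoopConfiguration.get("fs.defaultFS"))
println(spark.conf.get("spark.sql.warehouse.dir"))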
What else can I do?
The log is pretty long; here are a few messages that might mean something to you:
-----------------------------------------------------------------ENV=local[*]
2018-05-10 16:32:16.930 WARN 16148 --- [ main] org.apache.hadoop.util.NativeCodeLoader : Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[ main] org.apache.spark.util.Utils : Successfully started service 'SparkUI' on port 4040.
[ main] o.s.jetty.server.handler.ContextHandler : Started o.s.j.s.ServletContextHandler@13ee97af{/stages/pool/json,null,AVAILABLE,@Spark}
[ main] org.apache.spark.ui.SparkUI : Bound SparkUI to 0.0.0.0, and started at http://192.168.56.1:4040
[ main] o.apache.spark.sql.internal.SharedState : URL.setURLStreamHandlerFactory failed to set FsUrlStreamHandlerFactory
[ main] DataNucleus.Persistence : Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
[ main] DataNucleus.Datastore : The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
[ main] DataNucleus.Query : Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
[ main] o.a.h.hive.metastore.MetaStoreDirectSql : Using direct SQL, underlying DB is DERBY
[ main] o.a.hadoop.hive.metastore.ObjectStore : Failed to get database global_temp, returning NoSuchObjectException
[ main] o.a.hadoop.hive.ql.session.SessionState : Created local directory: C:/Users/myuser/AppData/Local/Temp/1fa7a82b-fe17-4795-8973-212010634cd1_resources
[ main] o.a.hadoop.hive.ql.session.SessionState : Created HDFS directory: /tmp/hive/myuser/1fa7a82b-fe17-4795-8973-212010634cd1
[ main] o.a.hadoop.hive.ql.session.SessionState : Created local directory: C:/Users/myuser/AppData/Local/Temp/myuser/fileasdasdsa
[ main] o.a.hadoop.hive.ql.session.SessionState : Created HDFS directory: /tmp/hive/myuser/asdsadsa/_tmp_space.db
[ main] o.a.s.sql.hive.client.HiveClientImpl : Warehouse location for Hive client (version 1.2.1) is file:/C:/Users/myuser/SpringScalaAPI/spark-warehouse
[ main] o.a.s.s.e.s.s.StateStoreCoordinatorRef : Registered StateStoreCoordinator endpoint
--------------------------------------------------------------------Session was created
To be honest, this isn't the first time I've dealt with this type of error; last time I was using the Play Framework. What are the exact steps that need to be taken in this case? Which variables really need to be configured, and which ones aren't important?
.master("local[*])
– vvg