SAP Vora 1.2 - Reading Vora tables from HANA

Question

!!! UPDATE !!!

Finally after hours of looking into documentation I found the issue. It turns out that I lacked some parameters in Yarn configuration.

This is what I did:

Open the yarn-site.xml file in an editor or log in to Ambari web UI and select Yarn>Config. Locate the property "yarn.nodemanager.aux-services" and add "spark_shuffle" to its current value. The new property name should be "mapreduce_shuffle,spark_shuffle".
Add or edit the property "yarn.nodemanager.aux-services.spark_shuffle.class", and set it to "org.apache.spark.network.yarn.YarnShuffleService".
Copy the spark--yarn-shuffle.jar file (downloaded in the step Install Spark Assembly Files and Dependent Libraries) from Spark to Hadoop-Yarn class path in all the node manager hosts. Typically this folder is located in /usr/hdp//hadoop-yarn/lib.
Restart Yarn and the node manager

!!!!!!!!!!!

I'm using SAP Vora 1.2 Developer Edition with newest Spark Controller (HANASPARKCTRL00P_5-70001262.RPM). I loaded a table into Vora in spark-shell. I can see the table in SAP HANA Studio in "spark_velocity" folder. I can load the table as Virtual Table. The problem is that I cannot select, or preview the data in the table, because of the error:

Error: SAP DBTech JDBC: [403]: internal error: Error opening the cursor for the remote database for query "SELECT "SPARK_testtable"."a1", "SPARK_testtable"."a2", "SPARK_testtable"."a3" FROM "spark_velocity"."testtable" "SPARK_testtable" LIMIT 200 "

Here is my hanaes-site.xml file:

<configuration>
    <!--  You can either copy the assembly jar into HDFS or to lib/external directory.
    Please maintain appropriate value here-->
    <property>
        <name>sap.hana.es.spark.yarn.jar</name>
        <value>file:///usr/sap/spark/controller/lib/external/spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar</value>
        <final>true</final>
    </property>
    <property>
        <name>sap.hana.es.server.port</name>
        <value>7860</value>
        <final>true</final>
    </property>
    <!--  Required if you are copying your files into HDFS-->
     <property>
         <name>sap.hana.es.lib.location</name>
         <value>hdfs:///sap/hana/spark/libs/thirdparty/</value>
         <final>true</final>
     </property>
     -->
    <!--Required property if using controller for DLM scenarios-->
    <!--
    <property>
        <name>sap.hana.es.warehouse.dir</name>
        <value>/sap/hana/hanaes/warehouse</value>
        <final>true</final>
    </property>
-->
    <property>
        <name>sap.hana.es.driver.host</name>
        <value>ip-10-0-0-[censored].ec2.internal</value>
        <final>true</final>
    </property>
    <!-- Change this value to vora when connecting to Vora store -->
    <property>
        <name>sap.hana.hadoop.datastore</name>
        <value>vora</value>
        <final>true</final>
    </property>

    <!-- // When running against a kerberos protected cluster, please maintain appropriate values
    <property>
        <name>spark.yarn.keytab</name>
        <value>/usr/sap/spark/controller/conf/hanaes.keytab</value>
        <final>true</final>
    </property>
    <property>
        <name>spark.yarn.principal</name>
        <value>[email protected]</value>
        <final>true</final>
    </property>
-->
    <!-- To enable Secure Socket communication, please maintain appropriate values in the follwing section-->
    <property>
        <name>sap.hana.es.ssl.keystore</name>
        <value></value>
        <final>false</final>
    </property>
    <property>
        <name>sap.hana.es.ssl.clientauth.required</name>
        <value>true</value>
        <final>true</final>
    </property>
    <property>
        <name>sap.hana.es.ssl.verify.hostname</name>
        <value>true</value>
        <final>true</final>
    </property>
    <property>
        <name>sap.hana.es.ssl.keystore.password</name>
        <value></value>
        <final>true</final>
    </property>
    <property>
        <name>sap.hana.es.ssl.truststore</name>
        <value></value>
        <final>true</final>
    </property>
    <property>
        <name>sap.hana.es.ssl.truststore.password</name>
        <value></value>
        <final>true</final>
    </property>
    <property>
        <name>sap.hana.es.ssl.enabled</name>
        <value>false</value>
        <final>true</final>
    </property>

    <property>
        <name>spark.executor.instances</name>
        <value>10</value>
        <final>true</final>
    </property>
    <property>
        <name>spark.executor.memory</name>
        <value>5g</value>
        <final>true</final>
    </property>
    <!-- Enable the following section if you want to enable dynamic allocation-->
    <!--
    <property>
        <name>spark.dynamicAllocation.enabled</name>
        <value>true</value>
        <final>true</final>
    </property>

    <property>
        <name>spark.dynamicAllocation.minExecutors</name>
        <value>10</value>
        <final>true</final>
    </property>
    <property>
        <name>spark.dynamicAllocation.maxExecutors</name>
        <value>20</value>
        <final>true</final>
    </property>
    <property>
    <name>spark.shuffle.service.enabled</name>
    <value>true</value>
    <final>true</final>
   </property>
<property>
         <name>sap.hana.ar.provider</name>
         <value>com.sap.hana.aws.extensions.AWSResolver</value>
         <final>true</final>
     </property>
<property>
        <name>spark.vora.hosts</name>
        <value>ip-10-0-0-[censored].ec2.internal:2022,ip-10-0-0-[censored].ec2.internal:2022,ip-10-0-0-[censored].ec2.internal:2022</value>
        <final>true</final>
     </property>
     <property>
        <name>spark.vora.zkurls</name>
        <value>ip-10-0-0-[censored].ec2.internal:2181,ip-10-0-0-[censored].ec2.internal:2181,ip-10-0-0-[censored].ec2.internal:2181</value>
        <final>true</final>
     </property>
</configuration>

ls /usr/sap/spark/controller/lib/external/

spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar

hdfs dfs -ls /sap/hana/spark/libs/thirdparty

Found 4 items
-rwxrwxrwx   3 hdfs hdfs     366565 2016-05-11 13:09 /sap/hana/spark/libs/thirdparty/datanucleus-api-jdo-4.2.1.jar
-rwxrwxrwx   3 hdfs hdfs    2006182 2016-05-11 13:09 /sap/hana/spark/libs/thirdparty/datanucleus-core-4.1.2.jar
-rwxrwxrwx   3 hdfs hdfs    1863315 2016-05-11 13:09 /sap/hana/spark/libs/thirdparty/datanucleus-rdbms-4.1.2.jar
-rwxrwxrwx   3 hdfs hdfs     627814 2016-05-11 13:09 /sap/hana/spark/libs/thirdparty/joda-time-2.9.3.jar

ls /usr/hdp/

2.3.4.0-3485  2.3.4.7-4  current

vi /var/log/hanaes/hana_controller.log

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/sap/spark/controller/lib/spark-sap-datasources-1.2.33-assembly.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/sap/spark/controller/lib/external/spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.4.0-3485/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/05/12 07:02:38 INFO HanaESConfig: Loaded HANA Extended Store Configuration
Found Spark Libraries. Proceeding with Current Class Path
16/05/12 07:02:39 INFO Server: Starting Spark Controller
16/05/12 07:03:11 INFO CommandRouter: Connecting to Vora Engine
16/05/12 07:03:11 INFO CommandRouter: Initialized Router
16/05/12 07:03:11 INFO CommandRouter: Server started
16/05/12 07:03:43 INFO CommandHandler: Getting BROWSE data/user/17401406272892502037-4985062628452729323_f17e36cf-0003-0015-452e-800c700001ee
16/05/12 07:03:48 INFO CommandHandler: Getting BROWSE data/user/17401406272892502037-4985062628452729329_f17e36cf-0003-0015-452e-800c700001f4
16/05/12 07:03:48 INFO VoraClientFactory: returning a Vora catalog client of this Vora catalog server: master.i-14371789.cluster:2204
16/05/12 07:03:48 INFO CBinder: searching for compat-sap-c++.so at /opt/rh/SAP/lib64/compat-sap-c++.so
16/05/12 07:03:48 WARN CBinder: could not find compat-sap-c++.so
16/05/12 07:03:48 INFO CBinder: searching for libpam.so.0 at /lib64/libpam.so.0
16/05/12 07:03:48 INFO CBinder: loading libpam.so.0 from /lib64/libpam.so.0
16/05/12 07:03:48 INFO CBinder: loading library libprotobuf.so
16/05/12 07:03:48 INFO CBinder: loading library libprotoc.so
16/05/12 07:03:48 INFO CBinder: loading library libtbbmalloc.so
16/05/12 07:03:48 INFO CBinder: loading library libtbb.so
16/05/12 07:03:48 INFO CBinder: loading library libv2runtime.so
16/05/12 07:03:48 INFO CBinder: loading library libv2net.so
16/05/12 07:03:48 INFO CBinder: loading library libv2catalog_connector.so
16/05/12 07:03:48 INFO CatalogFactory: returning a Vora catalog client of this Vora catalog server: master.i-14371789.cluster:2204
16/05/12 07:11:56 INFO CommandHandler: Getting BROWSE data/user/17401406272892502037-4985062628452729335_f17e36cf-0003-0015-452e-800c700001fa
16/05/12 07:11:56 INFO Utils: freeing the buffer
16/05/12 07:11:56 INFO Utils: freeing the buffer
16/05/12 07:12:02 INFO Utils: freeing the buffer
16/05/12 07:12:02 WARN DefaultSource: Creating a Vora Relation that is actually persistent with a temporary statement!
16/05/12 07:12:02 WARN DefaultSource: Creating a Vora Relation that is actually persistent with a temporary statement!
16/05/12 07:12:02 INFO CatalogFactory: returning a Vora catalog client of this Vora catalog server: master.i-14371789.cluster:2204
16/05/12 07:12:02 INFO Utils: freeing the buffer
16/05/12 07:12:02 INFO DefaultSource: Creating VoraRelation testtable using an existing catalog table
16/05/12 07:12:02 INFO Utils: freeing the buffer
16/05/12 07:12:11 INFO Utils: freeing the buffer
16/05/12 07:14:15 ERROR RequestOrchestrator: Result set was not fetched by connected Client. Hence cancelled the execution
16/05/12 07:14:15 ERROR RequestOrchestrator: org.apache.spark.SparkException: Job 0 cancelled part of cancelled job group f17e36cf-0003-0015-452e-800c70000216
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
        at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1229)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply$mcVI$sp(DAGScheduler.scala:681)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply(DAGScheduler.scala:681)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply(DAGScheduler.scala:681)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
        at org.apache.spark.scheduler.DAGScheduler.handleJobGroupCancelled(DAGScheduler.scala:681)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1475)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1921)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:902)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:900)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
        at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:900)
        at com.sap.hana.spark.network.CommandHandler$$anonfun$receive$2$$anonfun$applyOrElse$7.apply(CommandRouter.scala:383)
        at com.sap.hana.spark.network.CommandHandler$$anonfun$receive$2$$anonfun$applyOrElse$7.apply(CommandRouter.scala:362)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at com.sap.hana.spark.network.CommandHandler$$anonfun$receive$2.applyOrElse(CommandRouter.scala:362)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
        at com.sap.hana.spark.network.CommandHandler.aroundReceive(CommandRouter.scala:204)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
        at akka.dispatch.Mailbox.run(Mailbox.scala:220)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Also strange is this error:

16/05/12 07:03:48 INFO CBinder: searching for compat-sap-c++.so at /opt/rh/SAP/lib64/compat-sap-c++.so
    16/05/12 07:03:48 WARN CBinder: could not find compat-sap-c++.so

Because I have this file in the location:

ls /opt/rh/SAP/lib64/

compat-sap-c++.so

After changing com.sap.hana.aws.extensions.AWSResolver into com.sap.hana.spark.aws.extensions.AWSResolver now the log file looks different:

    16/05/17 10:04:08 INFO CommandHandler: Getting BROWSE data/user/9110494231822270485-5373255807276155190_7e6efa3c-0003-0015-4a91-a3b020000139
16/05/17 10:04:13 INFO CommandHandler: Getting BROWSE data/user/9110494231822270485-5373255807276155196_7e6efa3c-0003-0015-4a91-a3b02000013f
16/05/17 10:04:13 INFO Utils: freeing the buffer
16/05/17 10:04:13 INFO Utils: freeing the buffer
16/05/17 10:04:13 INFO Utils: freeing the buffer
16/05/17 10:04:13 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 WARN DefaultSource: Creating a Vora Relation that is actually persistent with a temporary statement!
16/05/17 10:04:29 WARN DefaultSource: Creating a Vora Relation that is actually persistent with a temporary statement!
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO DefaultSource: Creating VoraRelation testtable using an existing catalog table
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO ConfigurableHostMapper: Load Strategy: RELAXEDLOCAL (default)
16/05/17 10:04:29 INFO HdfsBlockRetriever: Length of HDFS file (/user/vora/test.csv): 10 bytes.
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO ConfigurableHostMapper: Load Strategy: RELAXEDLOCAL (default)
16/05/17 10:04:29 INFO TableLoader: Loading table [testtable]
16/05/17 10:04:29 INFO ConfigurableHostMapper: Load Strategy: RELAXEDLOCAL (default)
16/05/17 10:04:29 INFO TableLoader: Initialized 1 loading threads. Waiting until finished... -- 0.00 s
16/05/17 10:04:29 INFO TableLoader: [secondary2.i-a5361638.cluster:2202] Host mapping (Ranges: 1/1 Size: 0.00 MB)
16/05/17 10:04:29 INFO VoraJdbcClient: [secondary2.i-a5361638.cluster:2202] MultiLoad: MULTIFILE
16/05/17 10:04:29 INFO TableLoader: [secondary2.i-a5361638.cluster:2202] Host finished:
    Raw ranges: 1/1
    Size:       0.00 MB
    Time:       0.29 s
    Throughput: 0.00 MB/s
16/05/17 10:04:29 INFO TableLoader: Finished 1 loading threads. -- 0.29 s
16/05/17 10:04:29 INFO TableLoader: Updated catalog -- 0.01 s
16/05/17 10:04:29 INFO TableLoader: Table load statistics:
    Name: testtable
    Size: 0.00 MB
    Hosts: 1
    Time: 0.30 s
    Cluster throughput: 0.00 MB/s
    Avg throughput per host: 0.00 MB/s
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO TableLoader: Loaded table [testtable] -- 0.37 s
16/05/17 10:04:38 INFO Utils: freeing the buffer
16/05/17 10:06:43 ERROR RequestOrchestrator: Result set was not fetched by connected Client. Hence cancelled the execution
16/05/17 10:06:43 ERROR RequestOrchestrator: org.apache.spark.SparkException: Job 1 cancelled part of cancelled job group 7e6efa3c-0003-0015-4a91-a3b02000015b
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
        at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1229)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply$mcVI$sp(DAGScheduler.scala:681)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply(DAGScheduler.scala:681)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply(DAGScheduler.scala:681)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
        at org.apache.spark.scheduler.DAGScheduler.handleJobGroupCancelled(DAGScheduler.scala:681)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1475)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1921)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:902)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:900)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
        at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:900)
        at com.sap.hana.spark.network.CommandHandler$$anonfun$receive$2$$anonfun$applyOrElse$7.apply(CommandRouter.scala:383)
        at com.sap.hana.spark.network.CommandHandler$$anonfun$receive$2$$anonfun$applyOrElse$7.apply(CommandRouter.scala:362)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at com.sap.hana.spark.network.CommandHandler$$anonfun$receive$2.applyOrElse(CommandRouter.scala:362)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
        at com.sap.hana.spark.network.CommandHandler.aroundReceive(CommandRouter.scala:204)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
        at akka.dispatch.Mailbox.run(Mailbox.scala:220)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

I is still "not fetched by the client", but now it looks that vora loaded the table.

Anyone, some ideas how to fix it? The same error appears when I try to read Hive tables insted of Vora.

Error: SAP DBTech JDBC: [403]: internal error: Error opening the cursor for the remote database for query "SELECT "vora_conn_testtable"."a1", "vora_conn_testtable"."a2", "vora_conn_testtable"."a3" FROM "spark_velocity"."testtable" "vora_conn_testtable" LIMIT 200 "

Fukuhara Yohei Fukuhara Yohei · Accepted Answer · 2016-05-20T12:58:28

I've faced the same issue and solved right now! Its cause is that HANA cannot understand worker node's host names. Spark controller send HANA worker node names, which has Spark RDDs. If HANA doesn't understand their host names, HANA cannot get result and the error occurs.

Please check hosts file on HANA.

SAP Vora 1.2 - Reading Vora tables from HANA

4 Answers