I am trying to connect Power BI to Hive on AWS EMR and retrieve the table list and the data in each table.

Retrieving the table list works fine. But when I click on a particular table, I get the following exception in the Power BI UI:

DataSource.Error: ODBC: ERROR [HY000] [Amazon][Hardy] (35) Error from server: error code: '0' error message: 'Expected states: [FINISHED], but found RUNNING'. Details:
    DataSourceKind=Odbc
    DataSourcePath=dsn=test emr aws
    OdbcErrors=[Table]

The Hive log shows the following errors, ending with "Error closing operation: java.nio.BufferUnderflowException":

2019-09-22T09:43:25,731 WARN  [HiveServer2-Handler-Pool: Thread-44([])]: thrift.ThriftCLIService (ThriftCLIService.java:GetResultSetMetadata(735)) - Error getting result set metadata:
org.apache.hive.service.cli.HiveSQLException: Expected states: [FINISHED], but found RUNNING
        at org.apache.hive.service.cli.operation.Operation.assertState(Operation.java:203) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.operation.GetPrimaryKeysOperation.getResultSetSchema(GetPrimaryKeysOperation.java:110) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.operation.OperationManager.getOperationResultSetSchema(OperationManager.java:302) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.session.HiveSessionImpl.getResultSetMetadata(HiveSessionImpl.java:866) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at sun.reflect.GeneratedMethodAccessor58.invoke(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_222]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_222]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_222]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844) ~[hadoop-common-2.8.5-amzn-4.jar:?]
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at com.sun.proxy.$Proxy41.getResultSetMetadata(Unknown Source) ~[?:?]
        at org.apache.hive.service.cli.CLIService.getResultSetMetadata(CLIService.java:540) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.thrift.ThriftCLIService.GetResultSetMetadata(ThriftCLIService.java:731) [hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetResultSetMetadata.getResult(TCLIService.java:1697) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetResultSetMetadata.getResult(TCLIService.java:1682) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) [hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
2019-09-22T09:43:25,820 WARN  [HiveServer2-Handler-Pool: Thread-44([])]: thrift.ThriftCLIService (ThriftCLIService.java:CloseOperation(720)) - Error closing operation:
java.nio.BufferUnderflowException
        at java.nio.Buffer.nextGetIndex(Buffer.java:506) ~[?:1.8.0_222]
        at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:412) ~[?:1.8.0_222]
        at org.apache.hive.service.cli.HandleIdentifier.<init>(HandleIdentifier.java:46) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.Handle.<init>(Handle.java:38) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.OperationHandle.<init>(OperationHandle.java:41) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.OperationHandle.<init>(OperationHandle.java:37) ~[hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.cli.thrift.ThriftCLIService.CloseOperation(ThriftCLIService.java:717) [hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1677) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1662) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) [hive-service-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-2.3.5-amzn-0.jar:2.3.5-amzn-0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]

What could be the problem? The same data is retrieved without issue in Hue, in under 15 seconds.

The strange part is that running a SQL query through the ODBC driver works fine. The error appears only when tables are selected.

1 Answer

I have a theory for this. Reading the second stack trace, Hive fails while constructing a HandleIdentifier for the operation handle: it tries to read the ID as a long (java.nio.HeapByteBuffer.getLong) and the buffer has fewer bytes left than it expects. Did you try increasing Hadoop's buffer size? The default is 4 KB ("io.file.buffer.size": "4096"); try at least 8 KB ("8192"). I have not verified that this setting is what causes the error.
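
To illustrate the failure seen in the second stack trace: ByteBuffer.getLong throws BufferUnderflowException whenever fewer than 8 bytes remain in the buffer. The sketch below is not Hive code. It assumes the handle ID is 16 bytes read as two longs (my assumption about what HandleIdentifier does), and the class name HandleBufferDemo and method canReadHandle are made up for the example.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.util.UUID;

public class HandleBufferDemo {

    // Tries to read a 16-byte ID as two longs, the way the stack trace
    // suggests (HeapByteBuffer.getLong called from HandleIdentifier.<init>).
    // Returns false if the byte array is too short to hold both longs.
    static boolean canReadHandle(byte[] raw) {
        ByteBuffer bb = ByteBuffer.wrap(raw);
        try {
            UUID id = new UUID(bb.getLong(), bb.getLong());
            return id != null;
        } catch (BufferUnderflowException e) {
            // Fewer than 8 bytes were left when getLong was called.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canReadHandle(new byte[16])); // prints "true"
        System.out.println(canReadHandle(new byte[4]));  // prints "false"
    }
}
```

If this is what happens on the server, the handle bytes arriving in the CloseOperation request are shorter than Hive expects, which may point at what the client sends after the first error rather than at the data itself.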

I got the buffer insight from here: https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-emrconfiguration.html

And here is the official EMR app configuration guide: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html

Hope it helps!