I'm deploying an Azure Data Factory from an ARM template using Visual Studio, following this Azure tutorial exactly, step by step.
The template defines a data factory with an Azure Storage linked service (for reading source data and writing output data), an input dataset and an output dataset, an HDInsight on-demand linked service, and a pipeline with an HDInsight Hive activity that runs a Hive script to process the input dataset into the output dataset.
Everything deploys successfully and the pipeline activity starts. However, I get the following error from the activity:
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:445)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:619)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I have found various posts, such as this one and this one, suggesting the problem is a known bug caused by a dash or hyphen in the Hive metastore database name.
My problem is that when an ARM template deploys an HDInsight cluster on demand, I have no access to the cluster itself, so I can't make any manual config changes (the idea of on-demand is that the cluster is transient: it is created only to serve a set of demands and then deletes itself).
The issue can be reproduced simply by following the tutorial step by step.
The only glimmer of hope I have found is the hcatalogLinkedServiceName property, as documented here, which is designed to let you use your own Azure SQL database as the Hive metastore. However, this doesn't work either: if I set that property, I get:
‘JamesTestTutorialARMDataFactory/HDInsightOnDemandLinkedService’ failed with message ‘HCatalog integration is not enabled for this subscription.’
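For reference, hcatalogLinkedServiceName goes in the typeProperties of the on-demand linked service. A rough sketch of where it sits, assuming the Data Factory linked service JSON format from the tutorial; the angle-bracket names are placeholders and the clusterSize and timeToLive values are only illustrative, not the ones from the actual template:

```json
{
  "name": "HDInsightOnDemandLinkedService",
  "properties": {
    "type": "HDInsightOnDemand",
    "typeProperties": {
      "clusterSize": 1,
      "timeToLive": "00:05:00",
      "linkedServiceName": "<azure-storage-linked-service-name>",
      "hcatalogLinkedServiceName": "<azure-sql-linked-service-name>"
    }
  }
}
```

The Azure SQL linked service named by hcatalogLinkedServiceName would need to be defined in the same data factory.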
My subscription is unrestricted, and should have all the features of Azure available. So now I'm completely stuck. It seems that currently, using Hive with on-demand HDInsight is basically impossible?
If anyone can think of anything to try, I'm all ears!
Thanks