1
votes

New to azure here. I just finished spinning up a new HDInsight instance with a new storage instance on a regular storage account. I'm wondering what my webHDFS url is/where I can retrieve it and how I can access it.

I am not using Azure Data Lake storage. (nearly every link I've found azure related leads to some data lake link)

my hdfs-site.xml:

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

core-site.xml:

<property>
  <name>fs.defaultFS</name>
  <value>wasb://<my hdinsight storage name>@<my hdinsight name>.blob.core.windows.net</value>
  <final>true</final>
</property>
3

3 Answers

0
votes

Your base webHDFS FileSystem URI should be: webhdfs://<HOST>:<HTTP_PORT> where your HOST should be CLUSTERNAME.azurehdinsight.net and HTTP_PORT should be 80 by default.

The corresponding HTTP URL has the following format

http://<HOST>:<HTTP_PORT>/webhdfs/v1/

0
votes

Webhdfs port is same as HDFS namenode port, you can override this port using the below property - dfs.namenode.http-address default value is 50070.

https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

When you are accessing webhdfs through curl or browser, you have to give the port number as follows.

http://<HOST>:<HTTP_PORT>/webhdfs/v1/

http://<HOST>:50070/webhdfs/v1/

0
votes

Note: Azure HDInsight does not support WebHDFS.

You do not need to create an HDInsight cluster to communicate with ADLS using WebHDFS.

  1. Azure Storage is not WebHDFS compatible.

  2. Azure Data Lake Store is a cloud-scale file system that is compatible with Hadoop Distributed File System (HDFS) and works with the Hadoop ecosystem. Your existing applications or services that use the WebHDFS API can easily integrate with ADLS.

enter image description here

Reference: WebHDFS FileSystem APIs

  1. ADLS Gen2 is Hadoop Filesystem compatible and optimized for a cloud scale big data analytics storage and is not WebHDFS compatible.