
I have a map-reduce job in which the reducer gets the absolute path of a file residing on Azure Blob storage; the reducer should open it and read its contents. I added the storage account containing the files when provisioning my Hadoop cluster (HDInsight), so the reducer should have access to this Blob storage, even though it is not the default HDFS storage for my job. I have the following code in my reducer, but it gives me a FileNotFound error message.

// Runs inside the reducer; the path below is truncated in the original post.
FileSystem fs = FileSystem.get(new Configuration());
Path pt = new Path("wasb://mycontainer@accountname...");
FSDataInputStream stream = fs.open(pt); // fails with FileNotFound
1
Maybe you should use NativeAzureFileSystem? I can't find examples in the Hadoop documentation, but judging from the tests in the source code it should probably be something like: NativeAzureFileSystem fs = new NativeAzureFileSystem(); fs.initialize(accountUri, conf); – Leonid Vasilev
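Building on that comment, a minimal, self-contained sketch might look like the following. It is an illustration under assumptions, not a verified fix: the class name DirectWasbRead, the container/account names, and the file path /example/data/input.txt are all placeholders, and it assumes the key for the storage account is already in the cluster configuration (as it is when the account is attached at provisioning).

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.azure.NativeAzureFileSystem;
import org.apache.hadoop.io.IOUtils;

public class DirectWasbRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Instantiate the wasb filesystem directly and initialize it against the
        // container's URI, as the comment suggests. Names are placeholders.
        NativeAzureFileSystem fs = new NativeAzureFileSystem();
        fs.initialize(new URI("wasb://mycontainer@myaccount.blob.core.windows.net/"), conf);
        try (FSDataInputStream in = fs.open(new Path("/example/data/input.txt"))) {
            IOUtils.copyBytes(in, System.out, 4096, false); // dump the file to stdout
        }
    }
}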

1 Answer


It is covered in https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-use-blob-storage/#addressing

The syntax is wasb://mycontainer@myaccount.blob.core.windows.net/example/jars/hadoop-mapreduce-examples.jar

If "mycontainer" is a private container, you must add "myaccount" azure storage account as an additional storage account during provision process.