I have data in Azure Blob Storage in Parquet format. What I need to do is transfer all those storage files to HDFS. Is there any way I can do that?
I couldn't find any helpful method to do it.
Thanks.
Using @jay's solution, I was able to transfer the data with the following command.
command:
hadoop distcp -D fs.azure.account.key.<account name>.blob.core.windows.net=<Key> wasb://<container>@<account>.blob.core.windows.net/<path to wasb file> hdfs://<hdfs path>
distcp copies the directory structure recursively; for more info, read this link.
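For reference, here is a sketch of the command with the placeholders filled in. The account name, container, key, and paths below are made-up examples, not real values, and the script only builds and prints the invocation (drop the `echo` to actually run it):

```shell
# Hypothetical values -- substitute your own account, key, and paths.
ACCOUNT="mystorageacct"
CONTAINER="parquetdata"
ACCESS_KEY="<your-storage-account-key>"
SRC="wasb://${CONTAINER}@${ACCOUNT}.blob.core.windows.net/raw/events"
DST="hdfs://namenode:8020/data/events"

# -D injects the storage account key into the job configuration;
# distcp then runs a MapReduce job that copies the whole tree recursively.
echo hadoop distcp \
  -D "fs.azure.account.key.${ACCOUNT}.blob.core.windows.net=${ACCESS_KEY}" \
  "${SRC}" "${DST}"
```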
Based on the statements in this link, in Hadoop an entire file system hierarchy is stored in a single container.
You could configure your account key and container name in core-site.xml as below:
<property>
  <name>fs.azure.account.key.youraccount.blob.core.windows.net</name>
  <value>YOUR ACCESS KEY</value>
</property>
Then all you need to do is copy the files into the configured container with AzCopy.
For more details, please refer to this document.
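As a concrete sketch of that AzCopy step (assuming AzCopy v10; the account, container, local path, and SAS token below are placeholders I've made up, and the `echo` keeps this a dry run):

```shell
# Hypothetical upload of a local directory of parquet files into the
# configured container; drop the echo to actually run the transfer.
DEST_URL="https://mystorageacct.blob.core.windows.net/parquetdata"
echo azcopy copy "./parquet-out" "${DEST_URL}?<SAS-token>" --recursive
```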
Updated answer:
I can offer another solution here:
1. Install BlobFuse on your VM to provide a virtual filesystem backed by your Azure Blob storage container.
2. Then copy the files from the mounted container into HDFS. Note that plain cp cannot write to an hdfs:// URL; use hdfs dfs -put (or hadoop fs -copyFromLocal) for that step.
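A sketch of those two steps follows. The mount point, temp path, config file, and HDFS paths are all assumptions, and each command is prefixed with `echo` as a dry-run guard; remove the echos to execute:

```shell
# 1. Mount the blob container as a local filesystem with blobfuse.
#    Credentials (account name, key, container) live in the config file.
MOUNT_POINT="/mnt/blob"
echo blobfuse "${MOUNT_POINT}" \
  --tmp-path=/mnt/blobfusetmp \
  --config-file=/home/azureuser/fuse_connection.cfg

# 2. Copy from the mounted container into HDFS. hdfs dfs -put reads the
#    local (fuse-mounted) files and writes them into the HDFS directory.
HDFS_DIR="/data/parquet"
echo hdfs dfs -mkdir -p "${HDFS_DIR}"
echo hdfs dfs -put "${MOUNT_POINT}/raw/events" "${HDFS_DIR}/"
```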
To summarize, use this command:
hadoop distcp -D fs.azure.account.key.<account name>.blob.core.windows.net=<Key> wasb://<container>@<account>.blob.core.windows.net/<path to wasb file> hdfs://<hdfs path>