
How can I upload a folder recursively to Azure Blob Storage? I want to upload a partitioned Parquet dataset with the following layout:

abcd.parquet
├── _SUCCESS
├── myPart=20180101
│   └── part-00179-660f71d6-ed44-41c7-acf0-008724dd923a.c000.gz.parquet
└── myPart=20180102
    └── part-00022-660f71d6-ed44-41c7-acf0-008724dd923a.c000.gz.parquet

The following command:

az storage blob upload -f abcd.parquet -c my_container -n abcd

fails with the error: Is a directory

Recursive upload appears to be available on Windows via AzCopy: https://stephanefrechette.com/upload-multiple-files-recursively-azure-blob-storage-azure-cli-2-0-macoslinux/#.W3JpGVJCSL4 https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy

Something similar appears to be available for Linux (https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-linux), but I also wonder whether I should use Spark instead.

Also, is it possible to fold the directory hierarchy into the file name on upload, i.e. abcd.parquet_dt=2018..._part-....gz.parquet, so that fewer directory listings are required?

In the end, partitioning should still work as intended in Spark after the upload to Azure.
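A note of caution on flattening: Spark's partition discovery relies on the `key=value` directory layout, so collapsing the hierarchy into file names would break it unless you rebuild the paths on read. For illustration, a minimal sketch of such a flattening (the function name and `_` separator are my own choices, not from any tool) might look like this:

```python
import os

def flatten_blob_names(root):
    """Map each file under `root` to a flattened blob name.

    Hypothetical helper: joins the path components with '_', so
    abcd.parquet/myPart=20180101/part-0.parquet becomes
    abcd.parquet_myPart=20180101_part-0.parquet.
    Caveat: Spark partition discovery expects the key=value
    directory layout, so flattened names lose automatic partitioning.
    """
    base = os.path.basename(os.path.normpath(root))
    mapping = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            mapping[full] = "_".join([base] + rel.split(os.sep))
    return mapping
```

The resulting dict maps local paths to blob names, which could then be fed to whatever upload mechanism you settle on.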

Related: Uploading 10,000,000 files to Azure blob storage from Linux


1 Answer


blobxfer (https://github.com/Azure/blobxfer) works well for syncing files to Azure Blob Storage recursively.
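As a sketch of the invocation (assuming blobxfer 1.x; the account name, key, and container below are placeholders, and exact flags may differ between versions):

```shell
# Upload the whole abcd.parquet directory tree, preserving the
# myPart=... subdirectories as virtual folders in the container.
blobxfer upload \
    --storage-account mystorageaccount \
    --storage-account-key "$STORAGE_KEY" \
    --remote-path my_container/abcd.parquet \
    --local-path abcd.parquet

# Alternatively, the Azure CLI itself can upload a directory tree
# with upload-batch (no AzCopy or blobxfer needed):
az storage blob upload-batch \
    --destination my_container \
    --destination-path abcd.parquet \
    --source abcd.parquet
```

Either way the key=value directory structure is preserved in the blob names, so Spark partition discovery keeps working after the upload.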