7
votes

We are planning to deploy our Azure web application to two separate data centers (one located in West Europe and the other in Southeast Asia) purely for performance reasons. We allow users to upload files, which means we need to keep the blob storage of the two data centers in sync. I know Azure provides support for synchronizing structured data, but there seems to be no such support for blob synchronization. My question is:

Is there a service that provides blob synchronization between different data centers? If not, how can I implement one? I see many samples on the web for syncing between Azure blob storage and a local file system and vice versa, but not between data centers.

6
@danludwig: I have looked at geo-replication, but my understanding is that it is purely for durability, i.e. it provides automatic failover for storage. Also, with geo-replication the secondary data center is automatically chosen based on your primary data center, so it doesn't suit my requirement. – Suresh Kumar
Have you seen this already: cloudberrylab.com/cloud-migrator.aspx? It looks like it would suit your requirement. – Aravind
@Aravind, looks interesting! I've signed up for the beta. I have the same problem with Amazon S3 and was surprised to see how few tools were available to do this job. I would have thought it would be a common backup scenario. – QFDev
@QF_Developer Okay. Some are happy with a failover strategy, and not everyone would want such an option. If more customers and ISVs ask for such a feature, I'm sure cloud providers will come up with this option. – Aravind

6 Answers

3
votes

Is there a service that provides blob synchronization between different data centers?

No. Currently, no such service exists out of the box that would synchronize content between two data centers.

if not, how can I implement one?

Although all the necessary infrastructure is available for you to implement this, the actual implementation would be tricky.

First, you would need to decide whether you want real-time synchronization or whether batched synchronization would do.

For real-time synchronization you could rely on Async Copy Blob. Using async copy blob you can instruct the storage service to copy a blob from one storage account to another, instead of manually downloading the blob from the source and uploading it to the target. Assuming all uploads happen from your application, as soon as a blob is uploaded you know which data center it was uploaded to. What you could do is create a SAS URL for this blob and initiate an async copy to the other data center.
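As a rough sketch of this approach (using today's Azure CLI rather than the original REST/SDK calls; the account, container, and blob names are placeholders, and authentication flags are omitted for brevity), the copy could be kicked off right after an upload like this:

# assumption: a blob was just uploaded to the West Europe account; mirror it to Southeast Asia
src_account="mywesteuropeacct"      # hypothetical source storage account
dst_account="mysoutheastasiaacct"   # hypothetical destination storage account
container="uploads"
blob_name="$1"                      # name of the blob that was just uploaded

# create a short-lived read-only SAS for the source blob
expiry=$(date -u -d "30 minutes" '+%Y-%m-%dT%H:%MZ')
sas=$(az storage blob generate-sas --account-name $src_account \
      --container-name $container --name "$blob_name" \
      --permissions r --expiry $expiry -o tsv)

# ask the storage service to perform the copy server-side (async copy blob)
az storage blob copy start --account-name $dst_account \
    --destination-container $container --destination-blob "$blob_name" \
    --source-uri "https://$src_account.blob.core.windows.net/$container/$blob_name?$sas"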

For batched synchronization, you would need to query both storage accounts and list the blobs in each blob container. If a blob is available in just one storage account and not the other, you could simply create the blob in the destination storage account by initiating an async copy blob. Things become trickier if a blob (by the same name) is present in both storage accounts. In this case you would need to define some rules (such as comparing last-modified dates) to decide whether the blob should be copied from the source to the destination storage account. A sketch of the simple case is shown below.
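A minimal sketch of such a batch pass, reusing the placeholder names from the previous snippet and copying only blobs that are missing from the destination (the same-name conflict handling is left out):

# list blob names in both containers
az storage blob list --account-name $src_account --container-name $container \
    --query "[].name" -o tsv | sort > source_blobs.txt
az storage blob list --account-name $dst_account --container-name $container \
    --query "[].name" -o tsv | sort > dest_blobs.txt

# blobs that exist only in the source container
comm -23 source_blobs.txt dest_blobs.txt > missing_blobs.txt

# start a server-side async copy for each missing blob (SAS generation as in the previous snippet)
while read -r blob_name; do
    az storage blob copy start --account-name $dst_account \
        --destination-container $container --destination-blob "$blob_name" \
        --source-uri "https://$src_account.blob.core.windows.net/$container/$blob_name?$sas"
done < missing_blobs.txt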

For scheduling the batch synchronization, you could make use of the Windows Azure Scheduler Service. Even with this service, you would need to write the synchronization logic yourself; the scheduler service only takes care of the scheduling part, not the actual synchronization.

I would recommend making use of a worker role to implement the synchronization logic. Another alternative is Web Jobs, which were announced recently, though I don't know much about them.

3
votes

If your goals are purely about performance and the content is public, use Azure CDN for this. Point it at your primary blob storage container and it will copy the files around the world for best performance.
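For illustration, a hedged sketch of wiring this up with the Azure CLI (the resource group, profile, endpoint, and storage account names are all placeholders):

# create a CDN profile and an endpoint whose origin is the primary blob storage account
az cdn profile create --resource-group myResourceGroup --name myCdnProfile --sku Standard_Microsoft
az cdn endpoint create --resource-group myResourceGroup --profile-name myCdnProfile \
    --name myCdnEndpoint \
    --origin mystorageaccount.blob.core.windows.net \
    --origin-host-header mystorageaccount.blob.core.windows.net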

1
votes

I know this is an old question and much has changed recently. I ended up at this link while searching for a similar task, so I thought I would share the latest from AzCopy v10. It has a sync option:

Synchronizes file systems to Azure Blob storage or vice versa. Use azcopy sync. Ideal for incremental copy scenarios.

https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10
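Although the quoted line talks about file systems, azcopy sync in v10 can also synchronize one blob container with another, which is what the script in the last answer below relies on. A minimal sketch with placeholder account names and SAS tokens:

# one-way sync of a container between two storage accounts; <src-SAS> and <dst-SAS> are placeholders
azcopy sync \
    "https://mywesteuropeacct.blob.core.windows.net/uploads?<src-SAS>" \
    "https://mysoutheastasiaacct.blob.core.windows.net/uploads?<dst-SAS>" \
    --recursive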

0
votes

You can automate this task with PowerShell:

Download all Blobs (with Snapshots) from One Windows Azure Storage Account http://gallery.technet.microsoft.com/scriptcenter/all-Blobs-with-Snapshots-3b184a79

Using PowerShell to move files to Azure Storage http://www.scarydba.com/2013/06/03/using-powershell-to-move-files-to-azure-storage/

Copy all VHDs in Blob Storage from One Windows Azure Subscription to Another http://gallery.technet.microsoft.com/scriptcenter/Copy-all-VHDs-in-Blog-829f316e

0
votes

Old question I know, but the Microsoft.Azure.Storage.DataMovement library is good for this.

https://docs.microsoft.com/en-us/azure/storage/common/storage-use-data-movement-library

0
votes

Using Bash with the Azure CLI and AzCopy. The code is on GitHub and there is an associated video on YouTube to get it working.

https://github.com/J0hnniemac/yt-blobsync

#!/bin/bash
cd /home
app_id=""
tenant=""
sourceurl="https://<>.blob.core.windows.net"
destinationurl="https://<>.blob.core.windows.net"

pemfile="/home/service-principal.pem"

sourceaccount=$(echo $sourceurl | awk -F/ '{print $3}' | awk -F. '{print $1}')
destinationaccount=$(echo $destinationurl | awk -F/ '{print $3}' | awk -F. '{print $1}')

echo $app_id
echo $tenant
echo $sourceurl
echo $destinationurl

echo $sourceaccount
echo $destinationaccount

az login --service-principal --password $pemfile --username $app_id --tenant $tenant

# list storage containers
az storage container list --auth-mode login --account-name $sourceaccount -o=table | awk 'NR>1 {print $1}' | grep networking-guru > src.txt
az storage container list --auth-mode login --account-name $destinationaccount -o=table | awk 'NR>1 {print $1}' | grep networking-guru > dst.txt

# containers present in the source account but not in the destination
grep -vf dst.txt src.txt > diff.txt

for blob_container in $(cat diff.txt);
        do
        echo $blob_container;
        newcmd="az storage container create --auth-mode login --account-name $destinationaccount -n $blob_container --fail-on-exist" 
        echo "---------------------------------"
        echo $newcmd
        eval $newcmd
done

echo "performing AZCOPY login"
azcopy login --service-principal --certificate-path $pemfile --application-id $app_id --tenant-id $tenant



echo "performing AZCOPY sync for each container"
for blob_container in $(cat src.txt);
   do
    #Create timestamp + 30 minutes for SAS token expiry
    end=`date -u -d "30 minutes" '+%Y-%m-%dT%H:%MZ'`
    sourcesas=`az storage container generate-sas --account-name $sourceaccount --as-user --auth-mode login --name $blob_container --expiry $end --permissions acdlrw`
    echo $sourcesas
    # remove leading and trailing quotes from SAS Token
    sourcesas=$(eval echo $sourcesas)
    echo $sourcesas
    src="$sourceurl/$blob_container?$sourcesas"
    dst="$destinationurl/$blob_container"
    echo $src
    echo $dst
    synccmd="azcopy sync \"$src\" \"$dst\" --recursive --delete-destination=true"
    echo $synccmd
    eval $synccmd
done