
I have multiple JSON files being pushed to an Azure storage account under a specific container. There are n files in the container.

There are 4 to 8 nodes that will access the Azure storage container to download the files locally; the download code is written in Java.

Since there are n files and multiple nodes accessing the container at the same time, how can I avoid the same file being downloaded by more than one server?

Example:
 The Azure container has 1.json, 2.json, 3.json, etc., each > 35 MB in size.
 batch-process-node1 -> starts downloading 1.json
 batch-process-node2 -> starts downloading 2.json
 batch-process-node3 -> should not start downloading 1.json

Is there any logic that can be built into each node's Java process so that each file is downloaded by only one node? Or is there any setting that can be configured on the Azure storage container?

--

I am trying to use the Camel azure-blob component, with the block blob blobType.

I am new to Azure Blob storage; any help is appreciated.

Do you want to avoid the file being downloaded by several nodes at the same time, or do you want the file to be downloaded by only a single node? – Joey Cai
If a file is picked for downloading by one server node, the same file shouldn't be available to the other server node(s) for download, if that helps. – Tim
Currently exploring the option of using the Camel azure-blob component to download the blobs in a distributed environment. – Tim

1 Answer


Since we are already using Apache Camel in the code, we tried the Camel azure-blob component to address the issue. Below is the approach we used; a race condition is still possible, but that is acceptable for our scenario. The Camel route starts with a timer consumer and uses a producer to get the list of blobs from the container with the endpoint below:

azure-blob://<account>/<container>?credentials=#storagecredentials&blobType=blockBlob&operation=listBlobs

Note: storagecredentials is a bean of type StorageCredentialsAccountAndKey.
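
For reference, registering such a credentials bean in plain Java might look roughly like the sketch below (we declare our beans in the Spring context file instead). This assumes Camel 3.x, where the registry exposes bind(); the class name and the account/key placeholders are assumptions of the sketch:

    import com.microsoft.azure.storage.StorageCredentialsAccountAndKey;
    import org.apache.camel.CamelContext;
    import org.apache.camel.impl.DefaultCamelContext;

    public class StorageCredentialsSetup {
        public static void main(String[] args) throws Exception {
            CamelContext context = new DefaultCamelContext();

            // Bind the credentials under the name referenced as #storagecredentials in the endpoint URIs
            StorageCredentialsAccountAndKey credentials =
                    new StorageCredentialsAccountAndKey("<account>", "<account-key>");
            context.getRegistry().bind("storagecredentials", credentials);

            // azure-blob routes would be added here before starting the context
            context.start();
        }
    }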

I created a Java class implementing Camel's Processor interface. In its process() method, exchange.getIn().getBody() provides an iterable of ListBlobItem objects.

First, I set the metadata of the blob using the endpoint below:

azure-blob://<account>/<container>/<blobName>?credentials=#storagecredentials&blobType=blockBlob&operation=updateBlockBlob&blobMetadata=#blobMetaData1

Note: blobMetaData1 is a bean created in the Spring context file.

 <util:map id="blobMetaData1" map-class="java.util.HashMap">
        <entry key="someKey" value="someValue"/>
 </util:map>

Key thing: in this class's process() method,

  1. Check whether the metadata is already set. If it is, another process has already picked the blob, so it won't be picked again, even if that process ran on a different server.
  2. Get the blob name from each individual ListBlobItem (via its URI) and form the updateBlockBlob endpoint inside this processor class. To invoke that dynamically built endpoint later, set it as a custom header value on the In message.

The route then uses Camel's recipientList EIP, which reads that header and invokes the metadata endpoint to update the specific blob. A rough sketch of such a processor is shown below.
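
This is not our exact code, but a processor along those lines could look roughly like this. The class name BlobClaimProcessor, the header names metadataEndpoint and claimedBlobName, and the marker key someKey are names assumed for this sketch; the blob types come from the azure-storage SDK used by the camel-azure-blob component:

    import java.util.Map;

    import com.microsoft.azure.storage.blob.CloudBlockBlob;
    import org.apache.camel.Exchange;
    import org.apache.camel.Processor;

    public class BlobClaimProcessor implements Processor {

        @Override
        public void process(Exchange exchange) throws Exception {
            // listBlobs puts an Iterable of ListBlobItem in the body; actual blobs are CloudBlockBlob instances
            Iterable<?> items = exchange.getIn().getBody(Iterable.class);
            if (items == null) {
                return;
            }

            for (Object item : items) {
                if (!(item instanceof CloudBlockBlob)) {
                    continue;
                }
                CloudBlockBlob blob = (CloudBlockBlob) item;
                blob.downloadAttributes();                      // fetch the current metadata from the service
                Map<String, String> metadata = blob.getMetadata();

                // 1. If the marker metadata is already set, another node has picked this blob: skip it
                if (metadata != null && metadata.containsKey("someKey")) {
                    continue;
                }

                // 2. Build the updateBlockBlob endpoint for this blob and pass it along in headers
                String endpoint = "azure-blob://<account>/<container>/" + blob.getName()
                        + "?credentials=#storagecredentials&blobType=blockBlob"
                        + "&operation=updateBlockBlob&blobMetadata=#blobMetaData1";
                exchange.getIn().setHeader("metadataEndpoint", endpoint);
                exchange.getIn().setHeader("claimedBlobName", blob.getName());
                break;                                          // claim at most one blob per timer tick
            }
            // if nothing was claimed, the headers stay unset and the route can filter on that
        }
    }

Note that the gap between checking the metadata and setting it is exactly the race window mentioned above; two nodes can still occasionally claim the same blob.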

Then I used another processor to form the download (getBlob) endpoint:

  azure-blob://<account>/<container>/<blobName>?credentials=#storagecredentials&blobType=blockBlob&operation=getBlob

and again used recipientList to read that endpoint from the message header and download the blob. A sketch of this second processor follows.
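
Again only a sketch with assumed names (DownloadEndpointProcessor, and the claimedBlobName / downloadEndpoint headers carried over from the previous sketch):

    import org.apache.camel.Exchange;
    import org.apache.camel.Processor;

    public class DownloadEndpointProcessor implements Processor {

        @Override
        public void process(Exchange exchange) throws Exception {
            // blob name claimed by the earlier processor (assumed header name)
            String blobName = exchange.getIn().getHeader("claimedBlobName", String.class);

            // build the getBlob endpoint for this blob; recipientList will invoke it
            String endpoint = "azure-blob://<account>/<container>/" + blobName
                    + "?credentials=#storagecredentials&blobType=blockBlob&operation=getBlob";
            exchange.getIn().setHeader("downloadEndpoint", endpoint);
        }
    }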

Finally, I form another endpoint with operation=deleteBlob, which deletes the blob from the container once it has been downloaded.
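
Putting the pieces together, the overall route in the Java DSL could look roughly like the sketch below. This mirrors the flow described above rather than our exact route; the timer period, the local file directory, the filter guard, and the header/processor names are all assumptions of the sketch:

    import org.apache.camel.builder.RouteBuilder;

    public class BlobDownloadRoute extends RouteBuilder {

        @Override
        public void configure() {
            from("timer://pollBlobs?period=60000")                         // poll interval is an assumption
                // list the blobs in the container
                .to("azure-blob://<account>/<container>?credentials=#storagecredentials"
                    + "&blobType=blockBlob&operation=listBlobs")
                // pick one blob without the marker metadata and build the updateBlockBlob endpoint
                .process(new BlobClaimProcessor())
                // continue only if the processor actually claimed a blob
                .filter(header("claimedBlobName").isNotNull())
                    // tag the blob as claimed by setting its metadata
                    .recipientList(header("metadataEndpoint"))
                    // build and invoke the getBlob endpoint to download the content
                    .process(new DownloadEndpointProcessor())
                    .recipientList(header("downloadEndpoint"))
                    // write the downloaded content locally, using the blob name as the file name
                    .to("file:target/downloads?fileName=${header.claimedBlobName}")
                    // build and invoke the delete endpoint once the file has been written
                    .setHeader("deleteEndpoint", simple(
                        "azure-blob://<account>/<container>/${header.claimedBlobName}"
                        + "?credentials=#storagecredentials&blobType=blockBlob&operation=deleteBlob"))
                    .recipientList(header("deleteEndpoint"));
        }
    }

The delete step could also use a dynamic toD endpoint instead of recipientList; recipientList just keeps it consistent with the rest of the route.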