Although using multiple Reader modules will work, it becomes unwieldy when there are many inputs or when the number of inputs varies.
Instead, you can use the Execute Python Script module to access blob storage directly. Doing so, however, is exceedingly painful if you've never done it before. Here are the issues:
- The azure.storage.blob Python package is not loaded by default into Azure ML. However, a zip of the package can be created manually or downloaded from the link below (correct version as of Feb 11, 2016).
- By default, azure.storage.blob.BlobService uses HTTPS, which is not currently supported for blob storage access from Azure ML. To work around this, pass protocol='http' when creating the BlobService to force the use of HTTP: client = BlobService(STORAGE_ACCOUNT, STORAGE_KEY, protocol="http")
Here are the steps to get it working:
- Download azure.zip, which provides the required azure.storage.* libraries: https://azuremlpackagesupport.blob.core.windows.net/python/azure.zip
- Upload it as a Dataset to Azure ML Studio.
- Connect the Dataset to the Zip input (the third input) of an Execute Python Script module.
- Write your script as you normally would, being sure to create your BlobService object with protocol='http'.
- Run the Experiment - you should now be able to read and write to blob storage.
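Before touching blob storage, it can help to confirm that the zipped package is actually visible to the script. The following is only a minimal sketch: `can_import` is a hypothetical helper name, not part of Azure ML or the azure package, and it simply reports whether a module can be imported in the current environment.

```python
import importlib


def can_import(module_name):
    """Return True if `module_name` can be imported in this environment."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


# Hypothetical check: inside Execute Python Script this should report True
# once azure.zip is connected to the Zip input; elsewhere it may be False.
print("azure.storage.blob available: %s" % can_import("azure.storage.blob"))
```

If this reports False inside the experiment, the likely culprit is that the Dataset is not connected to the third input.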
Some example code can be found here: https://gist.github.com/drdarshan/92fff2a12ad9946892df
Here is the code to make it work for a single file. This can be extended to work with numerous files by accessing a container and filtering, but that will depend on your business logic.
from azure.storage.blob import BlobService

def azureml_main(dataframe1 = None, dataframe2 = None):
    account_name = 'mystorageaccount'
    account_key = 'p8kSy3FACx...redacted...ebz3plQ=='
    container_name = "upload"

    blob_service = BlobService(account_name, account_key, protocol='http')
    file = blob_service.get_blob_to_text(container_name, 'myfile.txt')
    # You can also get_blob_to_(bytes|file|path), if you need to do so.

    # Do stuff with your file here
    # Logic, logic, logic

    # Execute Python Script requires that a dataframe is returned. It can be null.
    # Return value must be of a sequence of pandas.DataFrame
    return dataframe1,
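To extend the single-file example to several files, one option is to list the container and filter by name. The sketch below assumes the bundled BlobService exposes `list_blobs(container_name, prefix=...)` returning objects with a `name` attribute. The `read_matching_blobs` helper and the `FakeBlobService` stand-in are hypothetical names added here so the logic can be tried without a storage account. A large container may return its listing in pages, which this sketch does not handle.

```python
from collections import namedtuple


def read_matching_blobs(blob_service, container_name, prefix=None, suffix=".txt"):
    """Return {blob_name: text} for every blob whose name ends with `suffix`.

    `blob_service` is expected to behave like the BlobService object created
    with protocol='http' in the single-file example.
    """
    contents = {}
    # The prefix narrows the listing itself; the suffix check filters the
    # returned names afterwards.
    for blob in blob_service.list_blobs(container_name, prefix=prefix):
        if blob.name.endswith(suffix):
            contents[blob.name] = blob_service.get_blob_to_text(container_name, blob.name)
    return contents


# Hypothetical in-memory stand-in, only so the helper can be tried offline.
FakeBlob = namedtuple("FakeBlob", ["name"])


class FakeBlobService(object):
    def __init__(self, blobs):
        self._blobs = blobs  # maps blob name to its text content

    def list_blobs(self, container_name, prefix=None):
        return [FakeBlob(n) for n in sorted(self._blobs) if n.startswith(prefix or "")]

    def get_blob_to_text(self, container_name, blob_name):
        return self._blobs[blob_name]


fake = FakeBlobService({"in/a.txt": "alpha", "in/b.csv": "1,2", "out/c.txt": "gamma"})
texts = read_matching_blobs(fake, "upload", prefix="in/")
```

Inside azureml_main you would pass the real blob_service in place of the stand-in. Which prefix and suffix to use depends on your business logic.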
For further information on limitations, why HTTP, and other notes, see Access Azure blog storage from within an Azure ML experiment