1 vote

Environment:

  • Windows 10 x64
  • Ruby 2.1.0 (32-bit)
  • Chef 12.12.15
  • Azure Gem 0.7.9
  • Azure-Storage Gem 0.12.1.preview

I am trying to download a ~880MB blob from a container. When I do, it throws the following error after the Ruby process hits ~500MB in size:

C:/opscode/chefdk/embedded/lib/ruby/2.1.0/net/protocol.rb:102:in `read': failed to allocate memory (NoMemoryError)

I have tried this both inside and outside of Ruby, and with both the Azure gem and the Azure-Storage gem. The result is the same with all four combinations (Azure in Chef, Azure in Ruby, Azure-Storage in Chef, Azure-Storage in Ruby).

Most of the troubleshooting I have found for these kinds of problems suggests streaming or chunking the download, but there does not appear to be a corresponding method or get_blob option to do so.

Code:

require 'azure/storage'

# vars
account_name = "myacct"
container_name = "myfiles"
access_key = "mykey"
installs_dir = "myinstalls"

# directory for files
create_dir = 'c:/' + installs_dir
Dir.mkdir(create_dir) unless File.exists?(create_dir)

# create azure client
Azure::Storage.setup(:storage_account_name => account_name, :storage_access_key => access_key)
azBlobs = Azure::Storage::Blob::BlobService.new

# get list of blobs in container
dlBlobs = azBlobs.list_blobs(container_name)

# download each blob to directory
dlBlobs.each do |dlBlob|
    puts "Downloading " + container_name + "/" + dlBlob.name
    portalBlob, blobContent = azBlobs.get_blob(container_name, dlBlob.name)
    File.open("c:/" + installs_dir + "/" + portalBlob.name, "wb") { |f|
        f.write(blobContent)
    }
end

I also tried using IO.binwrite() instead of File.open() and got the same result.

Suggestions?


3 Answers

1 vote

As @coderanger said, your issue is caused by get_blob loading the whole blob into memory at once. There are two ways to resolve it.

  1. According to the official REST API reference, quoted below:

The maximum size for a block blob created via Put Blob is 256 MB for version 2016-05-31 and later, and 64 MB for older versions. If your blob is larger than 256 MB for version 2016-05-31 and later, or 64 MB for older versions, you must upload it as a set of blocks. For more information, see the Put Block and Put Block List operations. It's not necessary to also call Put Blob if you upload the blob as a set of blocks.

So for a blob that consists of committed blocks, you can get the block list via list_blob_blocks and then fetch and write those blocks to a local file one by one (see the first sketch after this list).

  2. Generate a blob URL with a SAS token via signed_uri (the gem's test code shows an example), then download the blob via streaming and write it to a local file (see the second sketch below).
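
Here is a minimal sketch of option 1 (untested). It assumes your azure-storage version's get_blob accepts the :start_range/:end_range options (they map to the HTTP Range header), and "bigfile.zip" is just a placeholder blob name:

require 'azure/storage'

Azure::Storage.setup(:storage_account_name => "myacct", :storage_access_key => "mykey")
azBlobs = Azure::Storage::Blob::BlobService.new

container_name = "myfiles"
blob_name = "bigfile.zip"

# list_blob_blocks returns committed and uncommitted blocks; a finished blob only has committed ones
blocks = azBlobs.list_blob_blocks(container_name, blob_name)[:committed]

File.open("c:/myinstalls/" + blob_name, "wb") do |f|
  offset = 0
  blocks.each do |block|
    # request only this block's byte range instead of the whole blob
    _, chunk = azBlobs.get_blob(container_name, blob_name,
                                :start_range => offset,
                                :end_range   => offset + block.size - 1)
    f.write(chunk)
    offset += block.size
  end
end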
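
And a sketch of option 2 (also untested). The signed_uri arguments are an assumption based on the gem's test code and may differ between azure-storage versions; the blob path is again a placeholder:

require 'azure/storage'
require 'net/http'
require 'time'

Azure::Storage.setup(:storage_account_name => "myacct", :storage_access_key => "mykey")
azBlobs = Azure::Storage::Blob::BlobService.new

# build a SAS-signed URL for the blob (option names may vary by gem version)
signer = Azure::Storage::Core::Auth::SharedAccessSignature.new("myacct", "mykey")
uri = azBlobs.generate_uri("myfiles/bigfile.zip")
sas_uri = signer.signed_uri(uri,
                            :resource    => "b",
                            :permissions => "r",
                            :expiry      => (Time.now + 3600).utc.iso8601)

# stream the response body to disk in chunks instead of buffering it all in memory
Net::HTTP.start(sas_uri.host, sas_uri.port, :use_ssl => true) do |http|
  http.request_get(sas_uri.request_uri) do |resp|
    File.open("c:/myinstalls/bigfile.zip", "wb") do |f|
      resp.read_body { |chunk| f.write(chunk) }
    end
  end
end
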
1 vote

The problem is that get_blob has to load the data into memory all at once rather than streaming it to disk. In Chef we have the remote_file resource to help with this kind of streaming download, but you would need to get the plain URL for the blob rather than downloading it using their gem.
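
For example, with a pre-generated SAS URL (the URL below is a hypothetical placeholder; you could build it with signed_uri or copy it from the portal):

# hypothetical SAS URL for the blob
sas_url = "https://myacct.blob.core.windows.net/myfiles/bigfile.zip?sv=2016-05-31&sig=..."

# remote_file streams the download to disk instead of buffering it in memory
remote_file "c:/myinstalls/bigfile.zip" do
  source sas_url
  action :create
end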

0 votes

I was just looking into using the azure/storage/blob library for a dev-ops project I was working on, and it seems to me that the implementation is quite basic and does not utilise the full underlying API. For example, uploads streamed from a file are slow, most likely because chunks are not uploaded in parallel. I don't think this library is production-ready, and the exposed Ruby API is lacking. It's open source, so if anybody has some time, they can help contribute.