4
votes

I'm uploading many small files to S3 using Rails and CarrierWave in a background job, and I'm hitting S3 rate limits. My immediate thought is to put a sleep 0.1 before each upload, but that doesn't seem like a great solution.

Any suggestions on how to deal with this via the S3 API and some type of backoff?

Here is the Ruby code doing the upload; this method is called in a loop thousands of times:

    def attach_audio(object:, audio_field:, attachment:)
      return true if Rails.env.test?

      language_code, voice_id = language_and_voice(object)

      # Synthesize the audio with Amazon Polly
      resp = polly.synthesize_speech(
        output_format: 'mp3',
        voice_id: voice_id,
        text: audio_field.to_s,
        language_code: language_code
      )

      # Write the mp3 to a local path, then attach it to the model;
      # CarrierWave uploads it to S3 when the record is saved
      audio_filename = "#{object.class.to_s.downcase}_#{attachment}_#{object.id}_#{voice_id}.mp3"
      audio_path = "#{Rails.root}/db/audio/#{audio_filename}"
      IO.copy_stream(resp.audio_stream, audio_path)

      object.send(attachment + '=', Pathname.new(audio_path).open)
      object.save!
    end

Uploader class

    class AudioUploader < BaseUploader
      def store_dir
        "uploads/audio/#{model.target_language}/#{self.class.to_s.underscore}/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
      end

      def extension_whitelist
        %w[mp3]
      end
    end

    class BaseUploader < CarrierWave::Uploader::Base
      if Rails.env.test?
        storage :file
      else
        storage :fog
      end

      def store_dir
        "uploads/#{self.class.to_s.underscore}/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
      end
    end

Response from AWS

Message

Excon::Error::ServiceUnavailable: Expected(200) <=> Actual(503 Service Unavailable) excon.error.response :body => "<Error><Code>SlowDown</Code><Message>Please reduce your request rate.</Message><RequestId>176C22715A856A29</RequestId><HostId>L/+

Traceback

Excon::Error::ServiceUnavailable: Expected(200) <=> Actual(503 Service Unavailable)
excon.error.response
  :body          => "<Error><Code>SlowDown</Code><Message>Please reduce your request rate.</Message><RequestId>176C22715A856A29</RequestId><HostId>xxxxxxxxxxxxxxxxxxxxxxxxx</HostId></Error>"
  :cookies       => [
  ]
  :headers       => {
    "Connection"       => "close"
    "Content-Type"     => "application/xml"
    "Date"             => "Wed, 18 Nov 2020 07:31:29 GMT"
    "Server"           => "AmazonS3"
    "x-amz-id-2"       => "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    "x-amz-request-id" => "176C22715A856A29"
  }
  :host          => "example-production.s3-eu-west-1.amazonaws.com"
  :local_address => "xxx.xx.xxx.xxx"
  :local_port    => 50276
  :path          => "/uploads/audio/fr/audio_uploader/word/audio_file/8015423/word_audio_file_8015423_Mathieu.mp3"
  :port          => 443
  :reason_phrase => "Slow Down"
  :remote_ip     => "xx.xxx.xx.x"
  :status        => 503
  :status_line   => "HTTP/1.1 503 Slow Down\r\n"

  File "/app/vendor/bundle/ruby/2.6.0/gems/excon-0.71.1/lib/excon/middlewares/expects.rb", line 13, in response_call
  File "/app/vendor/bundle/ruby/2.6.0/gems/excon-0.71.1/lib/excon/middlewares/response_parser.rb", line 12, in response_call
  File "/app/vendor/bundle/ruby/2.6.0/gems/excon-0.71.1/lib/excon/connection.rb", line 448, in response
  File "/app/vendor/bundle/ruby/2.6.0/gems/excon-0.71.1/lib/excon/connection.rb", line 279, in request
  File "/app/vendor/bundle/ruby/2.6.0/gems/fog-xml-0.1.3/lib/fog/xml/sax_parser_connection.rb", line 35, in request

etc

EDIT

The linked AWS documentation refers to prefixes, which would seem to solve the problem:

Amazon S3 automatically scales to high request rates. For example, your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket. There are no limits to the number of prefixes in a bucket. You can increase your read or write performance by parallelizing reads. For example, if you create 10 prefixes in an Amazon S3 bucket to parallelize reads, you could scale your read performance to 55,000 read requests per second.

But I don't understand how to implement it in the context of CarrierWave.

Comments:

There seems to be some solution in the form of prefixes, but I don't really understand the article docs.aws.amazon.com/AmazonS3/latest/dev/… (shutdown_r_now)

So how many uploads per second do you think you have? Do you use Sidekiq or ActiveJob? (Hubert Jakubiak)

If you read the AWS documentation, they suggest using exponential backoff once you start getting SlowDown responses: docs.aws.amazon.com/general/latest/gr/api-retries.html. I'm not too familiar with these libraries, but I reckon you could rescue the error, parse the body for the AWS error code, and then wait for (2^retry_count) seconds before trying again. (Nick Hyland)

Could s3fs be the answer? Instead of uploading, you would be writing to an s3fs-mounted partition, with the end result being the same. You can see my answer to a somewhat similar problem here: stackoverflow.com/questions/60311166/ffmpeg-pipe-segments-to-s3/… (Dan M)

Like Hubert asked: how many parallel uploads do you have running? I've used S3 for years and I've never hit their limits... (Clemens Kofler)
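
To make the backoff suggestion in the comments concrete, here is a rough sketch (not from the original post) of rescuing the 503 and retrying with an exponentially growing wait; the method name, the retry limit, and the assumption that the Excon error reaches your code unwrapped are all mine:

    # Sketch: retry the save with exponential backoff when S3 says SlowDown.
    # Assumes Excon::Error::ServiceUnavailable (seen in the traceback below)
    # propagates up from the fog storage layer.
    def save_with_backoff(object, max_retries: 5)
      retries = 0
      begin
        object.save!
      rescue Excon::Error::ServiceUnavailable
        raise if retries >= max_retries

        sleep(2**retries) # waits 1, 2, 4, 8, 16 seconds
        retries += 1
        retry
      end
    end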

4 Answers

3
votes

From the AWS documentation

For example, your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket.

you learn what your limits are. Now you need to understand what a prefix is and that's easy. Consider this:

/uploads/audio/fr/audio_uploader/word/audio_file/8015423/word_audio_file_8015423_Mathieu.mp3

What is the prefix here? Answer:

/uploads/audio/fr/audio_uploader/word/audio_file/8015423

The prefix is everything except the object name. So the answer to your problem resides in your ability to design a scheme so that you never exceed the limits defined by Amazon for each prefix.

You could, for example, use a revolving counter, say from 0 to 99, and store somewhere the relationship between the object being saved and the revolving-counter spot where it was stored [so that you can read it later]. If you were to implement this, your problem would be reduced to 1/100th of what it is right now; you may not actually need to go all the way to 100, and you could always increase it in the future if you needed to. So now, this:

/uploads/audio/fr/audio_uploader/word/audio_file/8015423/word_audio_file_8015423_Mathieu.mp3

would be stored in:

/uploads/audio/fr/audio_uploader/word/audio_file/00/8015423/word_audio_file_8015423_Mathieu.mp3

and the next one in .../01/... so on and so forth, with the 100th object stored in .../99/... and then the 101st object stored back in .../00/... [you wouldn't have to use the two characters obviously].

The extra step this process brings to your logic is that for retrieval purposes you need to know that word_audio_file_8015423_Mathieu.mp3 is in .../00/... and, for example, word_audio_file_8015424_Mark.mp3 is in .../01/... and so on. This means you would have to store the relationship between the object and the spot where it was saved. On the other hand, you may not even need to do that if it's acceptable to search all the spots looking for the object you want.
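As one possible illustration in CarrierWave terms (this is a sketch of mine, not part of the answer): deriving the spot from the numeric id with model.id % 100 gives the same 100-way spread without having to store the mapping anywhere, because retrieval recomputes the same value:

    # Sketch: spread objects across 100 prefixes by deriving a two-digit
    # "spot" from the id (an assumption; the answer uses a stored counter).
    class AudioUploader < BaseUploader
      def store_dir
        spot = format('%02d', model.id % 100) # "00".."99"
        "uploads/audio/#{model.target_language}/#{spot}/" \
          "#{self.class.to_s.underscore}/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
      end
    end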

I feel strongly this would take care of your problem.

1
votes

If you use Sidekiq without ActiveJob, you could use the sidekiq-throttled gem and its threshold option to slow down your uploads in background jobs.

Example:

    class UploadWorker
      include Sidekiq::Worker
      include Sidekiq::Throttled::Worker

      sidekiq_options :queue => :uploads
      sidekiq_throttle({
        # Allow a maximum of 1,000 jobs to be processed within a one-second window.
        :threshold => { :limit => 1_000, :period => 1.second }
      })

      def perform
        # do your thing
      end
    end

0
votes

According to the AWS docs, a prefix (aka key prefix) is similar to a directory name that enables you to store similar data under the same directory in a bucket. You need to find a way to group your uploads; in your case it could be creating an additional directory using the object.id value as its name.

0
votes

I tried using https://github.com/nickelser/activejob-traffic_control, but couldn't get the job to work properly.

In the end I found a super-simple solution that worked: I moved the audio creation and storage in S3 for each word into a new ActiveJob class, then just called it thousands of times, and it is automatically throttled by the Sidekiq concurrency settings.

config/sidekiq.yml

    ---
    :concurrency: 10
    :max_retries: 3
    :queues:
      - [urgent, 4]
      - [nlp, 3]
      - [default, 2]
      - [low]
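
For reference, a minimal sketch of what that per-word job might look like (not from the original answer; the class name, the Word model lookup, and the word.text field are assumptions based on the paths in the traceback):

    # Hypothetical per-word job: one enqueue per word, so throughput is
    # capped by the Sidekiq concurrency setting above rather than a sleep.
    class CreateWordAudioJob < ApplicationJob
      queue_as :default

      def perform(word_id)
        word = Word.find(word_id)
        # Assumes attach_audio (from the question) is available to the job,
        # e.g. included via a hypothetical AudioAttachment mixin.
        attach_audio(object: word, audio_field: word.text, attachment: 'audio_file')
      end
    end

    # Enqueue one job per word instead of looping inside a single job:
    Word.find_each { |word| CreateWordAudioJob.perform_later(word.id) }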