I'm uploading many small file to S3 using Rails and Carrierwave in a background job, and am hitting S3 rate limits. My immediate thought is to put a sleep 0.1
before each upload, but that doesn't seem like a great solution.
Any suggestions on how to deal with this via the S3 API and some type of backoff?
Ruby code that is doing the upload, this method is called in a loop thousands of times:
def attach_audio(object:, audio_field:, attachment:)
return true if Rails.env.test?
language_code, voice_id = language_and_voice(object)
resp = polly.synthesize_speech(
output_format: 'mp3',
voice_id: voice_id,
text: audio_field.to_s,
language_code: language_code
)
audio_filename = "#{object.class.to_s.downcase}_#{attachment}_#{object.id}_#{voice_id}.mp3"
audio_path = "#{Rails.root}/db/audio/#{audio_filename}"
IO.copy_stream(resp.audio_stream, audio_path)
object.send(attachment + '=', Pathname.new(audio_path).open)
object.save!
end
Uploader class
class AudioUploader < BaseUploader
def store_dir
"uploads/audio/#{model.target_language}/#{self.class.to_s.underscore}/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
end
def extension_whitelist
%w[mp3]
end
end
class BaseUploader < CarrierWave::Uploader::Base
if Rails.env.test?
storage :file
else
storage :fog
end
def store_dir
"uploads/#{self.class.to_s.underscore}/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
end
end
Response from AWS
Message
Excon::Error::ServiceUnavailable: Expected(200) <=> Actual(503 Service Unavailable) excon.error.response :body => "<Error><Code>SlowDown</Code><Message>Please reduce your request rate.</Message><RequestId>176C22715A856A29</RequestId><HostId>L/+
Traceback
Excon::Error::ServiceUnavailable: Expected(200) <=> Actual(503 Service Unavailable)
excon.error.response
:body => "<Error><Code>SlowDown</Code><Message>Please reduce your request rate.</Message><RequestId>176C22715A856A29</RequestId><HostId>xxxxxxxxxxxxxxxxxxxxxxxxx</HostId></Error>"
:cookies => [
]
:headers => {
"Connection" => "close"
"Content-Type" => "application/xml"
"Date" => "Wed, 18 Nov 2020 07:31:29 GMT"
"Server" => "AmazonS3"
"x-amz-id-2" => "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
"x-amz-request-id" => "176C22715A856A29"
}
:host => "example-production.s3-eu-west-1.amazonaws.com"
:local_address => "xxx.xx.xxx.xxx"
:local_port => 50276
:path => "/uploads/audio/fr/audio_uploader/word/audio_file/8015423/word_audio_file_8015423_Mathieu.mp3"
:port => 443
:reason_phrase => "Slow Down"
:remote_ip => "xx.xxx.xx.x"
:status => 503
:status_line => "HTTP/1.1 503 Slow Down\r\n"
File "/app/vendor/bundle/ruby/2.6.0/gems/excon-0.71.1/lib/excon/middlewares/expects.rb", line 13, in response_call
File "/app/vendor/bundle/ruby/2.6.0/gems/excon-0.71.1/lib/excon/middlewares/response_parser.rb", line 12, in response_call
File "/app/vendor/bundle/ruby/2.6.0/gems/excon-0.71.1/lib/excon/connection.rb", line 448, in response
File "/app/vendor/bundle/ruby/2.6.0/gems/excon-0.71.1/lib/excon/connection.rb", line 279, in request
File "/app/vendor/bundle/ruby/2.6.0/gems/fog-xml-0.1.3/lib/fog/xml/sax_parser_connection.rb", line 35, in request
etc
EDIT
The linked AWS documentation refers to prefixes, which would seem to solve the problem
Amazon S3 automatically scales to high request rates. For example, your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket. There are no limits to the number of prefixes in a bucket. You can increase your read or write performance by parallelizing reads. For example, if you create 10 prefixes in an Amazon S3 bucket to parallelize reads, you could scale your read performance to 55,000 read requests per second.
But I don't understand how to implement it in the context of Carrierwave.