Try running it without the sidekiq-unique-jobs gem. It's only been protecting you against dupes for 30 minutes anyway: that gem sets its hash keys in Redis to auto-expire after 30 minutes (configurable). sidekiq itself sets its jobs to auto-expire in Redis after 24 hours.
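If you want to confirm which expiry windows you're actually living with, you can ask Redis directly for the remaining TTL on the keys these gems write. Here's a minimal sketch using the redis-rb gem (the match pattern is just an illustration; check the key names your gem versions actually use):

require 'redis'

redis = Redis.new

# Walk the keyspace and report how long Redis will keep each key.
# ttl returns -1 for "no expiry" and -2 for "already expired".
redis.scan_each(match: '*') do |key|
  puts "#{key}: TTL #{redis.ttl(key)}s"
end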
I obviously can't see your app, but I'll bet you want to process the same file very rarely, if ever. I would control this at the application layer instead and track my own hash key, doing something similar to what the unique-jobs gem does:
# md5_arguments: whatever identifies the work, e.g. the job's arguments
hash = Digest::MD5.hexdigest(Sidekiq.dump_json(md5_arguments))
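For example, here's a rough sketch of what that application-level gate could look like, using a plain Redis key with NX so that a second enqueue of the same file becomes a no-op. The key name, TTL, and enqueue_once helper are placeholders for illustration, not anything these gems provide:

require 'digest'
require 'sidekiq'

def enqueue_once(worker_class, *args)
  hash = Digest::MD5.hexdigest(Sidekiq.dump_json(args))
  key  = "my_app:processed:#{hash}"  # hypothetical key naming scheme

  # SET with nx: true only writes if the key doesn't already exist;
  # ex: keeps Redis tidy -- pick a TTL that covers your real "don't repeat this" window.
  enqueued = Sidekiq.redis { |conn| conn.set(key, Time.now.to_i, nx: true, ex: 7 * 24 * 3600) }
  worker_class.perform_async(*args) if enqueued
end

Then you'd call something like enqueue_once(VideoWorker, file_path) instead of perform_async directly (VideoWorker being whatever your actual worker class is).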
It's also possible that the sidekiq-unique-jobs middleware is getting in the way of sidekiq knowing whether a job completed properly. I'll bet there aren't a lot of folks testing long-running jobs in your particular configuration.
If you continue to see this behavior without the additional middleware, give resque a try. I've never seen this kind of behavior with that gem, and failed jobs have a helpful retry option in the admin GUI.
The main benefit of sidekiq is that it is multi-threaded. Even so, a concurrency of 25 with large video-processing jobs might be pushing it a bit. In my experience, forking is more stable and portable, with fewer worries about your application's thread-safety (YMMV).
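If you stay on sidekiq, dialing the concurrency down for the heavy video work is cheap to try. For example, in config/sidekiq.yml (the queue name here is only an assumption about your setup):

:concurrency: 5
:queues:
  - video_processing
  - default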
Whatever you do, make sure that you are aware of the auto-expiry TTL settings that these systems put on their data in Redis. The size and nature of your jobs means they could easily back up for 24 hours. These automatic deletions happen at the database layer; there are no callbacks to the application layer to warn you that a job has been deleted automatically. In the sidekiq code, for example, auto-expire behavior was introduced "to avoid any possible leaking." (reference) This isn't very encouraging if you really need these jobs to execute.