

Handle sidekiq processing when one job saturates your workers and the rest queue...
source link: https://blog.arkency.com/2017/07/sidekiq-slow-processing-one-job-saturates-workers-rest-queue-up/

Handle sidekiq processing when one job saturates your workers and the rest queue up
I saw a great question on Reddit, which I am gonna quote, and I'll try to provide a few possible answers.
Ran in to a scenario for a second or 3rd time today and I’m stumped as how to handle it.
We run a ton of stuff as background workers, pretty standard stuff, broken up in to a few priority queues.
Every now and then one of our jobs fails and starts running for a long time - usually for reasons outside of our control - our connection to S3 drops or as it happened today - our API connection to our mail system was timing out.
So jobs that normally run in a second or two are now taking 60 seconds and holding a worker for that time. Enough of those jobs quickly saturate our available workers and no other work gets done. The 60 second timeout hits for those in-process jobs, they get shuffled to the retry queue, a few smaller jobs process through the available workers until the queued jobs pull in enough of the failing jobs to again saturate the available workers.
I’d think this would be a pattern that other systems would have and there would be a semi-obvious solution for it - I’ve come up empty handed. My thought was to separate the workers by queue and balance those on different worker jobs but then that still runs the risk of saturating a specific queue’s workers.
Here are your options:
Lower your timeouts
Keep monitoring averages and percentiles of how long it takes to finish a certain job in your system (using chillout or any other metric collector). This will give you a better insight into how long is normal for this task to take and what timeout you should set.
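As a rough sketch of how those numbers can feed a timeout: the snippet below computes a percentile from recorded durations and sets the result on a Net::HTTP client. The sample durations, the host name, the helper names and the "twice the 95th percentile" rule are all assumptions made for illustration, not something the original question prescribes.

```ruby
require "net/http"

# Nearest-rank percentile of recorded job durations, in seconds.
def percentile(durations, pct)
  sorted = durations.sort
  rank = (pct / 100.0 * sorted.size).ceil
  sorted[[rank, 1].max - 1]
end

# Build a client whose timeouts are enforced by Net::HTTP itself.
# (build_http is a hypothetical helper, not part of any library.)
def build_http(host, read_timeout:, open_timeout: 2)
  http = Net::HTTP.new(host, 443)
  http.use_ssl = true
  http.open_timeout = open_timeout # seconds to wait for the connection
  http.read_timeout = read_timeout # seconds to wait for each read
  http
end

# Hypothetical sample; real numbers would come from your metric collector.
durations = [0.8, 1.1, 0.9, 1.4, 2.0, 1.2, 0.7, 1.0, 1.3, 5.5]
p95 = percentile(durations, 95)

# Assumption: twice the 95th percentile is a reasonable starting budget.
http = build_http("mail.example.com", read_timeout: p95 * 2)
```

No connection is opened until a request is made, so the client can be built once and reused inside the job.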
Prefer using configurable, lower-level network timeouts provided directly by libraries over the Timeout module.
Pause a queue.
Keep the troublesome job on a separate queue. Use Sidekiq Pro. When lots of jobs are failing or taking too long, just pause the queue. Great feature. Saved our ass a few times.
Partition your queues into many machines or processes.
Have machine one work on queues A,B,C,D and machine two work on queues E,F,G,H.
Use Circuit Breaker pattern.
A circuit breaker is used to detect failures and encapsulates the logic of preventing a failure from constantly recurring.
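A minimal sketch of the idea in plain Ruby might look like the following. The CircuitBreaker class, its thresholds and the injectable clock are hypothetical names chosen for this example; a maintained gem would likely be a better fit for production. State here is kept per process, so each Sidekiq process would trip its own breaker.

```ruby
# Minimal circuit breaker sketch (hypothetical class, not a library API).
class CircuitBreaker
  class OpenError < StandardError; end

  def initialize(failure_threshold: 3, reset_after: 30,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @failure_threshold = failure_threshold # consecutive failures before opening
    @reset_after = reset_after             # cool-down in seconds
    @clock = clock
    @failures = 0
    @opened_at = nil
    @mutex = Mutex.new                     # jobs run on several threads
  end

  # Runs the block unless the circuit is open; fails fast otherwise.
  def call
    raise OpenError, "circuit open, skipping call" if open?

    begin
      result = yield
    rescue StandardError
      record_failure
      raise
    end
    record_success
    result
  end

  # Open means: tripped, and the cool-down has not elapsed yet.
  # Once it has elapsed, a single trial call is allowed through.
  def open?
    @mutex.synchronize do
      !@opened_at.nil? && (@clock.call - @opened_at) < @reset_after
    end
  end

  private

  def record_failure
    @mutex.synchronize do
      @failures += 1
      @opened_at = @clock.call if @failures >= @failure_threshold
    end
  end

  def record_success
    @mutex.synchronize do
      @failures = 0
      @opened_at = nil
    end
  end
end
```

A job would wrap its slow external call in `breaker.call { ... }` and rescue `CircuitBreaker::OpenError` to reschedule itself quickly, so it releases the worker thread instead of holding it for the full timeout.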
Keep your queues in two reverse orders
I am not sure if that’s possible with Sidekiq but it was possible with Resque. Most of our machines were processing jobs in normal priority: A,B,C,D,E,F,G. But there was one machine configured to process them in reverse: G,F,E,D,C,B,A.
That way, if job D started being problematic, then A-C was covered by most machines and G-E was covered by the other machine. Even if jobs in the last queue are the least important in your system, you generally don’t want them to be starved; you would rather keep processing them, albeit more slowly.
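For Sidekiq, one way this could be sketched is by generating the command line for each machine. This assumes that queues passed with -q and without weights are checked strictly in the order given, which is how I understand Sidekiq's documentation; the sidekiq_command helper and the queue names are made up for the example.

```ruby
# Build a Sidekiq command line that lists queues in the given order.
# Assumption: without weights, Sidekiq checks queues strictly in this order.
def sidekiq_command(queues)
  (%w[bundle exec sidekiq] + queues.flat_map { |q| ["-q", q] }).join(" ")
end

QUEUES = %w[A B C D E F G].freeze

puts sidekiq_command(QUEUES)         # most machines: normal priority
puts sidekiq_command(QUEUES.reverse) # one machine: reverse priority
```

The same helper covers plain partitioning as well: pass A-D to one machine and E-G to another.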
Increase number of threads per worker.
If most of your tasks are IO bound (usually on networking), then you might increase the number of threads processing them, as your CPU is probably not fully utilized.
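To see why this helps, here is a small stand-alone illustration in which sleep stands in for a network call. The elapsed_with_threads helper is hypothetical and only mimics a thread pool; in Sidekiq itself the equivalent knob is the concurrency setting (for example the -c flag).

```ruby
# Run each job (a callable) on a pool of `thread_count` threads and
# return the wall-clock seconds the whole batch took.
def elapsed_with_threads(jobs, thread_count)
  queue = Queue.new
  jobs.each { |job| queue << job }
  queue.close # pop returns nil once the queue is closed and drained

  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  workers = Array.new(thread_count) do
    Thread.new do
      while (job = queue.pop)
        job.call
      end
    end
  end
  workers.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Eight jobs that each wait 50 ms on "IO".
jobs = Array.new(8) { -> { sleep 0.05 } }
puts format("1 thread:  %.2fs", elapsed_with_threads(jobs, 1))
puts format("8 threads: %.2fs", elapsed_with_threads(jobs, 8))
```

While a thread waits on IO it does not hold the interpreter lock, so the waits overlap. More threads do not cure a saturating job, but they raise the number of slow jobs needed to block everything else.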
Let me know if you have other ways to handle such a situation.