"Queued" jobs not processed after worker connects #926
Comments
I had this with a backlog of 1000+ jobs. I ended up writing a script to clear them all out and restarting the server. I'm not sure what caused it, but a number of jobs were created simultaneously (in error) and each of them is long-running. Sample code to start cleaning stuff up in Redis (please disregard the sp0n stuff, it's from a private repo):
|
Thanks for the script. I added |
We ran into this again. It appears to occur whenever a large number of jobs are queued in a short time interval. I'll try to put together a local test case and determine where it's getting stuck. @behrad mentioned some changes in the next release that may have an effect, so if I can reproduce the problem I'll test against latest and see whether that resolves the stuck-in-queued-state issue. |
Yeah, reproducing the bug would be helpful. Additionally, something super useful would be a CLI vs the web dashboard or the redis REPL. |
I didn't even think of the Redis REPL, cool. |
Reviewing the tips for preventing stuck queues: this could definitely be the cause, as I'm doing media processing and all kinds of interesting errors can arise. I'll go with the domain wrapper or the promise setup (all the rest of the code I'm using is promises). Hmm, I'd already been doing something like this
I read some comments that domains are deprecated -> |
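For readers following along, here is a minimal sketch of the promise-based approach mentioned above, as an alternative to domains. It is not the code from this comment; `safeProcessor` is a hypothetical helper name, and the only assumption about Kue is that a processor receives `(job, done)` and must eventually call `done`.

```javascript
// Hypothetical helper: wraps a promise-returning (or plain) handler so that
// `done` is always called, even when the handler throws synchronously.
function safeProcessor(handler) {
  return function (job, done) {
    // Starting from Promise.resolve() turns a synchronous throw inside
    // `handler` into a rejection instead of an uncaught exception.
    return Promise.resolve()
      .then(() => handler(job))
      .then(
        (result) => done(null, result), // success: report the result
        (err) => done(err)              // failure: report the error
      );
  };
}
```

With Kue this would be passed as the processor, e.g. `queue.process('media', safeProcessor(async (job) => { /* work */ }))`, where `'media'` is a made-up job type. It does not help if the process itself is killed, which is the case discussed further down the thread.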
Some good stuff is mentioned in this thread as well (similar issue). I'm trying something in the workers now to shut down gracefully. I was reliably getting the queue stuck by queueing jobs and killing the worker, which would never have a chance to call done. The same can happen if a worker crashes (hence the domain wrapper, or maybe a try/catch). |
Ahoy olalonde -> this was an earlier batch job I was able to run and kill to get consistently stuck jobs. Now, with some modifications, it doesn't get stuck, but I'm seeing some active jobs just hanging out in limbo. I commented on this in issue #130. OK, I put together a gist with graceful queue and worker shutdown. I'm still seeing a stuck active job, so I think the worker pause is not moving active jobs into an inactive state. Here's the gist: |
Updated the gist to handle setting active jobs to inactive. OK, I believe my latest version of that gist works as expected: it pauses the worker and makes any incomplete jobs inactive so other workers or future workers can pick them up. |
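The gist itself is not reproduced here, so the following is only a rough sketch of the idea described above, not the gist's code. It assumes `queue` behaves like the object returned by `kue.createQueue()` (with `active(cb)` and `shutdown(timeoutMs, cb)`) and `Job` behaves like `kue.Job` (with `get(id, cb)` returning a job that has `inactive()`). The function names are hypothetical. It also assumes a single worker process, since it re-queues every active job it can see.

```javascript
// Move every job currently marked active back to the inactive (queued) state.
// Assumption: no other worker process is legitimately running these jobs.
function requeueActiveJobs(queue, Job, callback) {
  queue.active((err, ids) => {
    if (err) return callback(err);
    let pending = ids.length;
    if (pending === 0) return callback(null, 0);
    let firstError = null;
    ids.forEach((id) => {
      Job.get(id, (getErr, job) => {
        if (getErr) firstError = firstError || getErr;
        else job.inactive(); // put the job back in the queue
        if (--pending === 0) callback(firstError, ids.length);
      });
    });
  });
}

// Re-queue active jobs first, then ask the queue to shut down.
function shutdownGracefully(queue, Job, timeoutMs, callback) {
  requeueActiveJobs(queue, Job, (requeueErr, count) => {
    queue.shutdown(timeoutMs, (shutdownErr) => {
      callback(requeueErr || shutdownErr || null, count);
    });
  });
}

// Possible wiring in a worker (left as a comment; requires the kue package):
//   const kue = require('kue');
//   const queue = kue.createQueue();
//   process.once('SIGTERM', () =>
//     shutdownGracefully(queue, kue.Job, 5000, () => process.exit(0)));
```

Signal handlers like this only run when the process is asked to stop; a hard kill or an uncaught exception bypasses them.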
@victusfate good job 👍 What happens if the process signal handlers are not called? Do the inactive jobs get unstuck eventually? |
Yeah, I didn't handle uncaught exceptions, and there could be other signals I missed, but it worked very well while I killed and restarted it during local testing. No stuck queue. In the earlier version I could reliably recreate a stuck queue just by killing the workers and rerunning them. So the above sample code provides some level of battle hardening but is not break-proof. Still, it resolved all the stuck queue issues I've seen in our dev/prod environments. |
Does this mean jobs that fail are removed and never run? |
I've got a few jobs stuck in "Queue". The worker won't seem to process them unless I click the refresh icon. Wonder why this is and if it's possibly a bug with Kue.