
"Queued" jobs not processed after worker connects #926

Open
olalonde opened this issue Aug 3, 2016 · 12 comments

Comments

@olalonde
Contributor

olalonde commented Aug 3, 2016

[screenshot: Kue dashboard showing jobs stuck in the "Queued" state]

I've got a few jobs stuck in "Queued". The worker doesn't seem to process them unless I click the refresh icon. I wonder why this happens and whether it's possibly a bug in Kue.

@victusfate

victusfate commented Aug 19, 2016

I had this for a backlog of 1000+ jobs, I ended up writing a script to clear them all out and restarted the server.

Not sure what caused it to happen but there were a number of jobs created simultaneously (in error) and each of them is long running.

Sample code to start cleaning things up in redis (please disregard the sp0n stuff, it's from a private repo):

'use strict';

// Cleanup script: remove every job sitting in the "inactive" (queued) state.
// The sp0n-config require is from a private repo; only config.Redis.RedisUrl is used here.
const config = require('sp0n-config');
const kue    = require('kue');
const url    = require('url');
const redis  = require('redis');

const getRedis = () => {
  // to clean up locally, point this at process.env.LOCAL_REDIS_URL instead
  const redisUrl = url.parse(config.Redis.RedisUrl);
  const client = redis.createClient(redisUrl.port, redisUrl.hostname);
  if (redisUrl.auth) {
    client.auth(redisUrl.auth.split(':')[1]);
  }
  return client;
};

const queue = kue.createQueue({
  redis: {
    createClientFactory: getRedis
  }
});

queue.inactive((err, ids) => { // other states: active, complete, failed, delayed
  const sAction = 'queue.inactive';
  console.log({ action: sAction, ids: ids });
  for (const id of ids) {
    kue.Job.remove(id, (err) => {
      if (err) {
        console.error({ action: sAction + '.err', id: id, err: err });
      }
    });
  }
});

// another way to go:
// kue.Job.rangeByState('inactive', 0, 10000, 'asc', (err, jobs) => {
//   for (const job of jobs) {
//     kue.Job.remove(job.id, (err) => {
//       if (err) {
//         console.error({ action: 'rangeByState.err', id: job.id, err: err });
//       }
//     });
//   }
// });

@olalonde
Contributor Author

Thanks for the script. I added queue.watchStuckJobs(5 * 1000) and the problem hasn't re-occurred so far.
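For reference, a minimal sketch of that watchdog setup. The enableWatchdog wrapper and its error message are my own additions (assumptions, not part of Kue), but queue.watchStuckJobs(ms) is a real Kue API: on each interval it scans Redis for jobs whose worker died before calling done() and re-queues them.

```javascript
// enableWatchdog turns on Kue's built-in stuck-job watchdog on an
// existing queue; intervalMs is how often it scans for orphaned jobs.
function enableWatchdog(queue, intervalMs) {
  if (typeof queue.watchStuckJobs !== 'function') {
    throw new Error('this Kue version lacks watchStuckJobs');
  }
  queue.watchStuckJobs(intervalMs);
  return intervalMs;
}

// Usage (assumes kue is installed and Redis is reachable):
// const queue = require('kue').createQueue();
// enableWatchdog(queue, 5 * 1000);
```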

@victusfate

We ran into it again. It appears to occur whenever a large number of jobs are queued in a short time interval. I'll try to put together a test case locally and determine where it's getting stuck.

@behrad mentioned some changes coming in the next release that may have some effect, so if I can reproduce this I'll test against the latest and see whether that resolves the stuck-in-queued-state issue.

@olalonde
Contributor Author

olalonde commented Aug 25, 2016

Yeah, reproducing the bug would be helpful. Additionally, something super useful would be a CLI as an alternative to the web dashboard or the redis REPL: kue list, kue retry failed, kue clear failed, etc. Might work on this when I have time.

@victusfate

I didn't even think of the redis REPL, cool.

@victusfate

victusfate commented Aug 30, 2016

Reviewing tips for preventing stuck queues:
https://github.com/Automattic/kue#prevent-from-stuck-active-jobs

This could definitely be the cause, as I'm doing media processing and all kinds of interesting errors can arise. I'll go with the domain wrapper or the promise setup (all the rest of the code I'm using is promises).

Hmm, I'd already been doing something like this:

    queue.process(this.type, this.concurrency, (job, done) => {
      this.fWorker(job, job.data)
        .then(() => {
          done();
        })
        .catch((err) => {
          done(err);
        });
    });

I also read some comments that domains are deprecated:
https://nodejs.org/api/domain.html
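Since domains are deprecated, one alternative (my own suggestion, not a Kue feature) is to race the worker's promise against a timeout so done() is always reached even if the worker hangs. The withTimeout helper and its 'job timed out' error message are assumptions:

```javascript
// withTimeout settles with the worker's result, or rejects after ms
// milliseconds so the processor can call done(err) instead of hanging.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error('job timed out')), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage inside a Kue processor (sketch, worker returns a promise):
// queue.process(type, concurrency, (job, done) => {
//   withTimeout(worker(job, job.data), 60 * 1000)
//     .then(() => done())
//     .catch((err) => done(err));
// });
```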

@victusfate

Some good stuff mentioned in this thread as well (similar issue):
#130

I'm trying something in the workers now to gracefully shut down. I was reliably getting the queue stuck by queueing jobs and killing the worker; it would never have a chance to call done. The same can also happen if a worker crashes (hence the domain grabs, or maybe a try/catch).
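The shutdown hook could look something like this. The installShutdownHandlers wrapper and the 5-second grace period are assumptions of mine; queue.shutdown(timeout, fn) is the real Kue API, and it fails this worker's active jobs rather than leaving them stuck in active forever.

```javascript
// installShutdownHandlers wires SIGTERM/SIGINT to Kue's queue.shutdown
// so the worker drains instead of dying mid-job. proc is passed in
// (normally `process`) so the wiring can be exercised with a stub.
function installShutdownHandlers(queue, proc, graceMs) {
  for (const signal of ['SIGTERM', 'SIGINT']) {
    proc.on(signal, () => {
      queue.shutdown(graceMs, (err) => {
        console.log('Kue shutdown (' + signal + ')', err || '');
        proc.exit(err ? 1 : 0);
      });
    });
  }
}

// Usage: installShutdownHandlers(require('kue').createQueue(), process, 5000);
```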

@victusfate

victusfate commented Sep 8, 2016

Ahoy olalonde -> this was an earlier batch job I was able to run and kill to get consistently stuck jobs. Now, with some modifications, it doesn't get stuck, but I'm seeing some active jobs just hanging out in limbo. I commented on this in issue #130.

Ok, I put together a gist with graceful queue and worker shutdown. I'm still seeing a stuck active job, so I think pausing the worker is not moving active jobs back into the inactive state.

Here's the gist:
https://gist.github.com/victusfate/1e2ce9eb73de32b78d2690d660f0f9c8

@victusfate

victusfate commented Sep 8, 2016

Updated the gist to handle setting active jobs back to inactive.

Ok, I believe my latest version of that gist works as expected: it pauses the worker and makes any incomplete jobs inactive so other workers or future workers can pick them up.
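The re-queueing step could be sketched like this. The requeueActive helper is my own wrapper (an assumption), built on real Kue APIs: queue.active lists active job ids, kue.Job.get loads a job, and job.inactive() moves it back to the queued state.

```javascript
// requeueActive moves every currently-active job back to the queued
// (inactive) state so another worker can pick it up, then calls
// cb(err, count). queue and kue are passed in to keep this testable.
function requeueActive(queue, kue, cb) {
  queue.active((err, ids) => {
    if (err) return cb(err);
    let pending = ids.length;
    if (pending === 0) return cb(null, 0);
    for (const id of ids) {
      kue.Job.get(id, (err, job) => {
        if (!err && job) job.inactive();
        if (--pending === 0) cb(null, ids.length);
      });
    }
  });
}

// Usage (needs a live Redis):
// const kue = require('kue');
// requeueActive(kue.createQueue(), kue, (err, n) => console.log('requeued', n));
```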

@olalonde
Contributor Author

olalonde commented Sep 9, 2016

@victusfate good job 👍 What happens if the process signal handlers are not called? Do the stuck jobs get unstuck eventually?

@victusfate

victusfate commented Sep 15, 2016

Yeah, I didn't handle uncaught exceptions, and there could be other signals I missed, but it worked very well while I killed and restarted it during local testing. No stuck queue; in the earlier version I could reliably recreate a stuck queue just by killing the workers and rerunning them.

So the sample code above gives some level of battle hardening, but it's not break-proof. Still, it resolved all the stuck-queue issues I've seen in our dev/prod environments.
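Covering the uncaught-exception case could look like this sketch. The installCrashHandler wrapper, the 3-second grace period, and exit code 1 are my own choices; queue.shutdown(timeout, fn) is the real Kue API.

```javascript
// installCrashHandler makes a crashing worker shut the queue down
// (failing its active jobs) instead of leaving them active forever.
// proc is passed in (normally `process`) so this can be stub-tested.
function installCrashHandler(queue, proc, graceMs) {
  proc.on('uncaughtException', (err) => {
    console.error('worker crashed:', err.message);
    queue.shutdown(graceMs, () => proc.exit(1));
  });
}

// Usage: installCrashHandler(require('kue').createQueue(), process, 3000);
```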

@aiavci

aiavci commented Jan 1, 2020

(quoting @victusfate's comment from Aug 19, 2016, including its redis cleanup script; see above)

Does this mean jobs that fail are removed and never run?
