
"Queued" jobs not processed after worker connects #926

Open
olalonde opened this issue Aug 3, 2016 · 12 comments

Comments

@olalonde
Contributor

olalonde commented Aug 3, 2016

[screenshot: Kue dashboard showing jobs stuck in the "Queued" state]

I've got a few jobs stuck in "Queued". The worker doesn't seem to process them unless I click the refresh icon. I wonder why this happens and whether it's possibly a bug in Kue.

@victusfate

victusfate commented Aug 19, 2016

I had this for a backlog of 1000+ jobs, I ended up writing a script to clear them all out and restarted the server.

Not sure what caused it to happen but there were a number of jobs created simultaneously (in error) and each of them is long running.

Sample code to start cleaning things up in redis (please disregard the sp0n stuff, it's from a private repo):

'use strict';

// Cleanup script: remove every job sitting in the "inactive" (queued) state.
// The sp0n-config require is from a private repo; only config.Redis.RedisUrl is used here.
const config = require('sp0n-config');
const kue    = require('kue');
const url    = require('url');
const redis  = require('redis');

const getRedis = () => {
  // to clean up locally, point this at process.env.LOCAL_REDIS_URL instead
  const redisUrl = url.parse(config.Redis.RedisUrl);
  const client = redis.createClient(redisUrl.port, redisUrl.hostname);
  if (redisUrl.auth) {
    client.auth(redisUrl.auth.split(':')[1]);
  }
  return client;
};

const queue = kue.createQueue({
  redis: {
    createClientFactory: getRedis
  }
});

queue.inactive((err, ids) => { // other states: active, complete, failed, delayed
  const sAction = 'queue.inactive';
  console.log({ action: sAction, ids: ids });
  for (const id of ids) {
    kue.Job.remove(id, (err) => {
      if (err) {
        console.error({ action: sAction + '.err', id: id, err: err });
      }
    });
  }
});

// another way to go:
// kue.Job.rangeByState('inactive', 0, 10000, 'asc', (err, jobs) => {
//   for (const job of jobs) {
//     kue.Job.remove(job.id, (err) => {
//       if (err) {
//         console.error({ action: 'rangeByState.err', id: job.id, err: err });
//       }
//     });
//   }
// });

@olalonde
Contributor Author

Thanks for the script. I added queue.watchStuckJobs(5 * 1000) and the problem hasn't re-occurred so far.
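For reference, a minimal sketch of that watchdog setup. The enableWatchdog wrapper and its error message are my own additions (assumptions, not part of Kue), but queue.watchStuckJobs(ms) is a real Kue API: on each interval it scans Redis for jobs whose worker died before calling done() and re-queues them.

```javascript
// enableWatchdog turns on Kue's built-in stuck-job watchdog on an
// existing queue; intervalMs is how often it scans for orphaned jobs.
function enableWatchdog(queue, intervalMs) {
  if (typeof queue.watchStuckJobs !== 'function') {
    throw new Error('this Kue version lacks watchStuckJobs');
  }
  queue.watchStuckJobs(intervalMs);
  return intervalMs;
}

// Usage (assumes kue is installed and Redis is reachable):
// const queue = require('kue').createQueue();
// enableWatchdog(queue, 5 * 1000);
```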

@victusfate

We ran into it again. It appears to occur whenever a large number of jobs are queued in a short time interval. I'll try to put together a test case locally and determine where it's getting stuck.

@behrad mentioned some changes coming in the next release that may have some effect, so if I can reproduce this I'll test against the latest and see whether that resolves the stuck-in-queued-state issue.

@olalonde
Contributor Author

olalonde commented Aug 25, 2016

Yeah, reproducing the bug would be helpful. Additionally, something super useful would be a CLI as an alternative to the web dashboard or the redis REPL: kue list, kue retry failed, kue clear failed, etc. Might work on this when I have time.

@victusfate

I didn't even think of the redis REPL, cool.

@victusfate

victusfate commented Aug 30, 2016

Reviewing tips for preventing stuck queues:
https://github.com/Automattic/kue#prevent-from-stuck-active-jobs

This could definitely be the cause, as I'm doing media processing and all kinds of interesting errors can arise. I'll go with the domain wrapper or the promise setup (all the rest of the code I'm using is promises).

Hmm, I'd already been doing something like this:

    queue.process(this.type, this.concurrency, (job, done) => {
      this.fWorker(job, job.data)
        .then(() => {
          done();
        })
        .catch((err) => {
          done(err);
        });
    });

I also read some comments that domains are deprecated:
https://nodejs.org/api/domain.html
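Since domains are deprecated, one alternative (my own suggestion, not a Kue feature) is to race the worker's promise against a timeout so done() is always reached even if the worker hangs. The withTimeout helper and its 'job timed out' error message are assumptions:

```javascript
// withTimeout settles with the worker's result, or rejects after ms
// milliseconds so the processor can call done(err) instead of hanging.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error('job timed out')), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage inside a Kue processor (sketch, worker returns a promise):
// queue.process(type, concurrency, (job, done) => {
//   withTimeout(worker(job, job.data), 60 * 1000)
//     .then(() => done())
//     .catch((err) => done(err));
// });
```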

@victusfate

Some good stuff mentioned in this thread as well (similar issue):
#130

I'm trying something in the workers now to gracefully shut down. I was reliably getting the queue stuck by queueing jobs and killing the worker; it would never have a chance to call done. The same can also happen if a worker crashes (hence the domain grabs, or maybe a try/catch).
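The shutdown hook could look something like this. The installShutdownHandlers wrapper and the 5-second grace period are assumptions of mine; queue.shutdown(timeout, fn) is the real Kue API, and it fails this worker's active jobs rather than leaving them stuck in active forever.

```javascript
// installShutdownHandlers wires SIGTERM/SIGINT to Kue's queue.shutdown
// so the worker drains instead of dying mid-job. proc is passed in
// (normally `process`) so the wiring can be exercised with a stub.
function installShutdownHandlers(queue, proc, graceMs) {
  for (const signal of ['SIGTERM', 'SIGINT']) {
    proc.on(signal, () => {
      queue.shutdown(graceMs, (err) => {
        console.log('Kue shutdown (' + signal + ')', err || '');
        proc.exit(err ? 1 : 0);
      });
    });
  }
}

// Usage: installShutdownHandlers(require('kue').createQueue(), process, 5000);
```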

@victusfate

victusfate commented Sep 8, 2016

Ahoy olalonde -> this was an earlier batch job I was able to run and kill to get consistently stuck jobs. Now, with some modifications, it doesn't get stuck, but I'm seeing some active jobs just hanging out in limbo. I commented on this in issue #130.

Ok, I put together a gist with graceful queue and worker shutdown. I'm still seeing a stuck active job, so I think pausing the worker is not moving active jobs back into the inactive state.

Here's the gist:
https://gist.github.com/victusfate/1e2ce9eb73de32b78d2690d660f0f9c8

@victusfate

victusfate commented Sep 8, 2016

Updated the gist to handle setting active jobs back to inactive.

Ok, I believe my latest version of that gist works as expected: it pauses the worker and makes any incomplete jobs inactive so other workers or future workers can pick them up.
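The re-queueing step could be sketched like this. The requeueActive helper is my own wrapper (an assumption), built on real Kue APIs: queue.active lists active job ids, kue.Job.get loads a job, and job.inactive() moves it back to the queued state.

```javascript
// requeueActive moves every currently-active job back to the queued
// (inactive) state so another worker can pick it up, then calls
// cb(err, count). queue and kue are passed in to keep this testable.
function requeueActive(queue, kue, cb) {
  queue.active((err, ids) => {
    if (err) return cb(err);
    let pending = ids.length;
    if (pending === 0) return cb(null, 0);
    for (const id of ids) {
      kue.Job.get(id, (err, job) => {
        if (!err && job) job.inactive();
        if (--pending === 0) cb(null, ids.length);
      });
    }
  });
}

// Usage (needs a live Redis):
// const kue = require('kue');
// requeueActive(kue.createQueue(), kue, (err, n) => console.log('requeued', n));
```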

@olalonde
Contributor Author

olalonde commented Sep 9, 2016

@victusfate good job 👍 What happens if the process signal handlers are not called? Do the stuck jobs get unstuck eventually?

@victusfate

victusfate commented Sep 15, 2016

Yeah, I didn't handle uncaught exceptions, and there could be other signals I missed, but it worked very well while I killed and restarted it during local testing. No stuck queue; in the earlier version I could reliably recreate a stuck queue just by killing the workers and rerunning them.

So the sample code above gives some level of battle hardening, but it's not break-proof. Still, it resolved all the stuck-queue issues I've seen in our dev/prod environments.
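Covering the uncaught-exception case could look like this sketch. The installCrashHandler wrapper, the 3-second grace period, and exit code 1 are my own choices; queue.shutdown(timeout, fn) is the real Kue API.

```javascript
// installCrashHandler makes a crashing worker shut the queue down
// (failing its active jobs) instead of leaving them active forever.
// proc is passed in (normally `process`) so this can be stub-tested.
function installCrashHandler(queue, proc, graceMs) {
  proc.on('uncaughtException', (err) => {
    console.error('worker crashed:', err.message);
    queue.shutdown(graceMs, () => proc.exit(1));
  });
}

// Usage: installCrashHandler(require('kue').createQueue(), process, 3000);
```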

@aiavci

aiavci commented Jan 1, 2020

(quoting @victusfate's comment from Aug 19, 2016, including its redis cleanup script; see above)

Does this mean jobs that fail are removed and never run?
