Analyze queue options for long running SYNC GET #2198
Mingle Card: 2441
Description
We have a few options: we could pick a specialized priority job queue library (such as Kue or monq), we could build a job queue abstraction over a message queue library/platform (such as Amazon SQS, RabbitMQ, ZeroMQ, etc.), or we could refine our own queue (the one used for processing sync POST) to handle multiple consumers.
Acceptance Criteria
The queue mechanism we choose to work with needs:
Analysis
Specialized library
Kue:
Kue is a priority job queue backed by Redis (rather than our existing server-side data storage platform, MongoDB).
Pros:
Cons:
monq:
Monq is a MongoDB-backed job queue.
Pros:
Cons:
Refine our own queue implementation
To cover the acceptance criteria, we'd need to make a few changes to our queue to correctly handle multiple consumers.
We can handle the atomicity requirement by using MongoDB's findAndModify command in conjunction with adding a new "inProg" boolean to the message envelope. Since findAndModify obtains a write lock on the affected database (blocking other operations until it has completed), we can safely track in-progress jobs and prevent multiple consumers from picking up the same job.
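A minimal sketch of the claim step, assuming the MongoDB Node.js driver (where findAndModify is exposed as findOneAndUpdate); the collection name and the inProg/startedAt fields are illustrative:

```javascript
// Sketch: atomically claim the next unclaimed job.
// `collection` is assumed to expose findOneAndUpdate(filter, update, options),
// as in the official MongoDB Node.js driver (returnDocument requires driver
// v4+; older drivers use returnOriginal instead).
async function claimNextJob(collection) {
  return collection.findOneAndUpdate(
    { inProg: false },                                  // only unclaimed jobs
    { $set: { inProg: true, startedAt: new Date() } },  // mark as in progress
    { sort: { _id: 1 }, returnDocument: 'after' }       // oldest job first
  );
}
```

Because the match-and-update is a single server-side operation, two consumers calling this concurrently can never both claim the same document.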
To support findAndModify, we'd have to remove our current usage of tailable cursors and replace it with polling, as these features cannot currently be used together (see https://jira.mongodb.org/browse/SERVER-11753).
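The polling replacement could look something like the following sketch; the function names and interval are assumptions, and `claim` stands in for whatever atomic findAndModify-based claim we end up with:

```javascript
// Sketch: replace the tailable cursor with a simple poll loop.
// `claim` atomically claims one job (resolving to null when the queue is
// empty); `handle` processes a claimed job. Names/interval are illustrative.
function startPolling(claim, handle, intervalMs = 1000) {
  let stopped = false;
  (async function loop() {
    while (!stopped) {
      const job = await claim();
      if (job) {
        await handle(job); // drain eagerly while work is available
      } else {
        // queue empty: back off before polling again
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
      }
    }
  })();
  return () => { stopped = true; }; // call to stop the loop
}
```

Draining eagerly (only sleeping when the queue is empty) keeps latency close to the tailable-cursor behaviour while bounding the query rate when idle.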
If/when the Heroku process is unexpectedly terminated while a message is being processed, we can attempt to clean up (listening for SIGTERM, then setting inProg to false, or marking the packet with error: shutdown); however, this isn't foolproof. In the case of unexpected termination, a job may be left in a stale state (inProg = true) with no corresponding worker.
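A best-effort cleanup sketch (Heroku sends SIGTERM before a hard kill); the function name, the in-flight-id tracking, and the inProg field follow the conventions above and are assumptions, not a final design:

```javascript
// Sketch: on shutdown, return claimed-but-unfinished jobs to the queue by
// resetting inProg (alternatively, mark them with error: 'shutdown').
// Not foolproof: a crash or SIGKILL skips this path entirely.
async function releaseInFlightJobs(collection, inFlightIds) {
  return collection.updateMany(
    { _id: { $in: [...inFlightIds] } },
    { $set: { inProg: false } }
  );
}

// Registering it in the worker would be a one-liner, e.g.:
// process.once('SIGTERM', () => releaseInFlightJobs(jobs, inFlight));
```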
As part of the findAndModify call, we can set a timestamp so that we know when the message was pulled down. Based on how long we expect consumers to take per message, we can query for documents that are marked as in progress but whose timestamp is older than that threshold.
Using this technique, we can update those documents to reset the "inProg" flag, effectively returning them to the queue for another consumer to work on, or we can mark the document with an error.
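The stale-job sweep described above could be sketched as follows; the timeout value, function names, and inProg/startedAt fields are illustrative assumptions matching the earlier sketches:

```javascript
// Sketch: find jobs claimed longer ago than `timeoutMs` (presumed orphaned)
// and return them to the queue by resetting inProg.
function staleJobFilter(timeoutMs, now = new Date()) {
  return {
    inProg: true,
    startedAt: { $lt: new Date(now.getTime() - timeoutMs) }, // claimed too long ago
  };
}

async function requeueStaleJobs(collection, timeoutMs) {
  // Alternatively, $set an error field here instead of requeueing,
  // so a human can inspect jobs that repeatedly stall.
  return collection.updateMany(
    staleJobFilter(timeoutMs),
    { $set: { inProg: false } }
  );
}
```

The timeout has to comfortably exceed the longest legitimate processing time, otherwise a slow-but-alive consumer's job would be handed to a second consumer and processed twice.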