Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with RDY locking to zero #6

Closed
nickstenning opened this issue Jun 11, 2015 · 10 comments
Closed

Problem with RDY locking to zero #6

nickstenning opened this issue Jun 11, 2015 · 10 comments

Comments

@nickstenning
Copy link

Hi Trevor! First of all -- thank you so much for all your hard work on gnsq. It's a lovely little library and for the most part has been absolutely flawless for us.

Unfortunately we're having an issue at the moment with gnsq v0.2.3 (and nsqd v0.3.0) in which it seems like a network blip causes a Reader to lock itself in a ready state of zero and never recover.

We're getting the following error message in the logs:

[172.17.11.122:4150] error requeueing message (NSQSocketError(32, 'Broken pipe'))

This appears to come from line 699 of reader.py, and seems to me to imply that we're having a network blip while attempting a requeue. Thereafter, it seems like RDY never increments again. You can see the issue where we're dealing with this internally at hypothesis/h#2304.

I've tried to recreate the conditions under which this might occur using blockade (see this repo), but as yet haven't succeeded.

I've perused the source code, and I'm guessing this might be the issue you fixed in d193160 (and 6905a14). Does that sound right?

If not, is there any way I can help track this issue down?

@wtolson
Copy link
Owner

wtolson commented Jun 11, 2015

Thanks for the report Nick. This is strange as it seems gnsq thinks the connection is closed but nsqd sees the connection as open.

When gnsq sees an exception while requeueing, it should close the connection and then reopen it, resetting the RDY count. Are you seeing anything in your logs to suggest this is the case?

Thanks for hunting it down this far. I will see if I can reproduce the behaviour on my end.

@mreiferson
Copy link

FWIW as of NSQ v0.3.0 client libraries no longer need to repeatedly send RDY - see nsqio/nsq#404

@wtolson
Copy link
Owner

wtolson commented Jun 11, 2015

@mreiferson That's good news, it was always a bit tricky to get that correct. What is recommended for clients in terms of support for older versions of NSQ?

@mreiferson
Copy link

Well, one approach would be to not support older nsqd in a future release of gnsq (cleanest code). If you want to be backwards compatible, then you can pivot on the version field in the IDENTIFY response and add some logic to the RDY code paths.

@wtolson
Copy link
Owner

wtolson commented Jun 11, 2015

Small bit of progress made. I'm able to consistently reproduce the behaviour you're seeing with this snippet:

import logging
import gnsq


logging.basicConfig(level=logging.DEBUG)
reader = gnsq.Reader('test', 'test', nsqd_tcp_addresses=['localhost:4150'])


@reader.on_message.connect
def handle_message(reader, message):
    for conn in reader.conns:
        conn.stream.socket.send('badcmd\n')
    raise Exception('test')


reader.start()

It seems there is some unexpected behaviour when the socket is closed on the server's side. gnsq reconnects, but an initial RDY command is never sent.

@wtolson
Copy link
Owner

wtolson commented Jun 14, 2015

Seems to have been a bug with how gnsq handles connections failures while starting to backoff. I've pushed a new version of gnsq to pypi (version 0.3.0) with a fix for this included. Let me know if this resolves the issue for you. Thanks again for reporting!

@nickstenning
Copy link
Author

Thank you so much for this! I won't be able to tell you immediately if this resolves our issue as it manifests itself in production pretty rarely. I will deploy 0.3.0 and see how we get on.

nickstenning added a commit to hypothesis/h that referenced this issue Jun 15, 2015
This hopefully addresses the issues we've seen with RDY locking to zero
(issue #2304).

  wtolson/gnsq#6
@wtolson
Copy link
Owner

wtolson commented Jun 15, 2015

Thanks, I'm going to close the issue. Feel free to reopen if the issue occurs again.

@wtolson wtolson closed this as completed Jun 15, 2015
@nickstenning
Copy link
Author

Just a note to say that we've seen the networking issues in production again, and no evidence that gnsq is getting confused any more. Thank you!

@wtolson
Copy link
Owner

wtolson commented Jun 18, 2015

Sweet, thanks for the update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants